How to sync Azure Data Lake Storage with a SharePoint drive?

Rayne 20 Reputation points
2024-05-12T15:00:05.1166667+00:00

We have copied files from multiple drives (document libraries) of a SharePoint site, including nested folders, into an Azure Data Lake Storage container, preserving the same folder structure. Now I want to create a pipeline in Azure Data Factory that deletes files from the ADLS container when they are missing or have been deleted from the SharePoint site. I tried maintaining a log of files in JSON format for every document library, but nested folders made this difficult to handle. Please suggest some solutions for implementing this.


1 answer

  1. Nehruji R 2,966 Reputation points Microsoft Vendor
    2024-05-13T05:41:16.0933333+00:00

    Hello Rayne,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you have successfully copied the nested folders and files to Azure Data Lake Storage and now want to use Azure Data Factory to delete from the ADLS container any files that no longer exist in SharePoint.

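    The built-in Delete activity in Azure Data Factory (ADF) does not support SharePoint Online, so anything that has to talk to SharePoint needs a custom solution, for example using the HTTP connector. ADLS, on the other hand, is among the stores the Delete activity supports. Currently the Delete activity supports only the data stores shown below:

    [Image 237422-image.png: list of data stores supported by the Delete activity]

    Ref: Delete activity supported data sources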

    As a workaround, you can explore the HTTP connector, or use a Custom activity and write your own code to delete files from SharePoint.
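
    If you go the custom-code route, the part that usually causes trouble is the nested folders. Below is a minimal Python sketch of walking one SharePoint drive recursively and yielding a flat list of relative file paths. It assumes the Microsoft Graph drive endpoints (not mentioned above, so treat this as one possible option) and a hypothetical `fetch_json` callable that you supply to perform an authenticated GET and return the parsed JSON. Authentication and retries are deliberately left out.

```python
from typing import Callable, Iterator

# Assumption: the drive is read through Microsoft Graph v1.0.
GRAPH = "https://graph.microsoft.com/v1.0"


def list_drive_files(fetch_json: Callable[[str], dict], drive_id: str) -> Iterator[str]:
    """Yield the relative path of every file in a drive, including nested folders.

    fetch_json is a hypothetical helper supplied by the caller: it takes a URL,
    performs an authenticated GET, and returns the decoded JSON body.
    """
    # Stack of (children URL, relative folder path) still to visit.
    pending = [(f"{GRAPH}/drives/{drive_id}/root/children", "")]
    while pending:
        url, folder = pending.pop()
        while url:
            page = fetch_json(url)
            for item in page.get("value", []):
                path = f"{folder}/{item['name']}" if folder else item["name"]
                if "folder" in item:
                    # Folders carry a "folder" facet; queue their children.
                    children = f"{GRAPH}/drives/{drive_id}/items/{item['id']}/children"
                    pending.append((children, path))
                elif "file" in item:
                    yield path
            # Large folders are paged; follow the next link until it is absent.
            url = page.get("@odata.nextLink")
```

    Because the result is a flat list of paths such as `Folder/Sub/file.txt`, there is no need to keep a nested JSON log per document library; one list per drive is enough for a later comparison against ADLS.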

    Note: In Blob storage you may not be able to delete an empty folder/directory, whereas in ADLS you can.

    To create an Azure Data Factory (ADF) pipeline that deletes files from ADLS when they are no longer present in SharePoint:

    • Fetch the list of files from SharePoint, including file paths and other relevant metadata (e.g., modified timestamp). The “Get Metadata” activity does not list SharePoint Online among its supported connectors, so this list typically has to come from a “Web” activity that calls the SharePoint REST API, or from custom code.
    • Use a “Get Metadata” activity to fetch the list of files from ADLS, again retrieving file paths and metadata. Its childItems field returns only the immediate children of a folder, so nested folders need a folder-by-folder traversal.
    • In a subsequent activity (e.g., “ForEach” with an “If Condition” inside), compare the two lists:
      For each file in the ADLS list: check whether it exists in the SharePoint list.
      If not, use the “Delete” activity to remove it from ADLS. This step ensures that only files missing from SharePoint are deleted.
    • Implement error handling to handle cases where a file cannot be deleted (e.g., due to permissions or other issues).
    • Set up a trigger (e.g., time-based or event-based) to run the pipeline periodically. This ensures that ADLS remains synchronized with SharePoint.
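
    The comparison and delete steps above can be sketched in Python, for instance inside a Custom activity or a function that the pipeline calls. This is an illustration under stated assumptions, not the only way to do it: `find_orphans` and `delete_orphans` are hypothetical helper names, paths are compared case-insensitively (an assumption that suits SharePoint naming; drop `.lower()` if it does not suit your data), and `file_system_client` is any object exposing `delete_file(path)`, such as a `FileSystemClient` from the azure-storage-file-datalake package.

```python
from typing import Iterable, List, Tuple


def normalize(path: str) -> str:
    """Make paths comparable: forward slashes, no leading/trailing slash, lower case."""
    return path.replace("\\", "/").strip("/").lower()


def find_orphans(adls_paths: Iterable[str], sharepoint_paths: Iterable[str]) -> List[str]:
    """Return ADLS paths (original spelling) that have no counterpart in SharePoint."""
    present = {normalize(p) for p in sharepoint_paths}
    if not present:
        # Safety guard (a design choice): an empty SharePoint listing more likely
        # means a failed call than an empty site, so refuse to delete everything.
        raise ValueError("SharePoint listing is empty; refusing to compute deletions")
    return sorted(p for p in adls_paths if normalize(p) not in present)


def delete_orphans(file_system_client, orphans: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Delete each orphan; collect failures instead of stopping at the first one."""
    deleted: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in orphans:
        try:
            file_system_client.delete_file(path)
            deleted.append(path)
        except Exception as exc:  # e.g. permission problems; report and continue
            failed.append((path, str(exc)))
    return deleted, failed
```

    The direction of the check matters: the loop runs over the ADLS paths and looks each one up in the SharePoint set, so a file is removed only when SharePoint no longer has it. Returning the `failed` list lets the pipeline log or retry those paths on the next scheduled run.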

    Similar thread for reference - https://stackoverflow.com/questions/76284402/adf-delete-activity-not-deleting-folders

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.

    Please “Accept the answer” and “up-vote” wherever the information provided helps you; this can be beneficial to other community members.