Copying Complete SharePoint Library to Azure Data Lake Storage in Azure Data Factory

A Aathithya 25 Reputation points
2024-03-01T11:50:49.09+00:00

How can I copy the entire SharePoint library, including its folders, subfolders (nested folders), and all files, to Azure Data Lake Storage (ADLS) in Azure Data Factory? Is this possible?

  1. Amira Bedhiafi 15,676 Reputation points
    2024-03-01T14:17:32.69+00:00

    You can copy files from SharePoint Online by using a Web activity to authenticate and get an access token from SharePoint Online (SPO), then passing that token to a subsequent Copy activity that uses the HTTP connector as its source.

    https://learn.microsoft.com/en-us/answers/questions/53586/copy-files-from-sharepoint-into-azure-data-lake-st
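    The Web activity and Copy activity pair described above could look roughly like the following in pipeline JSON, generated here from Python so it can be checked. This is a sketch, not a documented template: the activity and dataset names, the placeholder tenant values, and the use of Binary datasets are assumptions. The token endpoint and resource format follow the app-only (Azure ACS) flow that the ADF SharePoint Online connector documentation describes for file copies; Microsoft has announced the retirement of Azure ACS for SharePoint Online, so check current guidance before relying on it. In a real factory the client secret belongs in Azure Key Vault, not inline.

```python
"""Sketch of the Web activity -> Copy activity pattern as ADF pipeline JSON.

Activity/dataset names and all IDs are hypothetical placeholders.
"""
import json
from urllib.parse import urlencode

# Well-known principal ID of SharePoint Online, used in the ACS "resource" value.
SPO_PRINCIPAL = "00000003-0000-0ff1-ce00-000000000000"


def token_request_body(tenant_id: str, tenant_name: str,
                       client_id: str, client_secret: str) -> str:
    """Form-encoded body for an app-only (client credentials) token request."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": f"{client_id}@{tenant_id}",
        "client_secret": client_secret,
        "resource": f"{SPO_PRINCIPAL}/{tenant_name}.sharepoint.com@{tenant_id}",
    })


def build_pipeline(tenant_id: str, tenant_name: str,
                   client_id: str, client_secret: str) -> dict:
    """Return a pipeline with a token-fetching Web activity and a Copy activity."""
    web = {
        "name": "GetSpoToken",
        "type": "WebActivity",
        "typeProperties": {
            "url": f"https://accounts.accesscontrol.windows.net/{tenant_id}/tokens/OAuth/2",
            "method": "POST",
            "headers": {"Content-Type": "application/x-www-form-urlencoded"},
            "body": token_request_body(tenant_id, tenant_name, client_id, client_secret),
        },
    }
    copy = {
        "name": "CopySpoFileToAdls",
        "type": "Copy",
        # Run only after the token has been obtained.
        "dependsOn": [{"activity": "GetSpoToken", "dependencyConditions": ["Succeeded"]}],
        "inputs": [{"referenceName": "SpoFileHttp", "type": "DatasetReference"}],
        "outputs": [{"referenceName": "AdlsFile", "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "HttpReadSettings",
                    "requestMethod": "GET",
                    # Send the Web activity's token as a bearer header.
                    "additionalHeaders": {
                        "value": "Authorization: Bearer @{activity('GetSpoToken').output.access_token}",
                        "type": "Expression",
                    },
                },
            },
            "sink": {"type": "BinarySink",
                     "storeSettings": {"type": "AzureBlobFSWriteSettings"}},
        },
    }
    return {"name": "CopySpoFile", "properties": {"activities": [web, copy]}}


if __name__ == "__main__":
    print(json.dumps(build_pipeline("<tenant-id>", "contoso", "<client-id>", "<secret>"), indent=2))
```

    The printed JSON can be pasted into the pipeline's code view and adjusted; the Copy activity still needs the two datasets it references.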

    Step by step using ADF:

    If you haven't already, create an Azure Data Factory instance in your Azure subscription. This can be done through the Azure Portal.

    Create linked services in ADF for both SharePoint Online and Azure Data Lake Storage Gen2:

    • For SharePoint Online: ADF's SharePoint Online List linked service authenticates with a service principal (an Azure AD app registration granted access to the site) and takes the site URL, tenant ID, service principal ID, and service principal key. That connector reads list data only; to copy the library's files, create an HTTP linked service with Anonymous authentication and pass the access token from the Web activity as a bearer header, as described above.
    • For Azure Data Lake Storage Gen2: Create a linked service for ADLS Gen2 using your storage account name or URL and an authentication method (such as account key, service principal, or managed identity).
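    For the file-copy route (HTTP connector as source), the two linked services could be defined roughly as below. The names "SpoSiteHttp" and "AdlsGen2" and the example URLs are hypothetical. The ADLS Gen2 definition gives only a URL, which makes ADF use the factory's system-assigned managed identity; that identity would need a data role such as Storage Blob Data Contributor on the account.

```python
"""Sketch: linked service JSON for the HTTP (SharePoint) source and ADLS Gen2 sink.

Names ("SpoSiteHttp", "AdlsGen2") and example URLs are hypothetical.
"""
import json


def http_linked_service(site_url: str) -> dict:
    """HTTP linked service pointing at the SharePoint site.

    Anonymous auth is used because the bearer token is sent per request
    as an additional header from the Copy activity.
    """
    return {
        "name": "SpoSiteHttp",
        "properties": {
            "type": "HttpServer",
            "typeProperties": {
                # Trailing slash so dataset relative URLs append cleanly.
                "url": site_url.rstrip("/") + "/",
                "authenticationType": "Anonymous",
                "enableServerCertificateValidation": True,
            },
        },
    }


def adls_linked_service(account_name: str) -> dict:
    """ADLS Gen2 linked service; with only a URL, the factory's managed identity is used."""
    return {
        "name": "AdlsGen2",
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {"url": f"https://{account_name}.dfs.core.windows.net"},
        },
    }


if __name__ == "__main__":
    for ls in (http_linked_service("https://contoso.sharepoint.com/sites/Team"),
               adls_linked_service("mydatalake")):
        print(json.dumps(ls, indent=2))
```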

    After setting up the linked services, create datasets that reference them.

    • For SharePoint: Create a dataset for the file to copy. With the HTTP connector, the dataset's relative URL points at the SharePoint REST endpoint for that file, _api/web/GetFileByServerRelativeUrl('<server-relative-path>')/$value, under your site URL.
    • For ADLS Gen2: Create a dataset for the ADLS Gen2 file system (container) and folder where the SharePoint files and folders should land.
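    A minimal sketch of the two datasets as parameterized Binary datasets, so one pair can serve every file. The dataset names, parameter names (relativeUrl, folderPath, fileName), and the helper that builds the REST path are hypothetical; the helper doubles single quotes for the OData string literal and percent-encodes the rest, which is a common way to handle spaces and apostrophes in SharePoint paths.

```python
"""Sketch: parameterized Binary datasets for one SharePoint file and its ADLS target.

Dataset, linked service, and parameter names are hypothetical.
"""
from urllib.parse import quote


def expr(value: str) -> dict:
    """Wrap a string as an ADF dynamic-content expression."""
    return {"value": value, "type": "Expression"}


def spo_file_relative_url(server_relative_path: str) -> str:
    """REST path (relative to the site URL) that returns a file's raw bytes."""
    escaped = server_relative_path.replace("'", "''")  # OData literal escaping
    return f"_api/web/GetFileByServerRelativeUrl('{quote(escaped, safe='/')}')/$value"


def http_file_dataset() -> dict:
    return {
        "name": "SpoFileHttp",
        "properties": {
            "type": "Binary",
            "linkedServiceName": {"referenceName": "SpoSiteHttp",
                                  "type": "LinkedServiceReference"},
            "parameters": {"relativeUrl": {"type": "string"}},
            "typeProperties": {
                "location": {"type": "HttpServerLocation",
                             "relativeUrl": expr("@dataset().relativeUrl")},
            },
        },
    }


def adls_file_dataset(file_system: str) -> dict:
    return {
        "name": "AdlsFile",
        "properties": {
            "type": "Binary",
            "linkedServiceName": {"referenceName": "AdlsGen2",
                                  "type": "LinkedServiceReference"},
            "parameters": {"folderPath": {"type": "string"},
                           "fileName": {"type": "string"}},
            "typeProperties": {
                "location": {"type": "AzureBlobFSLocation",
                             "fileSystem": file_system,
                             "folderPath": expr("@dataset().folderPath"),
                             "fileName": expr("@dataset().fileName")},
            },
        },
    }
```

    For example, spo_file_relative_url("/sites/Team/Shared Documents/a.pdf") gives _api/web/GetFileByServerRelativeUrl('/sites/Team/Shared%20Documents/a.pdf')/$value, which would be passed as the relativeUrl parameter from the Copy activity.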

    Next, you need to create a pipeline that defines the data movement and transformation activities.

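    • Copy Activity: Add a Copy activity to your pipeline. This activity copies data from your source (SharePoint) to your destination (ADLS Gen2). In the source settings of the Copy activity, select your SharePoint dataset; in the sink settings, select your ADLS Gen2 dataset. Note that the HTTP source downloads one file per request, so the recursive option that file-based sources offer does not walk a SharePoint folder tree. To include subfolders and nested files, list the files first (for example through the SharePoint REST API) and run the Copy activity inside a ForEach activity, once per file.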
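    To cover nested folders, the file list for the ForEach loop has to be built by walking the library. The sketch below does that outside the pipeline (for example in a script or Azure Function; that placement is an assumption) with an iterative depth-first traversal. It assumes the SharePoint REST endpoint GetFolderByServerRelativeUrl(...)?$expand=Folders,Files with Accept: application/json;odata=nometadata, which returns "Files" and "Folders" arrays; the function names, the ForEach item shape, and skipping the library's system "Forms" folder are illustrative choices. The fetch function is injected so the traversal can be tested without network access; make_fetch shows one possible real implementation.

```python
"""Sketch: enumerate every file under a SharePoint library, including nested folders.

The output can feed a ForEach activity that runs the Copy activity per file.
"""
import json
import posixpath
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple
from urllib.parse import quote

# fetch(server_relative_folder) -> {"Files": [...], "Folders": [...]}
Fetch = Callable[[str], dict]


def make_fetch(site_url: str, token: str) -> Fetch:
    """Build a fetch function that calls the SharePoint REST API."""
    def fetch(folder: str) -> dict:
        escaped = quote(folder.replace("'", "''"), safe="/")
        url = (f"{site_url.rstrip('/')}/_api/web/GetFolderByServerRelativeUrl"
               f"('{escaped}')?$expand=Folders,Files")
        req = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json;odata=nometadata",
        })
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def walk_library(fetch: Fetch, root: str) -> Iterator[Tuple[str, str]]:
    """Yield (server-relative file URL, path relative to root) for every file."""
    root = root.rstrip("/")
    stack = [root]  # explicit stack instead of recursion
    while stack:
        folder = stack.pop()
        listing = fetch(folder)
        for f in listing.get("Files", []):
            url = f["ServerRelativeUrl"]
            yield url, url[len(root) + 1:]
        for sub in listing.get("Folders", []):
            # Skip the system "Forms" folder at the library root.
            if folder == root and sub["Name"] == "Forms":
                continue
            stack.append(sub["ServerRelativeUrl"])


def foreach_items(files: Iterable[Tuple[str, str]]) -> list:
    """Shape each file as parameters for a ForEach/Copy pair (hypothetical names)."""
    return [{"serverRelativeUrl": url,
             "folderPath": posixpath.dirname(rel),
             "fileName": posixpath.basename(rel)} for url, rel in files]
```

    Keeping the relative path (for example x/y/c.txt) lets the sink recreate the SharePoint folder structure in ADLS through the folderPath and fileName dataset parameters.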