How can I efficiently download files from various subfolders and nested folders within different levels of hierarchy in SharePoint, and then transfer them to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF)?

Chebolu Sai Manasa 120 Reputation points
2024-04-04T06:47:14.5+00:00

Despite attempting various approaches, I have encountered failures due to the complexity of handling approximately 50,000 files distributed across different levels of hierarchy. Could you recommend a custom solution, perhaps utilizing a Python script or an alternative method?


Accepted answer
  1. Nehruji R 3,346 Reputation points Microsoft Vendor
    2024-04-05T08:04:28.85+00:00

    Hello

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you would like to download files, including those in nested folders, from SharePoint and transfer them to ADLS using Azure Data Factory. You can simply right-click the folders you need and download them. Otherwise, you can transfer the files directly by copying them; please refer to the MS doc below and see if it helps you achieve your requirement. If you face any issues, or if you have any feedback or suggestions regarding this implementation, please share them with us so that I can take them forward to the appropriate team.

    Here is the MS doc: Copy file from SharePoint Online using Azure Data Factory

    You can copy a file from SharePoint Online by using a Web activity to authenticate and obtain an access token from SPO, then passing that token to a subsequent Copy activity that copies the data with the HTTP connector as source.


    1. Follow the Prerequisites section to create an AAD application and grant it permission to SharePoint Online.
    2. Create a Web Activity to get the access token from SharePoint Online:
        URL: https://accounts.accesscontrol.windows.net/[Tenant-ID]/tokens/OAuth/2. Replace the tenant ID.
        Method: POST
        Headers: Content-Type: application/x-www-form-urlencoded
        Body: grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID]. Replace the client ID, client secret, tenant ID and tenant name.
        Note: Set the Secure Output option to true in Web activity to prevent the token value from being logged in plain text. Any further activities that consume this value should have their Secure Input option set to true.
    3. Chain with a Copy activity with HTTP connector as source to copy SharePoint Online file content:
        HTTP linked service:
            i) Base URL: https://[site-url]/_api/web/GetFileByServerRelativeUrl('[relative-path-to-file]')/$value. Replace the site URL and relative path to file. Sample relative path to file as /sites/site2/Shared Documents/TestBook.xlsx.
            ii) Authentication type: Anonymous (to use the Bearer token configured in copy activity source later)
        Dataset: choose the format you want. To copy file as-is, select "Binary" type.
        Copy activity source:
            i) Request method: GET
            ii) Additional header: use the following expression @{concat('Authorization: Bearer ', activity('<Web-activity-name>').output.access_token)}, which uses the Bearer token generated by the upstream Web activity as authorization header. Replace the Web activity name.
        Configure the copy activity sink as usual.
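Since the question also asks about a Python script, the token request from step 2 and the file URL from step 3 can be sketched in Python as shown here. This is a minimal sketch, not a tested integration: the function names are hypothetical, `site_url` is assumed to be the host plus site path without a scheme (for example `contoso.sharepoint.com/sites/site2`, where `contoso` is a made-up tenant), and the token response is assumed to carry an `access_token` field, as the ADF expression in step 3 implies.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Resource principal ID for SharePoint Online, as given in step 2.
SHAREPOINT_PRINCIPAL = "00000003-0000-0ff1-ce00-000000000000"


def build_token_request(tenant_id, tenant_name, client_id, client_secret):
    """Return (url, headers, form_body) for the token call in step 2."""
    url = f"https://accounts.accesscontrol.windows.net/{tenant_id}/tokens/OAuth/2"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": f"{client_id}@{tenant_id}",
        "client_secret": client_secret,
        "resource": f"{SHAREPOINT_PRINCIPAL}/{tenant_name}.sharepoint.com@{tenant_id}",
    })
    return url, headers, body


def build_file_url(site_url, relative_path):
    """Return the file-content URL from step 3 for one server-relative path."""
    # Double any single quote (OData string literal rule), then
    # percent-encode everything except "/" and "'" (spaces become %20).
    escaped = quote(relative_path.replace("'", "''"), safe="/'")
    return f"https://{site_url}/_api/web/GetFileByServerRelativeUrl('{escaped}')/$value"


def get_access_token(tenant_id, tenant_name, client_id, client_secret):
    """POST the token request and return the bearer token (network call)."""
    url, headers, body = build_token_request(
        tenant_id, tenant_name, client_id, client_secret
    )
    request = urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the JSON response contains an "access_token" field.
        return json.load(response)["access_token"]


def download_file(site_url, relative_path, token):
    """GET one file's raw bytes using the bearer token (network call)."""
    request = urllib.request.Request(
        build_file_url(site_url, relative_path),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

The two `build_*` helpers are pure functions, so they can be checked without network access. The returned bytes would still need to be written to ADLS, which this sketch does not cover.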

    Please also refer to these similar threads: https://learn.microsoft.com/en-us/answers/questions/1605252/copying-complete-sharepoint-library-to-azure-data and https://learn.microsoft.com/en-us/answers/questions/1513111/how-to-copy-folders-and-files-from-a-sharepoint-si
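The steps above copy one file per request, so a nested hierarchy first needs a list of file paths. A minimal sketch of one way to build that list follows. It assumes the SharePoint REST endpoints `GetFolderByServerRelativeUrl('...')/Files` and `.../Folders`, and assumes they are called with the header `Accept: application/json;odata=nometadata` so that each response has the shape `{"value": [{"ServerRelativeUrl": ...}, ...]}`. The names `folder_api_url`, `walk_files` and `fetch_json` are hypothetical; `fetch_json` is whatever authenticated GET-and-parse function you supply. Paging and throttling are not handled.

```python
from typing import Callable, Iterator
from urllib.parse import quote


def folder_api_url(site_url: str, folder_path: str, child: str) -> str:
    """Build the REST URL for a folder's "Files" or "Folders" collection."""
    # Double single quotes for the OData literal, then percent-encode.
    escaped = quote(folder_path.replace("'", "''"), safe="/'")
    return (
        f"https://{site_url}/_api/web/"
        f"GetFolderByServerRelativeUrl('{escaped}')/{child}"
    )


def walk_files(
    site_url: str,
    root_folder: str,
    fetch_json: Callable[[str], dict],
) -> Iterator[str]:
    """Yield the server-relative path of every file under root_folder.

    Uses an explicit stack instead of recursion, so deep hierarchies do
    not hit Python's recursion limit.
    """
    pending = [root_folder]
    while pending:
        folder = pending.pop()
        # Files that sit directly in this folder.
        files = fetch_json(folder_api_url(site_url, folder, "Files"))
        for item in files.get("value", []):
            yield item["ServerRelativeUrl"]
        # Subfolders are queued and visited on later iterations.
        subfolders = fetch_json(folder_api_url(site_url, folder, "Folders"))
        for item in subfolders.get("value", []):
            pending.append(item["ServerRelativeUrl"])
```

Each yielded path can be used as the relative path in the GetFileByServerRelativeUrl call from step 3. For around 50,000 files, it may help to write the list to a file first so that a failed run can resume instead of re-enumerating the library.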

    Hope this helps. Please let us know if you have any further queries. I’m happy to assist you further.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

