Share via

How to read files from a folder on sharepoint online using ADF

Haider, Mustafa (Reigate) 25 Reputation points
Oct 20, 2023, 3:31 PM

I have got some files i.e. csv and excel files on sharepoint online in different folders. I have created a link service using service principle and test the connection, connection is working fine.

Now I have created a dataset using linked service i have setup for sharepoint. I can see the folders in list name drop down box.

Now dataset has been created.

From that dataset e.g. selected Folder name is Europe. In this folder i have got files 2023_01_01_Europe are there by week. Date is start of the week. I need to pick up those files and load into SQL Database.

How i will access these files from dataset which is pointing to the folder, it has got subfolders as well. would i create dataset or there is other way of doing it?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,331 questions
SharePoint
SharePoint
A group of Microsoft Products and technologies used for sharing and managing content, knowledge, and applications.
11,230 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Smaran Thoomu 21,320 Reputation points Microsoft External Staff
    Oct 21, 2023, 4:07 PM

    @Haider, Mustafa (Reigate) Welcome to the Microsoft Q&A platform and thank you for posting your question here.
    As I understand you want to read files from folder on SharePoint Online using ADF. If this is not, please let me know.

    To access files from subfolders in SharePoint Online using Azure Data Factory, you can create a dataset that points to the parent folder and then use a wildcard path to specify the subfolder and file name pattern.

    Adding to that Azure Data Factory (ADF) cannot directly copy a folder or multiple files from SharePoint Online, but there are ways to do it. You need to add two steps to the pipeline compared to copying a single file:

    1. Get the list of files. You can do this manually by maintaining a list of file names in a text file, or you can use a Web Activity to call the SharePoint REST API to get the list of files.
    2. Use a ForEach Activity to loop through the list of file names and pass each file name to a Copy Activity. The base URL for the Copy Activity is slightly different than for copying a single file.
      User's image
    3. Inside ForEach use Copy Data activity to read the files from the folder you selected in the dataset. You can use the wildcard characters to specify the file names or patterns. If you want to read all the CSV files in the Europe folder, you can use the following file path: Europe/2023_01_01_*.
      User's image

    Here's an example of how to set up the Copy Data activity to retrieve files from the Europe folder and transfer the data to the SQL Database.

    {
        "name": "CopyData",
        "type": "Copy",
        "inputs": [
            {
                "name": "SharePointDataset"
            }
        ],
        "outputs": [
            {
                "name": "SqlDatabaseDataset"
            }
        ],
        "copyBehavior": "PreserveHierarchy",
        "source": {
            "type": "SharePointOnlineSource",
            "queryTimeout": "02:00:00",
            "recursive": true,
            "wildcardFileName": "*.csv",
            "folderPath": "Europe"
        },
        "sink": {
            "type": "AzureSqlDatabaseSink",
            "writeBatchSize": 10000,
            "writeBatchTimeout": "00:05:00",
            "sqlWriterCleanupScript": "",
            "allowPolyBase": true,
            "polyBaseSettings": {
                "rejectType": "Percentage",
                "rejectValue": 10,
                "useTypeDefault": true
            }
        },
        "enableStaging": false,
        "translator": {
            "type": "TabularTranslator",
            "columnMappings": "AutoMap"
        },
        "parallelCopies": 1,
        "cloudDataMovementUnits": 0
    }
    
    
    

    Furthermore, the 'Copy file from SharePoint Online' operation uses Azure Active Directory (AAD) and service principal authentication, along with the SharePoint API, to fetch files.

    Register SharePoint Application and Grant permission - https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app?tabs=dotnet#register-your...
    You can find complete instructions on how to register an app and grant permissions in the prerequisites section here.
    Please refer to this article for a better understanding Copy files from SharePoint

    I hope this helps! Let me know if you have any further questions.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.

  2. Barry Evanz 235 Reputation points
    Oct 23, 2023, 8:11 PM

    Firstly, I would recommend to set up your SharePoint Online linked service. Begin by configuring a linked service that will establish a connection between your Azure Data Factory and SharePoint Online. In your ADF workspace, navigate to the "Manage" tab and create a new linked service. Opt for "SharePoint Online" as your chosen service, and supply the URL of your SharePoint site. Employ Service Principal for authentication, and furnish the Application ID and Application Key for your service principal. Verifying a successful connection is crucial at this stage before saving the linked service configuration.

    Next, create a dataset within your Azure Data Factory that points directly to the SharePoint folder where your target files reside. This dataset acts as your source for data extraction. Within the "Datasets" section, initiate a new dataset and select "SharePoint Online" as the dataset type. Associate this dataset with the linked service you previously configured. In the "List name" field, designate the SharePoint folder containing the files you intend to extract. Save the dataset configuration to proceed.

    Finally, to orchestrate the data movement, build a data pipeline encompassing a copy activity. Within your ADF workspace, access the "Pipelines" tab, and craft a new pipeline. Assign a name to your pipeline and add a copy activity to it. Configure the copy activity properties, designating SharePoint Online as the source, utilizing the dataset configured earlier. Define the destination as your SQL Database, using the corresponding dataset.

    Here's a snippet of JSON illustrating the Copy Data activity configuration:

    {
        "name": "CopyDataFromSharePointToSQL",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "SharePointDataset",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "SQLDataset",
                "type": "DatasetReference"
            }
        },
        "activities": [
            {
                "name": "CopyData",
                "type": "CopyData",
                "inputs": [
                    {
                        "referenceName": "SharePointDataset"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "SQLDataset"
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "SharePointSource",
                        "recursive": true,
                        "wildCardFileName": {
                            "value": "2023_*_Europe",
                            "type": "Expression"
                        }
                    },
                    "sink": {
                        "type": "SqlSink"
                    },
                    "enableStaging": false
                }
            }
        }
    }
    

    With your pipeline configured, execute it to initiate the seamless transfer of data. This structured approach ensures a secure, efficient, and fully automated process of reading and transferring files from SharePoint Online to your SQL Database. Be sure to adjust the wildcard pattern and configuration parameters to align with your specific folder structure and file naming conventions.


  3. ChengFeng - MSFT 5,045 Reputation points Microsoft External Staff
    Oct 25, 2023, 2:20 AM

    Hi @Haider, Mustafa (Reigate)

    Because this method uses sharepoint rest api.

    Regarding obtaining file content, permissions need to be configured.

    Add a permission

    User's image

    Select sharepointUser's image

    Assign permissions, the minimum standard is read

    User's image

    click Grant admin consent for ....

    User's image


    If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    Best Regards

    Cheng Feng


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.