Need to create pipeline to execute subfolder file

Samriddhi Gupta 20 Reputation points
2024-09-02T13:06:46.98+00:00

Hi Team,

I want to create a pipeline with an XML dataset. My ADLS Gen2 storage has the following hierarchy:

Rootfolder > subfolder > IN > OUT > STOCK > 2024 > 07 > 31 --> inside the 31 folder I have the XML files

Each monthly folder contains date-wise folders, e.g. Rootfolder > subfolder > IN > OUT > STOCK > 2024 > 07 > 30, and so on.

How can I build a pipeline that picks up the files from these subfolders? Is there any way to do it?

Thanks in advance,

Samriddhi Gupta

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Amira Bedhiafi 26,656 Reputation points
    2024-09-02T13:21:09.6933333+00:00

    I am breaking your use case down into the following steps:

    1. Create a Linked Service

    • First, create a linked service in ADF to connect to your ADLS Gen2 storage account.
      • Go to the Manage tab in ADF.
      • Select Linked services and click New.
      • Choose Azure Data Lake Storage Gen2 and configure the linked service with your storage account credentials.
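
    The Manage-tab steps above can also be authored as a linked-service JSON definition. This is a minimal sketch; the name AdlsGen2LinkedService, the placeholder URL, and the account-key authentication are assumptions (a service principal or managed identity works as well):

    ```json
    {
        "name": "AdlsGen2LinkedService",
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": "https://<storage-account>.dfs.core.windows.net",
                "accountKey": {
                    "type": "SecureString",
                    "value": "<account-key>"
                }
            }
        }
    }
    ```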

    2. Create a Dataset

    • Create a dataset that points to the folder structure in your ADLS Gen2.
      • Go to the Author tab, select Datasets, and click New dataset.
      • Choose Azure Data Lake Storage Gen2 as your data store and then select XML as the file format.
      • In the Connection tab, choose the linked service you created earlier.
      • Specify the Rootfolder (or a broader folder) in the file path. Do not include subfolders in this step.
      • In the File path type, select Wildcard folder path.
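
    As a sketch, the resulting dataset JSON could look like this. The names StockXmlDataset and AdlsGen2LinkedService and the FolderPath parameter are hypothetical, and it assumes Rootfolder is the ADLS container (file system):

    ```json
    {
        "name": "StockXmlDataset",
        "properties": {
            "type": "Xml",
            "linkedServiceName": {
                "referenceName": "AdlsGen2LinkedService",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "FolderPath": { "type": "String" }
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "rootfolder",
                    "folderPath": {
                        "value": "@dataset().FolderPath",
                        "type": "Expression"
                    }
                }
            }
        }
    }
    ```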

    3. Configure the Dataset to Use Wildcards

    • Use wildcard characters to reference the date subfolders dynamically:
      • Set the File path in the dataset like this: @{dataset().FolderPath}/IN/OUT/STOCK/*/*/*/*.xml.
    • This path assumes your XML files are named *.xml and sit at the bottom of a folder structure following the pattern STOCK/YYYY/MM/DD.
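
    In the Copy activity's source, the same wildcard pattern maps to the wildcardFolderPath and wildcardFileName store settings. A sketch, again assuming Rootfolder is the container so the wildcard path starts at subfolder:

    ```json
    "source": {
        "type": "XmlSource",
        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": true,
            "wildcardFolderPath": "subfolder/IN/OUT/STOCK/*/*/*",
            "wildcardFileName": "*.xml"
        }
    }
    ```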

    4. Create the Pipeline

    • In the Author tab, create a new pipeline.
    • Add a Get Metadata activity to retrieve the list of subfolders (dates) within the monthly folder.
      • Point this activity to your dataset.
      • In the Field list, select Child items to retrieve a list of folders (e.g., 31 in your example).
    • Add a ForEach activity to iterate over each subfolder.
      • Inside the ForEach activity, add a Copy Data activity to copy the XML files from the subfolder to your destination.
      • In the Source settings of the Copy Data activity, set the File path dynamically using the folder structure retrieved from the Get Metadata activity.
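
    The Get Metadata + ForEach pattern above can be sketched as pipeline JSON. The activity and dataset names and the hard-coded 2024/07 month path are illustrative only, and the Copy activity body is abbreviated for brevity:

    ```json
    {
        "name": "CopyStockXmlPipeline",
        "properties": {
            "activities": [
                {
                    "name": "GetDateFolders",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "StockXmlDataset",
                            "type": "DatasetReference",
                            "parameters": { "FolderPath": "subfolder/IN/OUT/STOCK/2024/07" }
                        },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEachDateFolder",
                    "type": "ForEach",
                    "dependsOn": [
                        { "activity": "GetDateFolders", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('GetDateFolders').output.childItems",
                            "type": "Expression"
                        },
                        "activities": [
                            { "name": "CopyXmlFromDateFolder", "type": "Copy" }
                        ]
                    }
                }
            ]
        }
    }
    ```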

    5. Configure the Pipeline

    • Make sure to parameterize your dataset so the folder path can be set dynamically per subfolder.
    • Use expressions to generate the file paths in the Copy Data activity.
      • For example, when iterating over the day folders of a month, the file path could be: @concat('Rootfolder/subfolder/IN/OUT/STOCK/2024/07/', item().name, '/*.xml'). Note that each child item returned by Get Metadata exposes only name and type properties, so use item().name rather than separate Year, Month, and Day properties.
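
    To tie the pieces together, the Copy Data activity inside the ForEach can pass the current folder name into a parameterized dataset. A sketch (the dataset name and FolderPath parameter are hypothetical, and the month path is hard-coded for illustration):

    ```json
    "inputs": [
        {
            "referenceName": "StockXmlDataset",
            "type": "DatasetReference",
            "parameters": {
                "FolderPath": {
                    "value": "@concat('subfolder/IN/OUT/STOCK/2024/07/', item().name)",
                    "type": "Expression"
                }
            }
        }
    ]
    ```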

0 additional answers
