looping in ADF

DataCoder 280 Reputation points
2024-08-22T16:31:59.6466667+00:00

I am building a pipeline in ADF that needs to process multiple files stored in an Azure Data Lake Storage Gen2 folder. These files are either CSV or Parquet format, and each file needs to be processed one after the other (sequentially) through different activities in the pipeline.
How can I configure the pipeline to loop through each file in the folder sequentially?

I need to ensure that each file is processed one after another rather than in parallel. How can I enforce this sequential execution in the ForEach activity?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author
  1. NIKHILA NETHIKUNTA 4,610 Reputation points Microsoft External Staff
    2024-08-26T09:33:49.75+00:00

    Hi @DataCoder
    You can follow the below steps:

    Get Metadata Activity:

    • First, use the Get Metadata activity to retrieve the list of files in your Azure Data Lake Storage Gen2 folder. This activity returns the names of the files you need to process.

    ForEach Activity:

    • Add a ForEach activity to your pipeline. This activity will loop through each file returned by the Get Metadata activity. In the ForEach activity's Settings tab, enable the Sequential option so the files are processed one at a time rather than in parallel.
    • Add Activities Inside ForEach: Inside the ForEach activity, add the activities that you need to perform on each file (e.g., Copy Data, Data Flow, etc.). With Sequential enabled, these activities run for one file at a time, in order.
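
    In pipeline JSON, the two activities might be wired together like this (a minimal sketch; the names GetFileList, ProcessEachFile, and SourceFolderDataset are placeholders, and SourceFolderDataset is assumed to be a dataset pointing at the ADLS Gen2 folder):

    ```json
    {
      "activities": [
        {
          "name": "GetFileList",
          "type": "GetMetadata",
          "typeProperties": {
            "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
            "fieldList": [ "childItems" ]
          }
        },
        {
          "name": "ProcessEachFile",
          "type": "ForEach",
          "dependsOn": [
            { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] }
          ],
          "typeProperties": {
            "isSequential": true,
            "items": {
              "value": "@activity('GetFileList').output.childItems",
              "type": "Expression"
            },
            "activities": []
          }
        }
      ]
    }
    ```

    Setting isSequential to true is what makes the loop process one file at a time; the per-file activities go inside the ForEach activity's activities array.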

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Amira Bedhiafi 41,111 Reputation points Volunteer Moderator
    2024-08-22T20:17:46.23+00:00

    You can use the ForEach activity in combination with the Get Metadata and Execute Pipeline activities:

    Step 1: Use Get Metadata Activity to List Files

    1. Add a Get Metadata Activity:
      • This activity will be used to list the files in your target folder.
      • Set the Dataset property to point to the folder in your Data Lake Storage Gen2 where the files are stored.
      • In the Field List of the activity, select Child Items. This will return a list of all files and subfolders in the directory.
    2. Reference the Output:
      • The file list is exposed on the activity's output as childItems, an array of objects with name and type fields. You can reference it directly from downstream activities, or copy it into a pipeline array variable with a Set Variable activity if you prefer.
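
    In the pipeline JSON this corresponds to a Get Metadata activity whose fieldList requests childItems (the dataset name SourceFolderDataset is a placeholder for a dataset pointing at the ADLS Gen2 folder):

    ```json
    {
      "name": "Get Metadata",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
      }
    }
    ```

    Each element of output.childItems is an object such as { "name": "sales.csv", "type": "File" }, so subfolders can be filtered out by checking the type field if needed.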

    Step 2: Configure the ForEach Activity

    1. Add a ForEach Activity:
      • The ForEach activity will loop through each file name obtained from the Get Metadata activity.
      • In the Items field, reference the array of file names from the Get Metadata activity (e.g., @activity('Get Metadata').output.childItems).
    2. Set Sequential Execution:
      • In the Settings tab of the ForEach activity, tick the Sequential checkbox (isSequential: true in the pipeline JSON). When Sequential is enabled, the Batch count setting is ignored and the files are processed strictly one at a time, in order.
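
    The sequential setting appears in the ForEach activity's JSON as isSequential (a sketch; the activity name is a placeholder and the inner activities are omitted):

    ```json
    {
      "name": "ForEachFile",
      "type": "ForEach",
      "typeProperties": {
        "isSequential": true,
        "items": {
          "value": "@activity('Get Metadata').output.childItems",
          "type": "Expression"
        },
        "activities": []
      }
    }
    ```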

    Step 3: Add Activities Inside the ForEach Activity

    1. Add Inner Activities:
      • Inside the ForEach activity, you can add other activities that will process each file. Typically, this could be a Copy Data activity or a custom Execute Pipeline activity to handle complex processing.
    2. Parameterize the File Path:
      • To process each file, you can parameterize the file path based on the current item in the ForEach loop. For example, you can use @item().name to get the file name.
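
    For example, a Copy Data activity inside the loop could pass the current file name to a parameterized source dataset (a sketch: the names CopyCurrentFile, SourceFileDataset, SinkDataset, and fileName are placeholders, and the source dataset is assumed to declare a fileName string parameter used in its file path):

    ```json
    {
      "name": "CopyCurrentFile",
      "type": "Copy",
      "inputs": [
        {
          "referenceName": "SourceFileDataset",
          "type": "DatasetReference",
          "parameters": { "fileName": "@item().name" }
        }
      ],
      "outputs": [
        { "referenceName": "SinkDataset", "type": "DatasetReference" }
      ],
      "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "ParquetSink" }
      }
    }
    ```

    The dataset itself would use its parameter in the file path, e.g. @dataset().fileName in the fileName property of its location.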

    If you need further assistance, feel free to ask!

