For Each Activity to process CSV Files

DataCoder 220 Reputation points
2024-09-09T09:32:51.9033333+00:00

I need to design a pipeline in Azure Data Factory to process multiple CSV files that are stored in an Azure Blob Storage container. Each file contains transaction data, and I need to perform the following tasks for each file:

Load the data from each CSV file.

Apply certain transformations to the data, such as filtering and aggregating.

Save the transformed data into an Azure SQL Database.

Could you provide guidance on how to configure the 'ForEach' activity?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. NIKHILA NETHIKUNTA 3,270 Reputation points Microsoft Vendor
    2024-09-09T10:10:53.8866667+00:00

    @DataCoder
    Thank you for the question and for using the Microsoft Q&A platform.

    To process multiple CSV files stored in an Azure Blob Storage container and perform the tasks you mentioned, you can use the ForEach activity in Azure Data Factory. The ForEach activity allows you to iterate over a collection and execute a set of activities for each item in the collection.

    Here are the steps to configure the ForEach activity:

    1. Create a pipeline in Azure Data Factory and add a Get Metadata activity to it. In the Get Metadata activity, set the folder path in the dataset to the Azure Blob Storage container where the CSV files are stored, and select the "Child Items" option to get the metadata of all the files in the container.


    2. Add a ForEach activity to the pipeline and connect it to the Get Metadata activity. In the ForEach activity, set the "Items" property to the child items output of the Get Metadata activity, which is an array with one entry per file (see the pipeline sketch after these steps).


    3. Inside the ForEach activity, add a Copy activity to load the data from each CSV file. Set the source dataset to the same Blob Storage dataset used in the Get Metadata activity, use the current item's file name (@item().name) to point the source at the file being processed (a sketch follows further below), and set the sink dataset to the destination table in the Azure SQL Database where you want to store the data.


    4. Run the pipeline to process all the CSV files in the Azure Blob Storage container.
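    As a rough illustration, the activities section of such a pipeline could look like the JSON sketch below. The activity and dataset names (Get Metadata1, ForEach1, BlobCsvFolder) are placeholders, so substitute your own:

    {
      "activities": [
        {
          "name": "Get Metadata1",
          "type": "GetMetadata",
          "typeProperties": {
            "dataset": { "referenceName": "BlobCsvFolder", "type": "DatasetReference" },
            "fieldList": [ "childItems" ]
          }
        },
        {
          "name": "ForEach1",
          "type": "ForEach",
          "dependsOn": [
            { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
          ],
          "typeProperties": {
            "items": {
              "value": "@activity('Get Metadata1').output.childItems",
              "type": "Expression"
            },
            "activities": []
          }
        }
      ]
    }

    The inner "activities" array of the ForEach holds the Copy activity, shown in the sketch further below.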

    To configure the ForEach activity, you need to specify the "Items" property to define the collection to iterate over. In this case, the collection is the array of child items obtained from the Get Metadata activity. You can use the following expression for the "Items" property:

    @activity('Get Metadata1').output.childItems

    This expression returns an array of child items (one object per file, each with a name and a type), which the ForEach activity iterates over to process the CSV files.
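    For reference, the childItems output of the Get Metadata activity looks roughly like this (the file names here are only examples):

    {
      "childItems": [
        { "name": "transactions_2024_01.csv", "type": "File" },
        { "name": "transactions_2024_02.csv", "type": "File" }
      ]
    }

    Inside the loop, @item() refers to one of these objects, so @item().name gives the file name for the current iteration.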

    You can refer to this document for more information:
    https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity
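
    To wire the current file into the Copy activity from step 3, one common pattern is to add a parameter (for example fileName) to the source CSV dataset, use it in the dataset's file path, and pass @item().name to it from inside the loop. The names below (CopyCsvToSql, BlobCsvFile, SqlTransactionsTable, fileName) are assumptions for illustration:

    {
      "name": "CopyCsvToSql",
      "type": "Copy",
      "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "AzureSqlSink" }
      },
      "inputs": [
        {
          "referenceName": "BlobCsvFile",
          "type": "DatasetReference",
          "parameters": {
            "fileName": {
              "value": "@item().name",
              "type": "Expression"
            }
          }
        }
      ],
      "outputs": [
        { "referenceName": "SqlTransactionsTable", "type": "DatasetReference" }
      ]
    }

    The parameters block passes the current file name into the source dataset, so each iteration copies exactly one file into the SQL table.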

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click "Accept Answer" and "Yes" for "Was this answer helpful". And if you have any further queries, do let us know.

    1 person found this answer helpful.
