Data flow ForEach activity with multiple files

akeem sanni 20 Reputation points
2023-12-03T13:35:24.9766667+00:00

1. In the pipeline, I use a Get Metadata activity and a ForEach loop activity to read the list of pipe-delimited files (tables) to load from blob storage.

2. Inside my ForEach activity, I have a data flow activity that reads each file.

3. For each of the tables, I want to perform some data conversions (such as adding a new column, changing a column name, and adding expressions to change column values) using a derived column transformation in my data flow activity.

How can I achieve this?

Thank you

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. AnnuKumari-MSFT 32,011 Reputation points Microsoft Employee
    2023-12-15T07:55:37.44+00:00

    Hi akeem sanni,

    Performing data transformations with mapping data flows is a completely schema-driven process. If your files do not share a fixed schema and you want a single pipeline to handle files with different schemas, that is not possible.

    You need to create a separate data flow for each distinct schema, because importing the schema (the projection) is required when you modify existing column values, rename columns, and so on. If the only requirement were to add new columns, that would be possible even without importing the schema.

    So either create multiple data flows, or write custom code in C# or Python and run it with a Custom activity (Azure Batch) in ADF.
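
    For illustration, here is a minimal sketch of that custom-code route using pandas. The file names, column names, and rules below are made-up assumptions, not details from the question; a script like this could run from a Custom activity on an Azure Batch pool:

    ```python
    # Per-table rules: which columns to rename and which derived columns to add.
    # Everything here is illustrative; replace with your real tables and rules.
    import pandas as pd

    RULES = {
        "orders.txt": {
            "rename": {"ord_dt": "order_date"},
            "compute": {"total": lambda d: d["qty"] * d["unit_price"]},
        },
        "customers.txt": {
            "rename": {"cust_nm": "customer_name"},
            "compute": {},
        },
    }

    for file_name, rule in RULES.items():
        df = pd.read_csv(file_name, sep="|")          # pipe-delimited source
        df = df.rename(columns=rule["rename"])        # rename columns
        for col, fn in rule["compute"].items():
            df[col] = fn(df)                          # derived-column equivalent
        df.to_csv("out_" + file_name, sep="|", index=False)
    ```

    Because the per-table rules live in plain Python, one script can handle differing schemas explicitly instead of needing a separate data flow per schema.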

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.


2 additional answers

  1. Suba Balaji 11,206 Reputation points
    2023-12-03T15:08:15.9033333+00:00

    Hi, if your files in blob storage are of the same schema, you don't have to use a Get Metadata activity plus a ForEach. A data flow can load all the files in a folder without a loop (for example, via a wildcard path on the source).

    Please go through this video for steps. https://youtu.be/Pj5mrtTsR1M?si=SmAuIGpHeB7PlLW4

    Once you set up the source pointing to your source folder, you can then add a derived column transformation and perform actions like adding a new column, renaming a column, or adding an expression.
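
    Just to make those three operations concrete, here is what they look like outside ADF in pandas (column names are hypothetical; inside the data flow you would write them in the derived column's expression builder instead):

    ```python
    import pandas as pd

    # Read one of the pipe-delimited files, as the data flow source would.
    df = pd.read_csv("customers.txt", sep="|")

    df["load_date"] = pd.Timestamp.today().normalize()    # add a new column
    df = df.rename(columns={"cust_nm": "customer_name"})  # rename a column
    df["status"] = df["status"].str.upper()               # expression changing values
    ```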

    Hope it helps. If you need further support, please let us know; we'll be happy to help.

    Thanks


  2. AnnuKumari-MSFT 32,011 Reputation points Microsoft Employee
    2023-12-14T08:19:03.7866667+00:00

    Hi akeem sanni,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As per my understanding, you want to perform certain data transformations and then load the data.

    As mentioned above by the community expert, if the schema is the same for all files/tables, you can use a single dataset for all the files present in the folder.

    However, if the schemas are different, then the approach you have described in the question seems correct. Are you facing any issue while implementing it? If yes, kindly share more details about the issue.

    Sharing detailed information regarding the different components to be used in the pipeline (a sketch of how they wire together follows the list):

    Get Metadata activity

    ForEach activity

    Derived column transformation in mapping data flow
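
    For orientation, the wiring of those components can be sketched with the azure-mgmt-datafactory Python models. The dataset name, data flow name, and the fileName parameter below are illustrative assumptions, and the exact way the file name is plumbed into your data flow may differ in your setup:

    ```python
    # Sketch: Get Metadata -> ForEach -> Execute Data Flow, one run per file.
    from azure.mgmt.datafactory.models import (
        ActivityDependency,
        DataFlowReference,
        DatasetReference,
        ExecuteDataFlowActivity,
        Expression,
        ForEachActivity,
        GetMetadataActivity,
        PipelineResource,
    )

    # Get Metadata: list the files (childItems) in the blob folder dataset.
    get_files = GetMetadataActivity(
        name="GetFileList",
        dataset=DatasetReference(reference_name="SourceFolderDS"),  # placeholder dataset
        field_list=["childItems"],
    )

    # Execute Data Flow: assumes the data flow exposes a 'fileName' parameter
    # that receives the current file's name from the loop.
    run_flow = ExecuteDataFlowActivity(
        name="TransformFile",
        data_flow=DataFlowReference(
            reference_name="TransformFiles",  # placeholder data flow
            parameters={"fileName": {"value": "@item().name", "type": "Expression"}},
        ),
    )

    # ForEach: iterate over the file list returned by Get Metadata.
    loop = ForEachActivity(
        name="ForEachFile",
        items=Expression(value="@activity('GetFileList').output.childItems"),
        activities=[run_flow],
        depends_on=[ActivityDependency(activity="GetFileList",
                                       dependency_conditions=["Succeeded"])],
    )

    pipeline = PipelineResource(activities=[get_files, loop])
    ```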

    Hope it helps. Kindly let us know if you need further assistance. Please accept the answer by clicking on the Accept answer button. Thank you.