How do I dynamically read multiple file names?

King Java 790 Reputation points
2024-11-19T21:48:45.4566667+00:00

I am trying to read the names of 20+ CSV files and ingest their data dynamically.

Below is a diagram of how I ingest data from each CSV file individually.

[screenshot: per-file ingestion pipeline]

So I have at least 20+ pipelines, one for each CSV file.

Below is what each CSV file's name looks like:

[screenshot: CSV file names]

If the variation were only a short string, I think using Parameters like in the screenshot below would work, but here I would need to insert a long string for each file.

Example (of a short variation that I use in another case): [screenshot: parameter setup]

What is the ideal way to ingest 20+ CSV files whose names follow a similar pattern, except for a location (e.g., California, Oregon, Arizona) in the middle of the file name?

Should I create a CSV file that has a list of locations and somehow read it through inside the parameter values?

Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2024-11-25T12:35:17.06+00:00

    Hi King Java,

    Thank you for using the Microsoft Q&A platform and for posting your query here.

    It seems you want to copy multiple files dynamically using an ADF pipeline. I will take the latest points from your follow-up query to address the requirement:

    1. You have a CSV file (say "location.csv") that lists all 20+ locations (for example: California, Arizona, etc.), matching parts of the CSV file names.
    2. You have a pool of 20+ CSV files containing payroll data (each for a different location).
    3. The pipeline should go through "location.csv", and whenever a location name matches the name of a payroll file, that payroll CSV should be ingested into Azure SQL.

    Since you need to loop through the records in 'location.csv' and also loop through each of the payroll file names, nested looping is required. ADF does not allow nesting a ForEach activity directly inside another ForEach, so use two levels of pipelines. Kindly try the below approach:

    Master_Pipeline:

    1. Use a Lookup activity to retrieve the data from the 'location.csv' file.
    2. Use a ForEach activity to loop through the Lookup output by putting the below expression in the ForEach items:
         @activity('Lookup1').output.value

    3. Inside the ForEach, use an Execute Pipeline activity to call the child pipeline.
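
    For reference, 'location.csv' could be as simple as the sketch below. The header name 'Locations' is an assumption; whatever header you use must match the column referenced by @item().Locations in the Execute Pipeline parameter:

         Locations
         California
         Oregon
         Arizona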

    Child pipeline:

    1. Create a pipeline parameter called locations.
    2. Use a Get Metadata activity to get all the file names in your ADLS. Use 'childItems' in the field list.
    3. Use a ForEach activity to loop through the Get Metadata output with this expression:
         @activity('Get Metadata1').output.childItems

    4. Inside the ForEach, use an If Condition to check whether the file name contains the location, using this expression:
         @contains(string(item().name),pipeline().parameters.locations)

    5. Inside the True block, use a Copy activity, parameterizing both the source and sink datasets (a sketch of the full child pipeline JSON follows the master pipeline JSON below).

    In the Execute Pipeline activity of the master pipeline, pass the parameter value as @item().Locations, where 'Locations' must match the column header in 'location.csv' (the JSON below uses 'Locations').

    Below is the master pipeline JSON:

    {
        "name": "pipeline2",
        "properties": {
            "activities": [
                {
                    "name": "Lookup1",
                    "type": "Lookup",
                    "dependsOn": [],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "source": {
                            "type": "DelimitedTextSource",
                            "storeSettings": {
                                "type": "AzureBlobFSReadSettings",
                                "recursive": true,
                                "enablePartitionDiscovery": false
                            },
                            "formatSettings": {
                                "type": "DelimitedTextReadSettings"
                            }
                        },
                        "dataset": {
                            "referenceName": "DelimitedText1",
                            "type": "DatasetReference"
                        },
                        "firstRowOnly": false
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "Lookup1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('Lookup1').output.value",
                            "type": "Expression"
                        },
                        "isSequential": false,
                        "activities": [
                            {
                                "name": "Execute Pipeline1",
                                "type": "ExecutePipeline",
                                "dependsOn": [],
                                "policy": {
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "pipeline": {
                                        "referenceName": "pipeline3",
                                        "type": "PipelineReference"
                                    },
                                    "waitOnCompletion": true,
                                    "parameters": {
                                        "locations": {
                                            "value": "@item().Locations",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "annotations": []
        }
    }
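
    And below is a minimal sketch of what the child pipeline ('pipeline3') could look like. The dataset names ('PayrollFolder' for the ADLS folder, 'PayrollCsv' for a parameterized source file, 'AzureSqlTable1' for the sink) and the 'fileName' dataset parameter are assumptions; swap in your own datasets:

    {
        "name": "pipeline3",
        "properties": {
            "parameters": {
                "locations": { "type": "string" }
            },
            "activities": [
                {
                    "name": "Get Metadata1",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": { "referenceName": "PayrollFolder", "type": "DatasetReference" },
                        "fieldList": [ "childItems" ],
                        "storeSettings": { "type": "AzureBlobFSReadSettings", "enablePartitionDiscovery": false },
                        "formatSettings": { "type": "DelimitedTextReadSettings" }
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
                        "activities": [
                            {
                                "name": "If Condition1",
                                "type": "IfCondition",
                                "typeProperties": {
                                    "expression": {
                                        "value": "@contains(string(item().name), pipeline().parameters.locations)",
                                        "type": "Expression"
                                    },
                                    "ifTrueActivities": [
                                        {
                                            "name": "Copy data1",
                                            "type": "Copy",
                                            "typeProperties": {
                                                "source": {
                                                    "type": "DelimitedTextSource",
                                                    "storeSettings": { "type": "AzureBlobFSReadSettings" },
                                                    "formatSettings": { "type": "DelimitedTextReadSettings" }
                                                },
                                                "sink": { "type": "AzureSqlSink" }
                                            },
                                            "inputs": [
                                                {
                                                    "referenceName": "PayrollCsv",
                                                    "type": "DatasetReference",
                                                    "parameters": {
                                                        "fileName": { "value": "@item().name", "type": "Expression" }
                                                    }
                                                }
                                            ],
                                            "outputs": [
                                                { "referenceName": "AzureSqlTable1", "type": "DatasetReference" }
                                            ]
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ],
            "annotations": []
        }
    }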

    [screenshots: master pipeline setup]

    Child pipeline: [screenshots: child pipeline setup]

    Hope it helps. Kindly accept the answer by clicking on the 'Accept answer' button. Thank you.


1 additional answer

  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2024-11-19T22:31:04.2+00:00

    @King Java

    Thanks for using the Microsoft Q&A forum and posting your query.

    To dynamically read and ingest data from multiple CSV files with similar naming patterns, you can streamline your process significantly. Here's a structured approach:

    1. Use a List of Locations

    Create a CSV file (or a simple list) that contains all the location names (e.g., California, Oregon, Arizona). This will allow you to easily iterate through the locations when constructing your file names.

    2. Construct File Names Dynamically

    You can use a programming language like Python to dynamically generate the file names based on the locations. Here’s a simple example using Python:

    import pandas as pd

    # List of locations
    locations = ['California', 'Oregon', 'Arizona']  # Add more locations as needed

    # Base file name pattern
    base_file_name = "data_{}_2024.csv"  # Adjust the pattern to your file naming convention

    # List to hold the successfully loaded DataFrames
    dataframes = []

    # Loop through each location and read the corresponding CSV file
    for location in locations:
        file_name = base_file_name.format(location)
        try:
            df = pd.read_csv(file_name)
            dataframes.append(df)
            print(f"Successfully read {file_name}")
        except FileNotFoundError:
            print(f"File {file_name} not found.")

    # Optionally, concatenate all DataFrames into one.
    # Guard against an empty list: pd.concat raises ValueError if no files were read.
    all_data = pd.concat(dataframes, ignore_index=True) if dataframes else pd.DataFrame()

    3. Parameterize Your Pipeline

    If you are using a data pipeline tool (such as Apache Airflow or Azure Data Factory), you can parameterize the location names. That way, you can pass the list of locations as parameters to your pipeline, which then constructs the file names dynamically, as in the expression sketch below.
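
    For example, in ADF a parameterized dataset could assemble the file name with an expression like the following (the 'data_' prefix, '_2024.csv' suffix, and the 'location' dataset parameter are assumptions based on the naming pattern above):

         @concat('data_', dataset().location, '_2024.csv')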

    4. Error Handling

    Make sure to include error handling (as shown in the example) to manage cases where a file might not exist. This will help you avoid breaking your pipeline if one file is missing.

    5. Considerations for Scalability

    If you anticipate needing to add more locations or file types in the future, consider storing your location list in a database or a configuration file. This will make it easier to manage and update.
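
    As a minimal sketch, reading the list from such a configuration file in Python could look like this (the file name 'locations.csv' and the 'Location' column header are assumptions):

    import csv

    # Load the location list from a config file instead of hardcoding it.
    # "locations.csv" and the "Location" column name are assumptions.
    with open("locations.csv", newline="") as f:
        locations = [row["Location"] for row in csv.DictReader(f)]
    print(locations)  # e.g. ['California', 'Oregon', 'Arizona']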

    Hope this helps. Do let us know if you have any further queries.

