Avoid sequential yet capture values in For Each

Question

sam nick 366

My pipeline's goal is to read data from a CSV file and, for each row, call the URL in that row and capture the response. I currently do this sequentially because each output file must be named after the row's urlid and placed in a directory based on the row's year, month, and day. To do that, I use a Lookup activity on the CSV, pass its output to a For Each loop, and assign each column to a variable. The For Each runs sequentially so that, for every response from the urllink, I can capture the matching urlid and the year, month, and day values.

Unfortunately, the CSV file will have 10k rows, and processing 10k rows sequentially is extremely time consuming.

Any recommendations on how to run this in parallel while still capturing the corresponding urlid and other values for each row?

Subashri Vasudevan 11,306 Reputation points Volunteer Moderator

2023-05-06T12:09:33.4266667+00:00

Hi @sam nick

If you use the Lookup activity to get the file data, it will only fetch 5k records. Are you using parent-child pipelines to get all the rows?

Answer 1

Bruce (SqlWork.com) 84,086

See the Task Parallel Library:

https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/task-parallel-library-tpl

Answer 2

VasimTamboli 5,550 MVP

One possible solution is to use Azure Batch to execute the requests in parallel. You can create a Batch pool and add a Batch task for each row in the CSV file. Each task can execute the URL request and store the result in a shared location, such as Azure Blob Storage. Once all tasks are complete, you can retrieve the results from Blob Storage and process them to generate the desired output files.

Another option is to use Azure Databricks to parallelize the execution. You can read the CSV file into a DataFrame and then use the map function to execute the URL request for each row in parallel. The map function will return a new DataFrame with the results, which you can then process to generate the output files.

Both of these solutions should be much faster than executing the requests sequentially in a For Each loop, and should allow you to capture the corresponding urlid and other values for each request.
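
As a rough illustration of the idea behind both options, here is a minimal Python sketch that runs the requests in parallel while keeping each response tied to the row that produced it. It is not an Azure Batch or Databricks implementation; it uses only the standard library thread pool. The column names (urlid, url, year, month, day), the function names, and the year/month/day/urlid path layout are assumptions made for illustration, based on the question's description.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_url(url, timeout=30):
    """Download the body of one URL (default fetcher; swap in your own)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def output_path(row):
    """Build year/month/day/urlid from one CSV row (column names are assumed)."""
    return f"{row['year']}/{int(row['month']):02d}/{int(row['day']):02d}/{row['urlid']}"


def read_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def run_parallel(rows, fetch=fetch_url, max_workers=16):
    """Fetch every row's URL concurrently; return {output_path: body or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each future is mapped to the row that produced it, so the urlid and
        # date columns stay attached to the response regardless of finish order.
        future_to_row = {pool.submit(fetch, row["url"]): row for row in rows}
        for future in as_completed(future_to_row):
            row = future_to_row[future]
            try:
                results[output_path(row)] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[output_path(row)] = exc
    return results
```

The key design choice is that the row travels with the request instead of being stored in shared variables, so the order in which responses arrive no longer matters. In a real run, each result would be written to storage under its computed path rather than collected in a dictionary.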

Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator

2023-05-15T22:08:35.0833333+00:00

Hello sam nick,

I am checking to see if you have any further questions here.
