Azure Data Factory - ForEach loop slow performance towards end of loop

Question

CW 6

I'm working with Azure Data Factory v2, using a Lookup activity against an Azure SQL DB which returns a result set with two columns from a table (table1 in this example).

In this case the lookup returns these two columns:
OrderKey - This is the unique key reference of a particular order
JSON - This is a JSON payload that has been created by a previous task in another pipeline.

In testing, I'm only using ~800 rows but in production it will likely be over 100k rows.

Both of these columns are then used in the ForEach loop.

Inside the ForEach loop we use the JSON to send to a determined web endpoint, return the result and update table1 with the response for that OrderKey.

The web request should take no more than 5-6 seconds to return, which is the case in the first few iterations. The update to the table should also be quick, and it takes 5-6 seconds as well. In practice ADF is slow here: with T-SQL we could perform the update faster, but we need ADF to get the result of the web request.

The process works, and is very fast in the first few loops.
After that, it seems to take an incredibly long time to perform the actions inside the ForEach loop, particularly towards the end of the loop when there are fewer than 20 records left to process. When this happens, each iteration takes upwards of 3 minutes, so the total pipeline takes far longer than it should.

Some actions I've tried to work around or improve the performance:

Changed the integration runtime to use a higher number of cores (up to 256). Same performance problem.
Changed the integration runtime compute to memory optimized instead of general purpose. Same performance problem (notably, there is not a lot of documentation about what this compute change actually does other than "try it and see if it's faster...").
Changed the SQL DB to a higher tier; I've gone up to Business Critical Gen5 with 12 vCores. Same performance problem.
Changed the ForEach to Sequential. This results in far worse performance, as looping through the rows one by one is slower.
Changed the batch count of the ForEach to 50. Same performance problem.
Put the ForEach into its own pipeline and put that pipeline inside an Until, setting the Lookup to only bring back the first 50 rows and the Until to continue until all rows have been updated. This is slower again, as we're batching through 50 at a time.
Changed the timeout of the Lookup activities performing the SQL DB table update to 10 seconds and the Query Timeout to 1 minute (as low as it will go), with retries enabled. This has revealed some strange behaviour: I'll often see the Lookup doing the update sit there for several minutes without doing anything.
Changed the isolation level on the Lookup activity to ReadUncommitted and used the table hints WITH(UPDLOCK, READPAST, ROWLOCK). Same performance problem.

Looking at the SQL DB, the CPU % used is not even reaching 1%. If I use an S12 instead of a BC_Gen5_12, I can see the DTU % used is also not reaching 1%.

I'm running out of options here and have no idea why the ForEach inside the pipeline is so slow when it gets towards the end. It performs incredibly well leading up to the end, but then takes a nosedive in performance.
The best performance I've had so far is 800 records being updated in about 10 minutes. I'd expect that to be significantly quicker, more like 2-3 minutes.
Based on tests I'm pretty sure the bottleneck is ADF, not the SQL DB or the Integration Runtime.
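
As a rough sanity check on that expectation, the ideal runtime can be estimated from the figures quoted above. This is only a back-of-the-envelope sketch: it assumes every parallel slot stays busy, that each iteration takes about 12 seconds (5-6 s for the web request plus 5-6 s for the update), and the function name is made up for illustration.

```python
import math

def ideal_foreach_minutes(rows, batch_count, seconds_per_iteration):
    """Lower bound on ForEach wall time if every parallel slot stays busy."""
    # Number of full "waves" needed to get through all rows.
    waves = math.ceil(rows / batch_count)
    return waves * seconds_per_iteration / 60

# Assumed figures: 800 rows, batch count 50, ~12 s per iteration.
estimate = ideal_foreach_minutes(rows=800, batch_count=50, seconds_per_iteration=12)
print(f"{estimate:.1f} minutes")
```

Under those assumptions the loop would need roughly 3 minutes, so the observed 10 minutes is about three times the ideal figure.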

What else can I try to improve performance here?

Some screenshots to help anyone who has any thoughts. Note that for further testing I removed the web request, the same problem still exists even if we're just trying to update the SQL DB table without the web request.

Answer 1

AnnuKumari-MSFT 34,556 Microsoft Employee Moderator

Hi @CW ,
Welcome to Microsoft Q&A platform and thanks for posting your query here.
To summarize your query: you are facing a performance issue in your ADF pipeline. You tried various things to resolve it, such as scaling the IR and the SQL DB up to higher tiers and trying different pipeline structures, but you still didn't see any improvement in performance.

Having gone through the details, I don't think the issue is with the IR or the SQL DB tier. Because the Lookup that updates the table sits inside the ForEach, you are running many DML queries (in this case UPDATE queries) at the same time. As the number of concurrent queries in the database grows, they may exhaust the concurrency available for the service tier, so that one query ends up blocking the others. Below is the image showing the maximum concurrent queries allowed per service tier.

The approach I would suggest is to replace the Lookup activity, which currently updates the table once per item, with the following:

Use a Copy activity to copy the output of the web request (the content you are trying to write to the table) into a dummy/stage table. Make it a truncate-and-reload table for every run. This brings all the required data into your database at once.
Then use a Stored Procedure activity or a Script activity to join this stage table with your original table and update the needed fields based on the OrderKey.

This will replace the massive parallel update queries with a one-time update and will enhance the performance.
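
The stage-table pattern described above can be sketched as follows. This is a minimal illustration, not the ADF implementation: it uses Python's built-in sqlite3 as a stand-in for Azure SQL DB, and the `stage_responses` table, the `Response` column, and the `apply_staged_responses` function are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the Azure SQL DB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (OrderKey INTEGER PRIMARY KEY, Response TEXT);
    CREATE TABLE stage_responses (OrderKey INTEGER PRIMARY KEY, Response TEXT);
""")

def apply_staged_responses(conn, responses):
    """Truncate and reload the stage table, then update table1 in one statement."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM stage_responses")
        conn.executemany(
            "INSERT INTO stage_responses (OrderKey, Response) VALUES (?, ?)",
            responses,
        )
        # One set-based UPDATE instead of one UPDATE per ForEach iteration.
        cur = conn.execute("""
            UPDATE table1
            SET Response = (SELECT s.Response FROM stage_responses AS s
                            WHERE s.OrderKey = table1.OrderKey)
            WHERE OrderKey IN (SELECT OrderKey FROM stage_responses)
        """)
        return cur.rowcount

conn.executemany("INSERT INTO table1 (OrderKey) VALUES (?)", [(1,), (2,), (3,)])
updated = apply_staged_responses(
    conn, [(1, '{"status": "ok"}'), (2, '{"status": "failed"}')]
)
rows = conn.execute(
    "SELECT OrderKey, Response FROM table1 ORDER BY OrderKey"
).fetchall()
print(updated, rows)
```

In T-SQL the same update would more likely be written as an UPDATE with a JOIN inside a stored procedure; the correlated subquery is used here only so the sketch runs on any SQLite version.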

Hope this helps. Please let us know if you have any further queries.

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2022-03-15T09:49:47.9+00:00

Hi @CW ,
Just checking in to see if the above answer helped. Please consider clicking Accept Answer and Up-Vote, as accepted answers help the community as well. If you have any further queries, do let us know.

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2022-03-16T06:20:30.247+00:00

Hi @CW ,
Following up to see if the above suggestion was helpful. If so, kindly click Accept Answer and Up-Vote. In case you have any further queries, please do let us know.

CW 6 Reputation points

2022-03-16T15:22:11.303+00:00

Thanks for coming back to me. To address the ideas you've raised:

You are suggesting the SQL DB service tier may be causing problems as it cannot cope with the volume of DML requests
Unfortunately I have tried scaling the SQL DB up to a Business Critical Gen 5 12 vCore configuration which should be able to cope with 1200 processes at once: https://learn.microsoft.com/en-us/azure/azure-sql/database/resource-limits-vcore-single-databases#business-critical---provisioned-compute---gen5
The DML processes locking each other was my first thought as well but checking the active queries on the SQL DB while the process was running revealed that the queries weren't even running from ADF when they wait at the end of the foreach loop. They were simply sat at "In progress" and nothing was running.

You suggest creating a dummy/stage table for each transaction using a copy task and then joining them to my original table to do an update.
This is more DML requests than an update as we'd be doing a CREATE and INSERT - potentially to a dynamic table name.
Whether or not the table name is dynamic, how would you suggest assigning each web activity to a new staging table in ADF? The queries are dynamic and different every day, and I need to assign metadata in the copy activity, so how would I approach this?

Finally, doing an update with 100k stage tables to a single table sounds particularly tricky, especially if the tables were dynamic. If we run the updates one at a time using a ForEach, the same problem remains: the ForEach slows down at the end of the loop.

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2022-03-29T05:19:53.81+00:00

Hi @CW ,
Apologies for the delay in response. My suggestion was to have a single stage table to store the web content output and then use it to update the original table. Using a Copy activity would also let you increase the DIUs (Data Integration Units) and the degree of parallelism, which would be a faster option.

Moreover, it would be helpful if you could share the output of the first Lookup, as I feel you could copy the content directly into your SQL DB without a ForEach and do the update with stored procedures in the SQL engine itself, which should be more performant.

If you have already found a resolution, please share it with the community, as it can be helpful to others. If the above suggestion was helpful, kindly click Accept Answer and Up-Vote.

CW 6 Reputation points

2022-07-04T13:41:42.523+00:00

Apologies for my much more delayed response! I've still not found a good solution to this using data factory. No matter what I do, the behaviour is the same where the ForEach loop is incredibly slow towards the end.

The only way around it I've found is to use a function app with asynchronous Python, which makes several thousand requests in quick loops and then logs it all to a SQL DB using a dataframe. Not ideal, but I've worked around this issue in ADF for now.
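
A workaround along those lines could look roughly like the sketch below. It is not the code from the function app: the web request is simulated with `asyncio.sleep` so the example runs without a network, and `send_order`, `process_orders`, and the concurrency limit of 100 are assumptions made for illustration. In a real function app the stand-in would be replaced by an HTTP client call and the results written to the SQL DB in one bulk operation.

```python
import asyncio

async def send_order(order_key, payload):
    """Stand-in for the real web request (hypothetical); simulates latency."""
    await asyncio.sleep(0.01)
    return order_key, f"processed:{payload}"

async def process_orders(orders, max_concurrency=100):
    """Send all orders concurrently, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(order_key, payload):
        async with semaphore:
            return await send_order(order_key, payload)

    # gather preserves the order of the input, so results line up with orders.
    return await asyncio.gather(*(bounded(k, p) for k, p in orders))

# 800 fake (OrderKey, JSON) pairs, matching the test volume mentioned above.
orders = [(i, f'{{"order": {i}}}') for i in range(1, 801)]
results = asyncio.run(process_orders(orders))
print(len(results))
```

The semaphore keeps a fixed number of requests in flight at all times, so a slot freed by a finished request is reused immediately rather than waiting for a whole batch to complete.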

It's a great shame the foreach loop doesn't seem to handle this use case adequately.
