Azure Data Factory Until loop slow to end/terminate

Geezer 16 Reputation points

We have an Until loop in an ADFv2 pipeline.

The time it takes to stop/terminate once the expression condition is met seems to correlate with the length of time the Until loop takes to complete its activities.

This particular Until loop performs a lot of activities and can take anywhere between 90 and 120 minutes to complete, so it takes almost as long to end/terminate (break out of the loop).

If I "hack" it so that it only performs a handful of activities it will quickly end and break once it's finished and the expression to terminate is met.

It's like a spinning wheel that keeps spinning even after the power is turned off. The momentum that was built up while connected takes a while to slow down and eventually stop.

Is this a known issue? How can I troubleshoot the exact cause or fix it?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

6 answers

  1. Wouter Vermeulen 6 Reputation points

    Similar behavior in my project. The activities in an until job took a total of around 15 minutes to reach a state where the until condition was met, then it took another 10-15 minutes after that for the until loop to break and return success.

    1 person found this answer helpful.

  2. Ali Soleyman 5 Reputation points

    I have the same issue with an Until loop in my project. After the until condition is met, the loop takes a random time between 5 and 20 minutes to stop, even though there is nothing left to process. Unfortunately I can't use a ForEach loop either, because it has another issue: when it is set to run sequentially and an iteration fails, it does not stop but continues with the next iteration. This has completely blocked my project, and it seems nobody at Microsoft cares about this major issue! Can someone from Microsoft help me resolve it?

    1 person found this answer helpful.

  3. dvGlenn 0 Reputation points

    This is a VERY real issue and needs some attention from the product team ASAP. It is threatening Data Factory as an enterprise tool.

    I work with about 6 organizations using Data Factory to load Azure SQL/Managed Instance. All of my clients have been observing similar behavior since about Q3 of 2022. MSFT has not been responsive to the concerns. Managed Instance clients seem to be the most impacted. Managed VNETs may be the culprit here.

    In addition to UNTIL control structures, we are experiencing the issue with a FOR loop that also runs for a bit and includes stored procedure execution. I think the actual culprit is the VNET and SQL connector not bubbling "end-of-batch" events up the stack. I currently suspect that the bug is in the SQL connector/stored procedure activity and having several of those active in a loop takes a while to resolve all the statuses.

    Data Factory performance is erratic and gradually getting slower. I assume there are scalability issues or other limits inherent in the platform design that are now showing up as adoption improves and utilization % increases on the supporting infrastructure.

    Other significant performance concerns have emerged in the past few months as well:

    We have recently observed (and documented in MSFT support tickets) cases where the same SP runs in 1-2 minutes over a direct connection to the DB through SSMS/Azure Data Studio but runs for hours via Data Factory (likely the same issue).

    I have been and continue to be a believer in the Data Factory tool. Its architecture is (from outward appearance) ideal for the Azure environment. It needs to be supported and work correctly. It is a CRITICAL part of the Azure stack.

  4. Christo Greeff 6 Reputation points

    We're seeing similar behavior. The Until activity runs for 20-25 minutes, and then there is a delay of about 15 minutes before the next activity starts: a 15-minute gap between the end of the last activity inside the Until and the start of the If Condition activity that follows it on success.

    What are the cost implications here? Do we need to escalate this, @ShaikMaheer-MSFT?

  5. Viguro 1 Reputation point

    We (partially) solved the problem as follows: instead of retrieving the job status every 10 seconds, we retrieve it every 5 minutes. In this way, we drastically limit the number of loops and have reduced the "dead time" from 30 minutes to less than 2 minutes for an 8-hour process.
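
To put rough numbers on the workaround above, here is a minimal Python sketch of the arithmetic. It rests on two assumptions that are not confirmed anywhere in this thread: that each Until iteration is dominated by a single wait of the polling interval, and that the termination delay grows linearly with the number of iterations. The function names are hypothetical and only illustrate the calculation; this is not a description of how Data Factory works internally.

```python
def iteration_count(process_seconds: int, poll_interval_seconds: int) -> int:
    """Number of Until iterations, assuming each one is a single wait."""
    # Ceiling division: a partial final interval still costs one iteration.
    return -(-process_seconds // poll_interval_seconds)


def scaled_dead_time(dead_seconds: float, old_iterations: int, new_iterations: int) -> float:
    """Scale an observed dead time linearly with iteration count (assumption)."""
    return dead_seconds * new_iterations / old_iterations


process_seconds = 8 * 60 * 60                     # the 8-hour process from the answer
fast = iteration_count(process_seconds, 10)       # status check every 10 seconds
slow = iteration_count(process_seconds, 5 * 60)   # status check every 5 minutes

# Start from the reported 30 minutes of dead time at the 10-second interval.
estimate = scaled_dead_time(30 * 60, fast, slow)
print(fast, slow, estimate)  # 2880 96 60.0
```

Under those assumptions, going from 2880 iterations to 96 would shrink a 30-minute delay to about one minute, which is at least consistent with the "less than 2 minutes" reported above. It does not prove that iteration count is the cause.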