I'm wrapping up a support case with Microsoft related to the performance of ADF. More specifically, it concerns the performance of a sequence of interactions with Azure SQL. We find that ADF introduces a ton of overhead when performing a sequence of activities. For example, 100 individual activities could add ~10-15 mins of unexpected overhead. The additional delays are taking place within the internal workings of the ADF IR.
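To put those numbers in per-activity terms, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above (100 activities, roughly 10-15 minutes of extra time); the function name is just for illustration.

```python
def per_activity_overhead_seconds(total_overhead_minutes: float, activity_count: int) -> float:
    """Average overhead per activity, in seconds."""
    return total_overhead_minutes * 60 / activity_count


# Figures from the post: ~10-15 minutes of overhead across 100 activities.
low = per_activity_overhead_seconds(10, 100)
high = per_activity_overhead_seconds(15, 100)
print(f"{low:.0f}-{high:.0f} s of overhead per activity")
```

So the observed delay works out to roughly 6-9 seconds per activity on average.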
In my phone call with Microsoft, it was explained that the overhead in the ADF IR is typically attributed to "task pickup time". The architecture consists of an asynchronous queue that needs to be polled in order for the IR to pick up each individual task.
Unfortunately, for a large pipeline, this "task pickup time" can be a substantial percentage of the overall execution. In the example given by @MartinJaffer-MSFT above, the task pickup time added 10 mins on top of the "real" work, which was only supposed to take one second.
Ideally there would be a way to configure/tune the "task pickup time" - especially for an on-prem self-hosted IR. However, Microsoft said there is no supported mechanism for configuring the IR (either on-prem or in Azure).
Barring the ability to configure/tune the IR, it would be nice if ADF would at least give us some additional visibility or metrics. ADF should indicate how much of the overall time is spent on "task pickup". If a pipeline takes an extremely long time, customers should be able to determine whether the problem is in their own code, or if the problem is an unavoidable consequence of using ADF.
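To illustrate the kind of breakdown I have in mind, here is a minimal sketch. It assumes each activity run carried three timestamps (queued, started, finished). The `ActivityRun` type and its field names are hypothetical and made up for this example; they are not an actual ADF API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ActivityRun:
    # Hypothetical record; these field names are assumptions, not ADF output.
    name: str
    queued_at: datetime    # when the task was placed on the queue
    started_at: datetime   # when the IR picked it up and began work
    finished_at: datetime  # when the real work completed


def summarize(runs):
    """Return (total pickup time, total work time, pickup share of the total)."""
    pickup = sum((r.started_at - r.queued_at for r in runs), timedelta())
    work = sum((r.finished_at - r.started_at for r in runs), timedelta())
    total = pickup + work
    share = pickup / total if total else 0.0
    return pickup, work, share


# Three made-up sequential activities: 6 s waiting for pickup, 1 s of real work each.
t0 = datetime(2024, 1, 1, 12, 0, 0)
runs = []
for i in range(3):
    queued = t0 + timedelta(seconds=7 * i)
    started = queued + timedelta(seconds=6)
    finished = started + timedelta(seconds=1)
    runs.append(ActivityRun(f"activity_{i}", queued, started, finished))

pickup, work, share = summarize(runs)
print(f"pickup={pickup}, work={work}, pickup share={share:.0%}")
```

With a report like this, it would be obvious at a glance whether a slow pipeline is slow because of our own stored procs or because of time spent waiting on the IR.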
As it turns out, there is supposedly an "SLA" for ADF when it interacts with SQL. I still have to find a formal reference for this, but I'm told that ADF is supposed to be allowed to contribute an additional four minutes of its own overhead! I.e., according to the SLA, it's acceptable for ADF to introduce up to four minutes of its own overhead for every interaction with SQL - even when a stored proc can otherwise be executed in one second. To me that seems like an extremely lenient SLA and, if it came to that, I'm sure most customers would be very unhappy with a four-minute delay for each SQL activity. Even a 5-second delay for every nested pipeline can be a problem when it accumulates within a sequence of other activities.