Discrepancy in ADF Data flow Activity Execution Time and Sink Processing Time in ADF Jobs

Question

Amit Kumar 0

I've been observing a significant difference between the duration of my Azure Data Factory (ADF) data flow activities at the pipeline level and the time actually spent on sink processing and cluster startup within the activity. In a recent example, the data flow activity started at 11:00:07 and its duration was recorded as 1 minute 28 seconds, which puts the end time at 11:01:35. However, the data flow monitoring details show that sink processing completed in 1 s 301 ms, with a cluster startup time of 1 s 263 ms.

The snapshot below shows the data flow status as “Success” at 11:00:29, but the pipeline activity is still in progress.

Given this discrepancy, I would like to understand why the data flow activity remains in progress at the pipeline level after sink processing has completed. This matters because I am running multiple pipelines with data flows in my project, so the time differences add up substantially. I am using a managed VNet integration runtime (IR) with memory-optimised compute, 16 (+16 driver cores), and a Time To Live of 30 minutes.

I would appreciate any insights or guidance on potential causes for this discrepancy and any recommendations on how to address it effectively.

Thanks,

KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator

2023-11-13T23:07:58.7233333+00:00

@Amit Kumar We have not heard back from you yet. Just checking whether the information below was helpful. If it answers your query, please click Accept Answer and Yes for "was this answer helpful", as it may benefit other community members reading this thread. If you have any further questions, do let us know.

Thank you
Konstantinos Passadis 19,586 Reputation points MVP

2023-12-16T23:42:34.3133333+00:00

Hello @amit kumar !

Do you have any feedback on your issue?

Kindly mark any answer that helped you as Accepted and Upvote or post your feedback to provide additional help!

Regards
Konstantinos Passadis 19,586 Reputation points MVP

2024-01-19T00:23:01.8133333+00:00

Hello @amit kumar ! Do you have any feedback on your issue? Kindly mark any answer that helped you as Accepted and Upvote or post your feedback to provide additional help! Regards

1 answer

Answer 1

Hello @amit kumar !

Welcome to Microsoft QnA!

A Data Flow activity run involves several stages besides the sink write itself:

Preparation: parsing the data flow, resolving dependencies, and preparing the execution plan.

Compute resources: acquiring the necessary compute can take some time. This is especially true for on-demand compute, which can have a significant start-up time.

Execution: the actual running of the data flow, including source data retrieval, transformations, and finally writing the data to the sink.

Resource shutdown: after execution, if the Time To Live (TTL) for the cluster has expired, or if the cluster is not set to remain active, it is shut down.

Post-processing: logging, updating ADF metadata, and other necessary clean-up tasks.

As a result, the sink execution time in the data flow is only a fraction of the overall activity duration.
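To put numbers on this, here is a rough sketch in Python that uses only the timings quoted in the question. The variable names are made up for illustration, and the script does not attribute the remaining time to any particular stage, since the thread contains no per-stage measurements.

```python
from datetime import datetime, timedelta

# Figures quoted in the question (no other measurements are assumed).
activity_start = datetime.strptime("11:00:07", "%H:%M:%S")
activity_duration = timedelta(minutes=1, seconds=28)
dataflow_success = datetime.strptime("11:00:29", "%H:%M:%S")
sink_processing = timedelta(seconds=1, milliseconds=301)
cluster_startup = timedelta(seconds=1, milliseconds=263)

# End time of the activity as seen at the pipeline level.
activity_end = activity_start + activity_duration

# Time explained by the two figures shown in the data flow monitoring view.
accounted = sink_processing + cluster_startup

# Time not explained by those two figures (other stages of the activity run).
unaccounted = activity_duration - accounted

# Gap between the data flow reporting "Success" and the activity finishing.
after_success = activity_end - dataflow_success

print("Activity end:", activity_end.strftime("%H:%M:%S"))
print("Sink + cluster startup:", accounted.total_seconds(), "s")
print("Not accounted for:", unaccounted.total_seconds(), "s")
print("After data flow success:", after_success.total_seconds(), "s")
print(f"Share of activity duration: {accounted / activity_duration:.1%}")
```

With the figures from the question, sink processing plus cluster startup comes to 2.564 s of the 88 s activity duration (about 2.9%), and 66 s pass between the data flow showing "Success" and the activity completing.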

I suggest having a look at:

https://learn.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime-performance

And this great article:

https://mrpaulandrew.com/2019/12/18/best-practices-for-implementing-azure-data-factory/

I hope this helps!

Kindly mark the answer as Accepted and Upvote in case it helped!

Regards
