ADF Dataflow cluster start time

Question

Søren Callesen 1

I have a rather complex dataflow that takes around 4 minutes to complete, but the startup time of the Spark cluster can range from minutes to hours (seemingly at random).
Example: (screenshot not preserved)

I understand that startup usually takes 4-5 minutes, but what is going on with a 156-minute startup time?
It runs in debug mode on a medium cluster size, with up to 10 parallel dataflows running.

KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator

2022-09-01T22:05:58.11+00:00

Hello @Søren Callesen ,

Thanks for the question and for using the MS Q&A platform.

As I understand it, your data flow cluster startup time is inconsistent, sometimes ranging from minutes to hours, and you would like to know the root cause.
Because the issue is intermittent, it will be hard to pinpoint the root cause here. I therefore recommend filing a support ticket for deeper analysis. If you don't have a support plan, please let me know here.

Thanks
KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator

2022-09-05T23:56:00.797+00:00

Hello @Søren Callesen ,

Just checking whether you have had a chance to file a support ticket. If so, could you please share the support ticket number? If you don't have a support plan, do let me know. Thanks
Søren Callesen 1 Reputation point

2022-09-06T10:03:49.337+00:00

Comment got created as an answer - please read that.

Br.
Søren

1 answer

Answer 1

Hello KranthiPakala,

Thanks for your response.
I don't have any administrative privileges in our Azure environment, so I can't create a ticket.
But maybe you can answer my question regardless, since it's probably down to me not understanding the logging information provided in ADF.

The platform administrators believe it's due to the sink not being able to deliver data to the database.
The log/debug view of the dataflow is confusing to me: I'm not certain what to look for, and I can't explain why I don't believe it's down to the sink, based on the other logging information provided in the dataflow.

When the dataflow runs, the first thing it does is go into the queue.
Now and then the dataflow doesn't seem to get past this state and times out before the Queue status changes in the initial sink logging view; in this scenario, acquiring compute doesn't seem to finish before the timeout. This of course never happens when I try to showcase the issue.

When/if a dataflow completes, all the transformations show time spent, but it doesn't add up to the dataflow execution time in the pipeline.
Looking specifically at the sink, the time is longer than any of the other logging information in the dataflow, but it matches the time logged in the pipeline calling the dataflow.
The dataflow seems to be able to time out if the database operation takes longer than the cluster inactivity time.

To recap: what does the 156-minute cluster startup time actually mean, given the not-so-clear overview logging information? Is it how far the dataflow had gotten before timing out due to a database issue, or is it really only the wait time before the cluster started up?

Is the overview logging information based on anything, or do you have to go through each transformation group in order to see execution times?
It seems counterintuitive to me that the sink group in the stages logging shows 8 seconds, and the initial sink overview shows all success and the same time as in the stages.
But when clicking on the end sink, it shows 20 minutes of processing time, far longer than any time in the overview logging information in the dataflow.

KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator

2022-09-08T08:16:42.387+00:00

Hello @Søren Callesen ,

Thanks for your response.

"Is the overview logging information based on anything or do you have to go through each transformation group in order to see execution times?" - Yes. To get a better understanding of which transformations, or which point of your data flow, is taking more time, you will have to go through each transformation in the flow and narrow down the root cause.
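As a rough illustration of that narrowing-down step, here is a minimal Python sketch that compares per-stage times against the total activity duration. It does not call any ADF API: the `StageTiming` type, the `summarize` helper, the stage names, and most of the numbers are hypothetical and would be copied by hand from the monitoring view. Only the 8-second sink stage and the 20-minute sink processing time come from this thread. It also assumes stages run one after another, which may not hold if stages overlap.

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    name: str
    seconds: float


def summarize(total_seconds, stages):
    """Return the slowest stage and the time not explained by any stage.

    Assumes the stages ran sequentially; overlapping stages would make
    the 'unexplained' figure an underestimate.
    """
    slowest = max(stages, key=lambda s: s.seconds)
    accounted = sum(s.seconds for s in stages)
    unexplained = max(total_seconds - accounted, 0.0)
    return slowest, unexplained


# Illustrative values, entered by hand from the monitoring view.
stages = [
    StageTiming("source", 35.0),                      # hypothetical
    StageTiming("joins_and_derived_columns", 190.0),  # hypothetical
    StageTiming("sink_stage", 8.0),                   # stage time quoted in the thread
    StageTiming("sink_processing", 20 * 60.0),        # sink processing time quoted in the thread
]
total = 30 * 60.0  # hypothetical activity duration shown in the pipeline run

slowest, unexplained = summarize(total, stages)
print(f"slowest stage: {slowest.name} ({slowest.seconds:.0f}s)")
print(f"time not explained by any stage: {unexplained:.0f}s")
```

A large unexplained remainder would point toward queueing or compute acquisition, while a dominant sink figure would point toward the database side.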

"But when clicking on the end sink, its shows 20 minutes processing time, way longer than any time in the overview logging information in the dataflow." - If the sink is what takes longer, you may also need to investigate whether any other process has an active operation on the same sink, which might cause such performance issues. To be more accurate, though, I would suggest working with your Azure admins to file a support ticket for deeper investigation. If you don't have a support plan, please let me know and I can check on alternative options to make progress on this.
Søren Callesen 1 Reputation point

2022-09-08T08:20:55.44+00:00

Thanks for your reply.

There is only 1 sink in the dataflow, but I run multiple workflows in parallel, using a ForEach in the calling pipeline.
Does that count as the same sink?

I will raise the issue with the admin team and go to Microsoft support through them.
Again, thanks for the reply!
