Dataflow takes a long time

atarantin 1 Reputation point
2021-08-13T15:56:17.547+00:00

Hello,

I have a pipeline with 1 dataflow. It takes between 2 and 3 minutes to run, and I do not understand why it takes so long.

For instance, one run took 2 minutes 5 seconds.

The data flow inside it took 2 minutes 4 seconds.

There are 5 steps in my dataflow:

  1. Read files from an Azure Storage container
  2. Flatten the files
  3. Add a derived column
  4. Set the upsert for the database
  5. Insert the data into Cosmos DB

Looking at the detailed metrics, each step of the dataflow took 10 s (with a cluster startup time of 1 s 479 ms for the dataflow).

I am already using an integration runtime with a TTL.

So do you have any idea why my pipeline/dataflow takes so long when the detailed dataflow metrics are so short?
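To put a number on the gap, here is a minimal arithmetic sketch using only the figures quoted above. It assumes the five steps ran one after another (if they overlapped, the unaccounted time would be even larger); the variable names are illustrative only.

```python
# Figures quoted in the question. Treating the five steps as strictly
# sequential is an assumption, not something the metrics confirm.
activity_s = 2 * 60 + 4   # data flow activity duration: 2 min 4 s
step_s = 10               # reported time per step
steps = 5                 # number of steps in the dataflow
startup_s = 1.479         # reported cluster startup time

# Time explained by the detailed metrics, and what is left over.
accounted_s = steps * step_s + startup_s
unaccounted_s = activity_s - accounted_s

print(f"accounted:   {accounted_s:.1f} s")
print(f"unaccounted: {unaccounted_s:.1f} s")
```

Under that assumption, a bit more than half of the activity duration does not show up in the per-step metrics.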

Azure Data Factory

1 answer

  1. Jesse 1 Reputation point
    2023-05-02T09:57:29.2633333+00:00

    I have the same problem. I'm running the pipeline in debug mode using the data flow debug session. The cluster startup time is 39 s and the data flow has a processing time of 55 s. Still, the data flow pipeline activity takes 6 min to complete. Where does this overhead come from, and how can I influence it?
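As a rough sketch of how large that overhead is, the helper below (a hypothetical function, not part of any Azure SDK) subtracts the reported startup and processing times from the activity duration. It assumes the 55 s processing time does not already include the 39 s startup.

```python
def unexplained_overhead(activity_s, startup_s, processing_s):
    """Return (seconds, fraction) of activity time not covered by
    the reported startup and processing times."""
    gap = activity_s - (startup_s + processing_s)
    return gap, gap / activity_s


# Figures from the post above: 6 min activity, 39 s startup, 55 s processing.
gap, share = unexplained_overhead(activity_s=6 * 60, startup_s=39, processing_s=55)
print(f"{gap} s unexplained ({share:.0%} of the activity)")
```

With these numbers, roughly three quarters of the activity duration is not explained by the two reported metrics.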

