I have the same problem. I'm running the pipeline in debug mode using the data flow debug session. The cluster startup time is 39s and the Data flow has a processing time of 55s. Still the Data flow pipeline activity takes 6 min to complete. Where does this overhead come from and how can I influence it?
Dataflow takes long time
atarantin
1
Reputation point
Hello,
I have a a pipeline with 1 dataflow. It takes between 2 and 3 minutes to run it and I do not understand why it takes so much time.
For instance, a run took 2 minutes 05.
The data flow inside it took 02 minutes 04.
There are 5 steps in my dataflow:
- Take files from an azure storage container
- Flatten the files
- I do a derived column
- I set the upsert for the database
- I insert data to cosmos db
If I take a look to the detailed metrics, each step of the dataflow took 10s (with a cluster startup time to 1s 479ms for the dataflow).
I am already using an integration runtime with a TTL.
So do you have an idea on how my pipeline/dataflow can be so long while the detailed dataflow metrics are so short?
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,655 questions