You can use the Databricks "warm pool" concept here: setting a TTL (time to live) on the Azure Integration Runtime retains the cluster infrastructure between runs. That will reduce the startup time from 5 minutes to 2 minutes. Create a new Azure IR, open its Data Flow properties, and set a TTL value of 10 minutes. Then, in the pipeline, change the Azure IR in each Execute Data Flow activity to the new IR you just created. Every activity must use that same Azure IR to take advantage of the warm pool capability.
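As a rough illustration of the setup described above, here is a minimal Python sketch that builds the two JSON definitions involved: an Azure IR with a 10-minute TTL, and two Execute Data Flow activities that point at it. The field names follow the ADF resource JSON as I understand it (`dataFlowProperties.timeToLive`, `integrationRuntime.referenceName`), so please compare against the JSON exported from your own factory. The IR, activity, and data flow names are hypothetical, and `shared_ir` is just an illustrative helper, not part of any SDK.

```python
import json


def make_azure_ir(name, ttl_minutes=10, core_count=8, compute_type="General"):
    """Build an Azure IR definition whose data flow cluster has a TTL (in minutes)."""
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": compute_type,
                        "coreCount": core_count,
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


def make_data_flow_activity(name, data_flow_name, ir_name, depends_on=None):
    """Build an Execute Data Flow activity that runs on the given Azure IR."""
    depends = []
    if depends_on:
        depends = [{"activity": depends_on, "dependencyConditions": ["Succeeded"]}]
    return {
        "name": name,
        "type": "ExecuteDataFlow",
        "dependsOn": depends,
        "typeProperties": {
            "dataflow": {"referenceName": data_flow_name, "type": "DataFlowReference"},
            "integrationRuntime": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


def shared_ir(activities):
    """Return the IR name if all Execute Data Flow activities use the same one, else None."""
    names = {
        a["typeProperties"].get("integrationRuntime", {}).get("referenceName")
        for a in activities
        if a["type"] == "ExecuteDataFlow"
    }
    return names.pop() if len(names) == 1 else None


if __name__ == "__main__":
    ir = make_azure_ir("DataFlowWarmPoolIR")  # hypothetical IR name
    first = make_data_flow_activity("DataFlow1", "df_first", ir["name"])
    second = make_data_flow_activity(
        "DataFlow2", "df_second", ir["name"], depends_on="DataFlow1"
    )
    print(json.dumps(ir, indent=2))
    print("Shared IR:", shared_ir([first, second]))
```

The `shared_ir` check mirrors the last point of the answer: the warm pool only helps when the sequential activities reference the same Azure IR.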
Azure Data Factory - Data Flow Startup time.
Jhonatan Reyes 61 Reputation points
Hi,
Currently, I have 2 Data Flows that run in sequence in a pipeline.
I want to reduce the cluster startup time.
I know it is normal for startup to take 5-8 minutes, but is it possible to use the same cluster for both Data Flows?
Accepted answer
MarkKromer-MSFT 5,211 Reputation points Microsoft Employee
2021-01-06T23:36:48.597+00:00