Azure Data Factory - Data Flow startup time

Jhonatan Reyes 61 Reputation points
2021-01-06T21:19:20.167+00:00

Hi,

Currently, I have two Data Flows that run in sequence in a single pipeline, like this:

54128-image.png

I want to reduce the cluster startup time.

I know it is normal for startup to take 5-8 minutes, but is it possible to use the same cluster for both Data Flows?

Azure Data Factory

Accepted answer
  1. MarkKromer-MSFT 5,211 Reputation points Microsoft Employee
    2021-01-06T23:36:48.597+00:00

    You can use the Databricks "warm pool" concept here: setting a TTL (time to live) on the Azure Integration Runtime (IR) retains the cluster infrastructure between Data Flow runs. That reduces the startup time from about 5 minutes to about 2 minutes. Create a new Azure IR, open its Data Flow properties, and set a TTL value of 10 minutes. Then, in the pipeline, change the Azure IR in each Execute Data Flow activity to the new IR you just created. Every activity must use that same Azure IR to take advantage of the warm pool.
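
As a rough illustration of the setup described above, the Python sketch below builds JSON definitions resembling what Data Factory stores for an Azure IR with a Data Flow TTL and for two chained Execute Data Flow activities, then checks that both activities point at the same IR. The names `ir-dataflow-ttl`, `DataFlow1`, `DataFlow2`, `df_first`, and `df_second` are made up for the example, the core count and compute type are assumed values, and the property layout is my understanding of the ADF JSON format rather than something stated in this thread, so compare it with the JSON view of your own factory before relying on it.

```python
import json

# Hypothetical IR name; use whatever name you gave the new Azure IR.
IR_NAME = "ir-dataflow-ttl"


def build_integration_runtime(name, ttl_minutes=10, core_count=8):
    """Return an Azure IR definition with a Data Flow TTL (in minutes).

    The layout is assumed from the ADF JSON format; core_count and
    computeType are illustrative values, not taken from the thread.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": "General",
                        "coreCount": core_count,
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


def build_data_flow_activity(name, data_flow, ir_name, depends_on=None):
    """Return an Execute Data Flow activity that runs on the given IR."""
    depends = []
    if depends_on:
        depends.append(
            {"activity": depends_on, "dependencyConditions": ["Succeeded"]}
        )
    return {
        "name": name,
        "type": "ExecuteDataFlow",
        "dependsOn": depends,
        "typeProperties": {
            "dataflow": {
                "referenceName": data_flow,
                "type": "DataFlowReference",
            },
            "integrationRuntime": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


def runtimes_used(activities):
    """Collect the IR names referenced by Execute Data Flow activities."""
    return {
        a["typeProperties"]["integrationRuntime"]["referenceName"]
        for a in activities
        if a["type"] == "ExecuteDataFlow"
    }


ir = build_integration_runtime(IR_NAME, ttl_minutes=10)
activities = [
    build_data_flow_activity("DataFlow1", "df_first", IR_NAME),
    build_data_flow_activity(
        "DataFlow2", "df_second", IR_NAME, depends_on="DataFlow1"
    ),
]

# The warm pool only helps when every activity references the same IR.
shares_warm_cluster = len(runtimes_used(activities)) == 1

print(json.dumps(ir, indent=2))
print("All Data Flow activities share one IR:", shares_warm_cluster)
```

If `runtimes_used` returns more than one name, at least one activity is still on a different IR (for example the default AutoResolve IR) and would not benefit from the TTL.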

