Data Flows in a daisy chain - TTL

Alan Anscombe 51 Reputation points
2020-07-07T21:32:24.117+00:00

I found someone on the internet describing this issue, and I have exactly the same problem:
“For each pipeline that used data flows to perform data transformations, there’d be a ~6 minute cold-start time where ADF would be ‘acquiring compute’ for an Apache Spark cluster. Azure states in their docs that you can overcome this cold start for down stream tasks by configuring a TTL on the integration runtime but this does not work. We found our pipeline would be cold starting all data flow activities down stream. Microsoft, you really need to fix this!”

Where I work, we have exactly the same problem. We were forced to move our factory from Aus SE to Aus East because Data Flows are not yet supported in the former region (confirmed in a response from Microsoft), so I’m wondering whether our mixed-location subscription is now causing the trouble.

Azure Data Factory

Accepted answer
  1. Mark Kromer MSFT 1,146 Reputation points
    2020-07-08T06:56:43.78+00:00

    Setting a TTL on your Azure IR and then executing your data flows in sequence will reduce your Azure Databricks cluster acquisition time to 1-2 minutes.
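To see why sequential execution matters, here is a rough back-of-the-envelope model in Python. It is a simplified sketch, not how ADF schedules clusters internally: it assumes the cluster stays warm for TTL minutes after each data flow finishes, that a data flow starting inside that window pays a warm start, and that anything else pays a full cold start. The 6-minute and 1.5-minute figures are taken loosely from this thread (the ~6 minutes in the question, the midpoint of the 1-2 minutes in the answer); `DataFlowRun` and `startup_overhead` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Anecdotal figures from this thread, not guarantees:
# ~6 min cold start (question) and 1-2 min with a TTL (answer; midpoint assumed).
COLD_START_MIN = 6.0
WARM_START_MIN = 1.5


@dataclass
class DataFlowRun:
    name: str
    gap_before_min: float  # idle minutes since the previous data flow finished


def startup_overhead(runs: list[DataFlowRun], ttl_min: float) -> float:
    """Total minutes spent acquiring compute for data flows run one after another.

    Simplified model (assumption): after a data flow finishes, its cluster stays
    warm for ttl_min minutes. The next run reuses it only if it starts within
    that window; a TTL of 0 means every run cold-starts.
    """
    total = 0.0
    cluster_exists = False  # nothing is warm before the first run
    for run in runs:
        reusable = cluster_exists and ttl_min > 0 and run.gap_before_min <= ttl_min
        total += WARM_START_MIN if reusable else COLD_START_MIN
        cluster_exists = True  # this run leaves a cluster behind for the TTL window
    return total


# Three chained data flows, each starting 2 minutes after the previous one ends.
chain = [DataFlowRun("stage", 0), DataFlowRun("transform", 2), DataFlowRun("load", 2)]
print(startup_overhead(chain, ttl_min=0))   # 18.0 (every step cold-starts)
print(startup_overhead(chain, ttl_min=10))  # 9.0 (only the first step cold-starts)
```

Under this model a TTL only helps the steps that start inside the warm window, so long gaps between data flow activities, or a TTL shorter than those gaps, bring back the full cold start on every step.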

    You can run the data flow compute in a region different from your factory's home region by setting the region in the Azure IR configuration.
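As an illustration of where the region and TTL settings live, the Python sketch below builds an Azure (Managed) integration runtime definition as a dictionary and prints it as JSON. The property names (`computeProperties.location`, `dataFlowProperties.computeType`, `coreCount`, `timeToLive` in minutes) reflect the ADF integration runtime JSON schema as we understand it and should be checked against the current documentation. The IR name `DataFlowIR-AusEast`, the region string `"Australia East"`, and the helper `azure_ir_definition` are hypothetical choices for this example.

```python
import json


def azure_ir_definition(name: str, region: str, ttl_minutes: int,
                        core_count: int = 8,
                        compute_type: str = "General") -> dict:
    """Build an Azure (Managed) integration runtime definition for data flows.

    Sketch only: property names are assumed from the ADF integration runtime
    JSON schema; verify them against the current ADF docs before deploying.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    # Region where the data flow Spark compute runs; it can
                    # differ from the factory's home region.
                    "location": region,
                    "dataFlowProperties": {
                        "computeType": compute_type,
                        "coreCount": core_count,
                        # Minutes to keep the cluster warm after a data flow ends.
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


# Hypothetical IR pinned to Australia East with a 10-minute TTL.
ir = azure_ir_definition("DataFlowIR-AusEast", "Australia East", ttl_minutes=10)
print(json.dumps(ir, indent=2))
```

The data flow activities in the pipeline then need to reference this IR rather than the default AutoResolve IR; otherwise the TTL setting does not apply to them.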

