Azure Data Factory - DataFlow TTL

jigsm
2021-05-09T22:10:30.613+00:00

Hello,

We are using ADF to load a CSV file into a SQL database. In addition, we use a data flow to apply some transformations.

Each import takes around 5-7 minutes, and we understand this is because the data flow needs to spin up an Apache Spark cluster.

We expect a continuous inflow of files and cannot afford to spend 5-7 minutes on each import.

We are looking at the TTL (time-to-live) setting, but the maximum value that can be set is 4 hours.

Could you let us know which value we should set if we want this to run continuously for 24 hours?


Regards

  1. Mark Kromer MSFT
    2021-05-10T06:43:07.09+00:00

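    The Azure IR TTL sets a time-to-live for idle cluster time after your last job executes on the Azure IR. If you are expecting files to flow in continuously, I would suggest starting with the lowest TTL, 5 minutes. Because the TTL only counts idle time after the last job finishes, every new job that starts within that window reuses the warm cluster, and the countdown starts again when that job finishes. With a steady stream of files, the cluster will therefore always be available to execute your data flow jobs. You would only set a high TTL value, e.g. 4 hours, if you want the cluster to stick around across large gaps between executions, so that a job arriving after a long idle period does not pay the start-up cost again.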
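The idle-timer rule described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not ADF's actual scheduler: it assumes a single cluster, jobs running one at a time in arrival order, and fixed, made-up start-up and run times. `simulate_ttl`, `JobResult` and `count_cold_starts` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class JobResult:
    """Outcome of one (hypothetical) data flow run in the simplified model."""
    arrival: float    # minute the file arrived and the job was triggered
    cold_start: bool  # True if the job had to wait for a new cluster
    finish: float     # minute the job finished


def simulate_ttl(arrivals: Iterable[float], run_minutes: float,
                 ttl_minutes: float, startup_minutes: float) -> List[JobResult]:
    """Simplified TTL model (assumption, not ADF's real scheduler):
    one cluster, jobs run one at a time in arrival order. The cluster stays
    alive while a job runs and for ttl_minutes of idle time after the last
    job finishes; a job arriving later than that pays startup_minutes."""
    results: List[JobResult] = []
    last_finish: Optional[float] = None  # None means no cluster has been started yet
    for t in sorted(arrivals):
        if last_finish is not None and t <= last_finish + ttl_minutes:
            # Cluster is busy or still within its idle TTL: reuse it,
            # queueing behind the previous job if it is still running.
            cold = False
            start = max(t, last_finish)
        else:
            # Cluster has expired (or never existed): pay the spin-up cost.
            cold = True
            start = t + startup_minutes
        last_finish = start + run_minutes
        results.append(JobResult(t, cold, last_finish))
    return results


def count_cold_starts(results: List[JobResult]) -> int:
    """Number of runs that had to spin up a new cluster."""
    return sum(1 for r in results if r.cold_start)


if __name__ == "__main__":
    day = 24 * 60
    # Illustrative numbers only: 5 min spin-up, 1 min of actual work per file.
    for interval, ttl in [(3, 5), (30, 5), (30, 60)]:
        runs = simulate_ttl(range(0, day, interval), run_minutes=1,
                            ttl_minutes=ttl, startup_minutes=5)
        print(f"file every {interval} min, TTL {ttl} min: "
              f"{count_cold_starts(runs)} cold starts out of {len(runs)} runs")
```

With these made-up numbers, a file every 3 minutes and a 5-minute TTL gives a single cold start for the whole day, while files arriving every 30 minutes hit a cold start on every run unless the TTL is longer than the gap between files.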
