Do Clusters persist between consecutive Pipeline Runs?

DMC 1 Reputation point
2021-01-14T12:45:57.283+00:00

Hi, I've created a pipeline with multiple Dataflows running in parallel. Should I consider setting the IR's TTL to a value other than 0? I believe this would only be of benefit if the clusters persist between pipeline runs. Is this the case? The idea is that the next run of the pipeline would reuse the previously created clusters, which (if this happens) would cut the run time significantly.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MarkKromer-MSFT 5,196 Reputation points Microsoft Employee
    2021-01-14T17:48:08.15+00:00

    Cluster re-use is determined by the Azure IR configuration. However, only the cluster VM infrastructure is re-used: every job still gets a new Databricks instance, as per Databricks job cluster guidelines. If you are executing dataflows in parallel, TTL will not provide any benefit. TTL reduces cluster start-up time for subsequent sequential executions, because the infrastructure does not need to be re-acquired. You also get this benefit if you use the same Azure IR configuration in a separate pipeline in the same factory.
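
To make the sequential-versus-parallel point concrete, here is a rough toy model in Python. It is only a sketch of the reasoning above, not how the service is implemented: the `Run` class, the `simulate` function, and the 5-minute cold start and 1-minute warm start figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    start: float     # minute at which the data flow activity is submitted
    duration: float  # minutes of actual data flow work


def simulate(runs, ttl_minutes, cold_start_minutes=5.0, warm_start_minutes=1.0):
    """Return the start-up time each run pays under a simple TTL model.

    Assumptions (hypothetical, for illustration only):
    - a finished run leaves its infrastructure warm for ttl_minutes;
    - a later run can take over warm infrastructure only after it is released;
    - a warm start is cheaper than a cold start but not free, since each
      job still gets a new instance on top of the re-used infrastructure.
    """
    released_at = []  # times at which warm infrastructure became idle
    startup = {}
    for run in sorted(runs, key=lambda r: r.start):
        warm = [t for t in released_at if t <= run.start <= t + ttl_minutes]
        if ttl_minutes > 0 and warm:
            released_at.remove(max(warm))  # take the most recently released one
            cost = warm_start_minutes
        else:
            cost = cold_start_minutes
        startup[run.name] = cost
        released_at.append(run.start + cost + run.duration)
    return startup


# Three data flows submitted at the same time: nothing is idle yet, so all start cold.
parallel = [Run("a", 0, 10), Run("b", 0, 10), Run("c", 0, 10)]
print(simulate(parallel, ttl_minutes=10))

# The same data flows run one after another within the TTL window.
sequential = [Run("a", 0, 10), Run("b", 16, 10), Run("c", 28, 10)]
print(simulate(sequential, ttl_minutes=10))
print(simulate(sequential, ttl_minutes=0))
```

In this model the parallel runs all pay the cold start regardless of TTL, while the sequential runs pay it only once when TTL is greater than 0. The same logic applies whether the later run belongs to the same pipeline or to another pipeline using the same Azure IR.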
