Cluster reuse is determined by the Azure IR configuration. However, only the cluster VM infrastructure is reused: every job still results in a new Databricks instance, as per the Databricks job cluster guidelines. If you are executing data flows in parallel, TTL will not provide any benefit. TTL minimizes cluster start-up times for subsequent sequential executions, because the infrastructure does not need to be re-acquired. You will still benefit from this feature even if you use the same Azure IR configuration in a separate pipeline in the same factory.
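To make the sequential-versus-parallel distinction concrete, here is a minimal Python sketch of a simplified warm-pool model. Everything in it is an illustrative assumption rather than the actual Azure Data Factory scheduler: the hypothetical `Job` and `classify_starts` names, the one-job-per-cluster rule, and the idea that an idle cluster stays warm for `ttl` minutes after its job ends. It only shows why a data flow can skip cluster start-up when a previous execution on the same IR has just finished, whereas data flows launched at the same moment each need their own cluster.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One data flow execution in a hypothetical model."""
    name: str
    start: float     # minutes after t = 0 when the data flow requests compute
    duration: float  # minutes the data flow holds its cluster


def classify_starts(jobs: list[Job], ttl: float) -> dict[str, str]:
    """Label each job 'warm' (reused an idle cluster) or 'cold' (new cluster).

    Simplifying assumptions, not Azure's actual behavior:
    - each cluster runs one job at a time;
    - a cluster stays warm for less than `ttl` minutes after its job ends;
    - start-up time is not added to job durations.
    """
    busy: list[float] = []  # min-heap of end times for clusters still running
    idle: list[float] = []  # times at which currently idle clusters became free
    labels: dict[str, str] = {}
    for job in sorted(jobs, key=lambda j: j.start):
        # Clusters whose jobs have finished by now become idle (warm) clusters.
        while busy and busy[0] <= job.start:
            idle.append(heapq.heappop(busy))
        # Discard idle clusters whose TTL has already expired.
        idle = [freed for freed in idle if job.start - freed < ttl]
        if idle:
            idle.pop()  # reuse one warm cluster
            labels[job.name] = "warm"
        else:
            labels[job.name] = "cold"  # no warm cluster left: acquire a new one
        # The chosen cluster is busy until this job finishes.
        heapq.heappush(busy, job.start + job.duration)
    return labels


# Sequential data flows: each one starts shortly after the previous one ends.
sequential = [Job("df1", 0, 10), Job("df2", 12, 10), Job("df3", 25, 10)]
print(classify_starts(sequential, ttl=10))
# {'df1': 'cold', 'df2': 'warm', 'df3': 'warm'}

# Parallel data flows: all start at once, so no warm cluster is available.
parallel = [Job("a", 0, 10), Job("b", 0, 10), Job("c", 0, 10)]
print(classify_starts(parallel, ttl=10))
# {'a': 'cold', 'b': 'cold', 'c': 'cold'}
```

Expired clusters are dropped lazily when the next job arrives, which is enough for labeling starts; a real service would release them on a timer. Setting `ttl=0` in this model makes every start cold, matching the intuition that a TTL of 0 disables reuse.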
Do clusters persist between consecutive pipeline runs?
DMC
Hi, I've created a pipeline with multiple data flows that run in parallel. Should I consider setting the IR's TTL to a value other than 0? I believe this would only be beneficial if the clusters persist between pipeline runs. Is this the case? The idea is that the next run of the pipeline would make use of the previously created clusters, which (if this happens) would cut the run time down significantly.
MarkKromer-MSFT (Microsoft Employee)
2021-01-14T17:48:08.15+00:00