This is a two-part question.
First Part:
Context: I have an Azure Data Factory (ADF) instance that contains several data pipelines. Some pipelines include a Databricks notebook as an activity. I have created an Azure Databricks linked service in ADF so that these pipelines can run their tasks on the Databricks end.
I know that the clusters created on the Databricks end when pipelines are triggered from ADF are job clusters, and that they terminate as soon as the Databricks task completes.
Currently, I have a single job compute policy defined on the Databricks end.
When a temporary job cluster is created in Databricks by an ADF pipeline trigger, I need answers to the questions below.
- When a temporary job cluster is created in Databricks by an ADF trigger, will it adhere to the job compute policy defined in Databricks?
- When creating a linked service, ADF lets you define cluster attributes such as "cluster version", "cluster node type", "workers limit", and "cluster Spark configuration", and it also provides a field for dynamic JSON content to define cluster attributes. How do the attributes defined in the ADF linked service align with the attributes of the pre-existing job compute policy in Databricks? Which of them takes precedence?
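For reference, my linked service definition looks roughly like the following. The `newCluster*` property names follow the `AzureDatabricks` linked service schema as I understand it; the workspace URL, node type, and Spark configuration values are illustrative placeholders, not my actual settings:

```json
{
    "name": "ls_databricks_linked_service",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1111111111111111.1.azuredatabricks.net",
            "authentication": "MSI",
            "newClusterVersion": "13.3.x-scala2.12",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "1:4",
            "newClusterSparkConf": {
                "spark.speculation": "true"
            }
        }
    }
}
```

These are the attributes I am asking about: each of them could potentially conflict with a fixed or range attribute in the Databricks job compute policy.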
Second Part:
Context: The pipelines in ADF have different runtimes. Based on runtime, I have classified the pipelines into three groups: short, medium, and long. On the Databricks end, I will create a job compute policy for each of these three groups, and in ADF I need to create three corresponding linked services.
For example, "ls_databricks_linked_service_short_run_time" will be used by pipelines with a short runtime. Similarly, each pipeline will reference the linked service appropriate to its runtime.
I need answers to the questions below.
- What changes do I need to make in each ADF linked service so that it triggers the creation of the appropriate temporary job cluster in Databricks?
- For example, if a pipeline with a short runtime triggers the creation of a temporary job cluster in Databricks, the created cluster must adhere to the job compute policy for short-runtime jobs that is already defined in Databricks. The same should hold for the other groups, i.e. medium and long.
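To make the setup concrete, here is a sketch of what I have in mind for the short-runtime group. The policy definition uses the standard Databricks cluster policy format; the attribute values are illustrative, and I am assuming the linked service can be bound to the policy via the `policyId` property of the `AzureDatabricks` linked service (the policy ID shown is hypothetical):

```json
{
    "spark_version": { "type": "fixed", "value": "13.3.x-scala2.12" },
    "node_type_id": { "type": "fixed", "value": "Standard_DS3_v2" },
    "autoscale.max_workers": { "type": "range", "maxValue": 4 }
}
```

And the corresponding fragment of "ls_databricks_linked_service_short_run_time":

```json
{
    "name": "ls_databricks_linked_service_short_run_time",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1111111111111111.1.azuredatabricks.net",
            "authentication": "MSI",
            "policyId": "D10000000000ABCD"
        }
    }
}
```

Is referencing the policy this way (one `policyId` per linked service, one linked service per runtime group) the right approach, or is something else required?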