How does the ADF Linked Service for Azure Databricks align with a Job Compute Policy defined inside Azure Databricks?

Ravineesh 40 Reputation points
2024-07-10T10:25:16.91+00:00

This is a two-part question.

First Part

Context: I have an ADF instance that contains several data pipelines. Some pipelines also include a Databricks notebook as an activity. I have created a linked service in Azure Data Factory to facilitate the pipelines that need to run tasks on the Databricks side.

I know that the clusters created on the Databricks side when pipelines are triggered from ADF are job clusters, and that they are terminated as soon as the task on the Databricks side completes.

Currently, I have a single job compute policy defined on the databricks end.

When the trigger of an ADF pipeline creates a temporary job cluster in Databricks, I need to know the answers to the questions below.

  1. When a temporary cluster is created in Databricks by an ADF trigger, will it adhere to the job compute policy defined inside Databricks?
  2. When creating a linked service, ADF provides options to define cluster attributes such as "cluster version", "cluster node type", "workers limit", and "cluster spark configurations". It also provides a field for dynamic JSON content to define cluster attributes. How do the attributes defined within the ADF linked service align with the pre-existing job compute policy attributes in Databricks? Which of them will take precedence?
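For reference, the job compute policy on the Databricks side is a JSON document of per-attribute rules (fixed values, allowed lists, and ranges). A minimal sketch of what such a policy looks like — the attribute values below are illustrative placeholders, not my actual policy:

```json
{
  "spark_version": {
    "type": "fixed",
    "value": "13.3.x-scala2.12"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"]
  },
  "num_workers": {
    "type": "range",
    "minValue": 1,
    "maxValue": 4
  }
}
```

My question is essentially how the cluster attributes in the ADF linked service interact with rules like these when the job cluster is created.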

Second Part:

Context: The pipelines in ADF have different runtimes. Based on runtime, I've classified the pipelines into 3 groups: short, medium, and long. On the Databricks side, I will create a job compute policy for each of these 3 groups, and for these 3 groups I need to create 3 different linked services in ADF.

For example, "ls_databricks_linked_service_short_run_time" will be used by the pipelines that have a short runtime. Similarly, each pipeline will be associated with the appropriate linked service based on its runtime.

I need to know the answers to the questions below.

  1. What changes do I need to make in the ADF linked service so that it triggers the creation of the appropriate temporary job cluster in Databricks?
    1. For example, if a pipeline with a short runtime triggers the creation of a temporary job cluster in Databricks, the cluster that is created must adhere to the job compute policy for short-runtime jobs that is already defined inside Databricks. The same should apply to the other groups, i.e. medium and long.
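To make my intent concrete, here is a sketch of what I assume the "new job cluster" linked service for the short-runtime group would look like. The `policyId` reference and all placeholder values are my assumptions about how a policy would be attached, not a configuration I have verified:

```json
{
  "name": "ls_databricks_linked_service_short_run_time",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-<workspace-id>.azuredatabricks.net",
      "authentication": "MSI",
      "workspaceResourceId": "<databricks-workspace-resource-id>",
      "newClusterVersion": "13.3.x-scala2.12",
      "newClusterNodeType": "Standard_DS3_v2",
      "newClusterNumOfWorker": "1:4",
      "policyId": "<short-runtime-policy-id>"
    }
  }
}
```

The medium and long groups would get analogous linked services pointing at their own policies.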

Accepted answer

Aravind Nuthalapati 150 Reputation points Microsoft Employee
2024-07-10T20:16:41.1566667+00:00

    Hello Ravineesh,

    **First Part:**

    1. The ADF Databricks linked service offers three compute options: a new job cluster, an existing interactive cluster, and an existing instance pool. When you use an existing interactive cluster, ADF simply reuses that cluster to execute your Databricks activity, so no new cluster is created.
    2. Likewise, it depends on which cluster type you choose in the ADF Databricks linked service. If you choose a new job cluster, then at ADF pipeline run time a new Databricks cluster is spun up based on the configuration you have defined in the linked service.

    For more details, please refer to the document below; you can experiment with the different options to get a better understanding.

    https://learn.microsoft.com/en-us/azure/data-factory/compute-linked-services#azure-databricks-linked-service

    Second Part:

    1. When creating the different linked services (short, medium, long), you can specify the cluster ID of an existing cluster in each one, so that each linked service uses the corresponding cluster during the execution of ADF pipeline jobs.
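    As a sketch of that approach, an existing-cluster linked service would look roughly like the following; the cluster ID and workspace values are placeholders, not a verified configuration:

    ```json
    {
      "name": "ls_databricks_linked_service_short_run_time",
      "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
          "domain": "https://adb-<workspace-id>.azuredatabricks.net",
          "authentication": "MSI",
          "workspaceResourceId": "<databricks-workspace-resource-id>",
          "existingClusterId": "<cluster-id-for-short-runtime-group>"
        }
      }
    }
    ```

    The medium and long linked services would be the same apart from the cluster ID they point at.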
