How to configure an ADF pipeline run / linked service so it uses Databricks serverless compute

Krzysztof Przysowa 20 Reputation points
2024-05-01T12:12:06.9033333+00:00

Databricks has recently announced serverless compute for workflows:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/run-serverless-jobs

I would like to be able to execute Azure Data Factory (ADF) jobs using this functionality.

Currently, for job compute I have to specify the driver and worker type; with serverless this is not needed.


4 answers

  1. phemanth 15,765 Reputation points Microsoft External Staff Moderator
    2024-05-01T13:18:49.86+00:00

    @Krzysztof Przysowa

    Thanks for using the MS Q&A platform and posting your query.

    Serverless compute for workflows allows you to run your Databricks job without configuring and deploying infrastructure. With serverless compute, you focus on implementing your data processing and analysis pipelines, and Databricks efficiently manages compute resources, including optimizing and scaling compute for your workloads.

    To configure your Azure Data Factory (ADF) pipeline to use Databricks serverless compute, you first set up the Databricks linked service and pipeline in ADF, and then configure the existing Databricks job to use serverless compute.

    Here are the steps:

    1. Create a Linked Service for Databricks: On the ADF home page, switch to the Manage tab in the left panel. Select Linked services under Connections, and then select + New. In the New linked service window, select Compute > Azure Databricks, and then select Continue.
    2. Configure the Linked Service: In the New linked service window, complete the following steps:
      • For Name, enter AzureDatabricks_LinkedService.
      • Provide the necessary details for your Databricks workspace, such as the URL and access token.
    3. Configure the ADF Pipeline: When creating or editing a pipeline in ADF, you can specify the Databricks linked service as the compute environment for your activities.
    4. Parametrize the Spark Configs: If you want to parametrize the Spark config values as well as the keys, you can do so when writing an ARM template for Data Factory. In the “Microsoft.DataFactory/factories/linkedservices” resource, you can define newClusterSparkConf (see the sketch after this list).
    5. Use Serverless Compute with Databricks Jobs: To learn more about using serverless compute with your Azure Databricks jobs, refer to the official documentation.
    6. Open the job you want to edit in the Databricks Jobs UI.
    7. In the Job details side panel, click Swap under Compute.
    8. Click New, enter or update any settings, and click Update.
    9. Alternatively, click the Compute drop-down menu and select Serverless.
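
    The parametrization mentioned in step 4 can be sketched as follows. This is only an illustrative Python snippet that prints the "Microsoft.DataFactory/factories/linkedservices" resource body; the workspace URL, cluster settings, and ARM parameter names are placeholders, and the property names reflect the AzureDatabricks linked service schema as I understand it, so verify them against your factory's exported ARM template.

    ```python
    import json

    # Illustrative ARM resource body for the Azure Databricks linked service
    # (step 4 above). Placeholders: workspace URL, token reference, parameter names.
    linked_service = {
        "name": "AzureDatabricks_LinkedService",
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
                "accessToken": {
                    "type": "SecureString",
                    "value": "[parameters('databricksAccessToken')]",
                },
                "newClusterVersion": "14.3.x-scala2.12",
                "newClusterNodeType": "Standard_DS3_v2",
                "newClusterNumOfWorker": "2",
                # Both the Spark config keys and values can be supplied as an
                # ARM template parameter instead of being hard-coded.
                "newClusterSparkConf": "[parameters('sparkConf')]",
            },
        },
    }

    print(json.dumps(linked_service, indent=2))
    ```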

    Please go through this link for more details: https://docs.databricks.com/en/workflows/jobs/run-serverless-jobs.html

    Please note that your Databricks workspace must have Unity Catalog enabled and your workloads must support shared access mode. Also, your Azure Databricks workspace must be in a supported region.

    You can also automate creating and running jobs that use serverless compute with the Jobs API, Databricks Asset Bundles, and the Databricks SDK for Python.
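
    As a rough illustration of automating this, here is a minimal sketch using the Databricks SDK for Python. The key point is that the task omits any cluster configuration (no new_cluster, existing_cluster_id, or job_cluster_key), which is how the Jobs API selects serverless compute according to the documentation linked below. The notebook path is a placeholder, and the class names may differ slightly between SDK versions.

    ```python
    # pip install databricks-sdk
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    # Authenticates via DATABRICKS_HOST / DATABRICKS_TOKEN environment variables
    # (or any other method supported by the SDK's unified authentication).
    w = WorkspaceClient()

    # Create a job whose single task carries no cluster specification,
    # so it runs on serverless compute (the workspace must have it enabled).
    created = w.jobs.create(
        name="serverless-demo-job",
        tasks=[
            jobs.Task(
                task_key="main",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/Users/me@example.com/demo"  # placeholder
                ),
            )
        ],
    )

    # Trigger the job and block until the run finishes.
    run = w.jobs.run_now(job_id=created.job_id).result()
    print(run.state.result_state)
    ```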

    Please refer to:

    https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/run-serverless-jobs

    https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/use-compute

    Hope this helps. Do let us know if you have any further queries.




  2. PRADEEPCHEEKATLA 90,646 Reputation points Moderator
    2024-05-09T05:44:23.0466667+00:00

    @Krzysztof Przysowa - Thanks for the question and using MS Q&A platform.

    Here is an update from the internal team:

    The only way to make it work would be to use the Databricks REST API with ADF's web activity.

    For more details, refer to Azure Databricks REST API - Jobs API 2.0 and Web activity in Azure Data Factory and Azure Synapse Analytics.

    Here is a third-party article which explains how to run the Databricks REST API from an ADF web activity: Azure Data Factory integration with Databricks Workflows.
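
    For reference, the request that the Web activity would issue is the Jobs API run-now call (shown here as a standalone Python sketch using the current 2.1 endpoint; the 2.0 API linked above is similar). In ADF you would put the same URL, header, and body into the Web activity, ideally sourcing the token from Key Vault or using managed identity authentication rather than a hard-coded value. The workspace URL, token, and job ID below are placeholders.

    ```python
    # pip install requests
    import requests

    workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
    token = "<databricks-access-token>"                                   # placeholder
    job_id = 123456789                                                    # placeholder

    # Trigger an existing Databricks job. If its tasks carry no cluster
    # configuration, the run uses serverless compute.
    resp = requests.post(
        f"{workspace_url}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id},
        timeout=30,
    )
    resp.raise_for_status()

    # The response contains run_id, which can be polled via /api/2.1/jobs/runs/get.
    print(resp.json())
    ```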

    Hope this helps. Do let us know if you have any further queries.




  3. Yunpeng Tang 0 Reputation points Microsoft Employee
    2025-07-03T04:28:23.1366667+00:00

    Whether a Databricks Job uses serverless resources is independent of ADF; the compute is configured on the Databricks job itself. The following are the operational steps:

    1. Create an interactive cluster in Databricks and stop it.
    2. In ADF, create a Linked Service object for Databricks using the interactive cluster method (this is only needed to enable API authentication for ADF to trigger Databricks Jobs later).
    3. Create a Job in Databricks as usual, and configure the Task within the Job to use Serverless resources under Job Compute (see the sketch after this list).
    4. In ADF, add a Databricks Job activity. In the configuration, select the Linked Service created in Step 2 and choose the Databricks Job name or Job ID configured in Step 3.
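
    To make step 3 concrete, below is a rough sketch of what the job settings look like once the task uses serverless compute: compared with classic job compute, the task simply carries no new_cluster, existing_cluster_id, or job_cluster_key. The job name and notebook path are placeholders, and the shape follows the Jobs API 2.1 job settings.

    ```python
    import json

    # Illustrative job settings for step 3: the task omits all cluster fields,
    # which is what selecting "Serverless" under job compute translates to.
    job_settings = {
        "name": "adf-triggered-serverless-job",  # placeholder
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {
                    "notebook_path": "/Workspace/Users/me@example.com/etl"  # placeholder
                },
                # No "new_cluster", "existing_cluster_id", or "job_cluster_key",
                # so the task runs on serverless job compute.
            }
        ],
    }

    print(json.dumps(job_settings, indent=2))
    ```

    The ADF Databricks Job activity in step 4 then only needs the job ID (or name) of this job; the compute choice stays entirely on the Databricks side.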

    By triggering the pipeline configured above, you will see that the Databricks Job is successfully scheduled. At the same time, the previously referenced interactive cluster will not be started. Upon further observation, you’ll notice that the entire process works as follows: ADF uses the Linked Service to authenticate and trigger the Databricks Job in the background, and then the Job itself initiates the Serverless compute to carry out the subsequent task execution.

    (This approach does not affect your previous development experience. All you need to do is provide an existing interactive cluster when registering the Linked Service—this is only to ensure the Linked Service can be created successfully. It will not incur any additional cost or have any further impact.)


  4. Krzysztof Przysowa 20 Reputation points
    2025-07-03T09:06:41.83+00:00

    Hi @Yunpeng Tang ,
    Many thanks. If I understood correctly, in summary the solution to my problem is to use the new Job activity, which is currently in preview.
    The Job activity really runs the Databricks job/workflow synchronously, so there is no longer any submit-then-poll-for-results loop logic.

    So the provided method is a workaround that enables Databricks orchestration from ADF; my question was really about being able to use Databricks serverless compute directly, without an extra layer of orchestration.

    Are you aware of any plans to introduce true support for serverless clusters, and libraries for them, in ADF?
    Maybe also SQL script execution on a SQL warehouse?
    The real breakthrough for ADF-Databricks integration would be the ability to reuse job clusters in subsequent activities without having to wait for a new one to be created.

    Let me give it a try and get back to you.

