Azure Data Factory: Databricks notebook in a ForEach loop

BHAWNA BEDI 21 Reputation points
2022-02-18T07:10:20.257+00:00

Hi team,

I have a notebook that loads multiple tables into a database.
If this notebook runs inside a ForEach activity with a job cluster, it spins up one cluster for each table.

Is there a workaround where only one job cluster is spun up?

Azure Databricks: An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory: An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. MartinJaffer-MSFT 26,236 Reputation points
    2022-02-18T17:34:24.77+00:00

    Hello @BHAWNA BEDI and welcome to Microsoft Q&A.
    It sounds like you want to optimize how Data Factory uses Databricks: specifically, to minimize cluster creation and maximize reuse.

    In the Databricks Linked Service, there are several options for selecting which cluster to use:

    New Job Cluster: Creates a new cluster every time.
    Existing Interactive Cluster: Specify a currently running cluster to reuse.
    Existing Instance Pool: Draws resources from a pool kept warm in the Databricks service.

    Of these three, you do not want New Job Cluster, since that is the behavior you are describing: a new cluster is created for every activity. Either of the other two would avoid that startup cost. For production, I'd recommend the Existing Instance Pool option, because in Databricks you can set the pool to scale up and down as needed while always keeping a minimum number of nodes ready.
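
As a rough illustration of the two reuse options above, here is a minimal Python sketch that builds the linked service definition as JSON. The property names (`existingClusterId`, `instancePoolId`, `newClusterVersion`, `newClusterNumOfWorker`) reflect my understanding of the Azure Databricks linked service schema and should be checked against the current Data Factory documentation. The helper name, workspace URL, pool ID, and runtime version are all hypothetical placeholders, and authentication settings are left out on purpose.

```python
import json


def databricks_linked_service(name, domain, existing_cluster_id=None,
                              instance_pool_id=None, spark_version=None,
                              num_workers=None):
    """Build a linked service definition that reuses a cluster or a pool.

    Hypothetical helper for illustration; it is not part of any Azure SDK.
    """
    # Exactly one of the two reuse options must be chosen.
    if (existing_cluster_id is None) == (instance_pool_id is None):
        raise ValueError(
            "specify exactly one of existing_cluster_id or instance_pool_id")

    type_properties = {"domain": domain}  # authentication omitted in this sketch
    if existing_cluster_id is not None:
        # Existing Interactive Cluster: activities attach to a running cluster.
        type_properties["existingClusterId"] = existing_cluster_id
    else:
        # Existing Instance Pool: nodes are drawn from a warm pool.
        type_properties["instancePoolId"] = instance_pool_id
        type_properties["newClusterVersion"] = spark_version
        type_properties["newClusterNumOfWorker"] = num_workers

    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": type_properties,
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    definition = databricks_linked_service(
        "AzureDatabricksPoolLS",
        "https://adb-0000000000000000.0.azuredatabricks.net",
        instance_pool_id="my-pool-id",
        spark_version="10.4.x-scala2.12",
        num_workers="2",
    )
    print(json.dumps(definition, indent=2))
```

Passing `existing_cluster_id` instead of `instance_pool_id` yields the interactive-cluster variant; the resulting JSON could then be pasted into the linked service's code view after adding credentials.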

1 additional answer

  1. Vaibhav Chaudhari 38,916 Reputation points Volunteer Moderator
    2022-02-18T07:25:22.853+00:00

    One option: create a parent notebook that calls all the child notebooks. That way you avoid the ForEach loop in ADF and need only a single Databricks Notebook activity, which means only one job cluster.
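
A minimal sketch of such a parent notebook, assuming the per-table logic lives in a child notebook at the hypothetical path `./load_table` that reads a `table_name` parameter. In Databricks, the runner would be `dbutils.notebook.run(path, timeout_seconds, arguments)`; it is passed in as an argument here so the loop can also be exercised outside a notebook. The function name, table names, and timeout are illustrative assumptions.

```python
def load_tables(tables, run_notebook, child_path="./load_table",
                timeout_seconds=3600):
    """Run the child notebook once per table, all on the current cluster.

    run_notebook is expected to behave like dbutils.notebook.run:
    it takes (path, timeout_seconds, arguments) and returns a string.
    """
    results = {}
    for table in tables:
        try:
            # Each call runs on the cluster the parent notebook is attached to,
            # so no additional job cluster is created per table.
            results[table] = run_notebook(
                child_path, timeout_seconds, {"table_name": table})
        except Exception as exc:
            # Record the failure and continue with the remaining tables.
            results[table] = f"FAILED: {exc}"
    return results


# Inside a Databricks notebook (dbutils is provided by the runtime):
# results = load_tables(["customers", "orders"], dbutils.notebook.run)
```

The list of tables could be passed from ADF as a single notebook parameter, so the pipeline keeps one Notebook activity while the iteration happens inside Databricks. The tables run sequentially in this sketch.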

    2 people found this answer helpful.
