How to increase the batch count beyond 50 in a ForEach activity in ADF?

Roopesh Chelikani 6 Reputation points
2021-05-10T12:26:47.687+00:00

I was running a ForEach activity to run 5,000 Databricks notebooks in parallel, but the cluster became overloaded and I got the error "Driver node not available". So I changed the batch count to 50. Now the cluster is fine, but the run takes too long. I want to increase the batch count. What should I do?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Vaibhav Chaudhari 38,916 Reputation points Volunteer Moderator
    2021-05-10T14:36:48.617+00:00

    50 is the maximum batch count for a ForEach activity, and it cannot be exceeded.

    One option is to add one or more additional ForEach activities that run in parallel, each handling a share of the notebooks. You will have to define logic that splits the work so that, for example, 2,500 notebooks are run by one ForEach and the remaining 2,500 by another.
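
The splitting logic mentioned above could look roughly like the sketch below. It is only an illustration in Python: the function name `split_into_chunks` and the notebook paths are hypothetical, and how each chunk is handed to its own ForEach (a pipeline parameter, a Lookup, and so on) is left open. Keep in mind that each ForEach can still run up to 50 items at once, so two of them in parallel can put up to 100 concurrent notebooks on the cluster.

```python
from typing import List, Sequence


def split_into_chunks(items: Sequence[str], num_chunks: int) -> List[List[str]]:
    """Split items into num_chunks contiguous lists of near-equal size."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    # base items per chunk; the first `extra` chunks get one more item
    base, extra = divmod(len(items), num_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Hypothetical notebook paths standing in for the real list of 5,000.
notebooks = [f"/jobs/notebook_{i}" for i in range(5000)]
first_half, second_half = split_into_chunks(notebooks, 2)
print(len(first_half), len(second_half))
```

Each resulting list would then be the `items` input of one ForEach activity.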


    Please don't forget to Accept Answer and Up-vote if the response helped -- Vaibhav


  2. HimanshuSinha-msft 19,486 Reputation points Microsoft Employee Moderator
    2021-05-10T23:24:28.593+00:00

    Hello @Roopesh Chelikani
    Another way to get around this limitation may be to pass a collection of values to the Databricks notebook rather than individual values. For this you may have to update the logic in the notebook.
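
As a rough sketch of what the updated notebook logic might look like, assuming the collection arrives as a JSON array string: the function `process_collection` and the stand-in `fake_run` are hypothetical names. In a real Databricks notebook the string might be read with `dbutils.widgets.get(...)`, and `run_one` might wrap `dbutils.notebook.run(path, timeout_seconds, arguments)`; both are left out here so the example runs anywhere.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_collection(
    items_json: str,
    run_one: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Parse a JSON array of items and process them with bounded parallelism."""
    items: List[str] = json.loads(items_json)
    # max_workers caps how many items run at once, which limits driver load
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, items))
    return dict(zip(items, results))


def fake_run(item: str) -> str:
    """Stand-in for the real per-item work (an assumption for this sketch)."""
    return f"done:{item}"


summary = process_collection('["a", "b", "c"]', fake_run, max_workers=2)
print(summary)
```

With this approach one ForEach iteration handles many items, so the 50-item batch limit applies to the number of collections rather than the number of notebooks, and `max_workers` becomes the knob for how hard each run pushes the cluster.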

    Also, if you have an estimate of how long it takes to process some X items out of the 5,000, you can simply add an If Condition activity and put a Wait activity inside it.

    Thanks
    Himanshu
    Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.

