Notebook clusters in pipeline parallel processes

Iwan 40 Reputation points
2024-09-19T12:59:09.1533333+00:00

I have a pipeline triggered daily that loops through a list of notebooks and runs them in parallel on a cluster using a ForEach activity. At the moment I have seven notebooks running, and I'm adding more over time.

What would happen when the cluster is asked to run more notebooks than it can handle at one time? Will it process as many notebooks as it has capacity for and leave the rest in a queue until resources free up, or will the remaining runs fail because the cluster has reached its capacity?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
    Vinodh247 20,476 Reputation points
    2024-09-19T14:38:07.2766667+00:00

    Hi Iwan,

    Thanks for reaching out to Microsoft Q&A.

    In Azure Synapse, when you run multiple notebooks in parallel within a pipeline, the capacity of the cluster can become a bottleneck. If the cluster reaches its capacity (e.g., due to limited available cores or memory), Synapse will typically queue the remaining jobs until resources become available. It will not fail the jobs outright unless there is an issue such as an out-of-memory error or a configuration problem.

    To handle this gracefully, you can try the following:

    • Monitor the cluster's CPU and memory usage so you know when you are approaching capacity limits.
    • Adjust the concurrency of the "ForEach" activity in the pipeline to limit how many notebooks run in parallel. This is done with the "Batch Count" property (see the pipeline snippet after this list).
    • If you are using a cluster that supports auto-scaling, it will attempt to add nodes to accommodate the additional workload, as long as auto-scale is configured and the pool has not hit its maximum node count (see the pool configuration sketch below).
    • Synapse typically queues notebook executions that exceed current resources and processes them as resources free up. If a notebook fails due to resource constraints, you can add a retry policy to the notebook activity in your pipeline (also shown in the snippet below).
    • For Spark jobs, if you consistently see long queue times, consider increasing your Spark pool size.
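
    As an illustration, here is a minimal sketch of what the ForEach concurrency setting and a retry policy might look like in the pipeline JSON. The names (RunNotebooks, notebookList, notebookpool) and the batch count of 4 are placeholder assumptions to adapt to your own pipeline:

    ```json
    {
        "name": "RunNotebooks",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@pipeline().parameters.notebookList",
                "type": "Expression"
            },
            "isSequential": false,
            "batchCount": 4,
            "activities": [
                {
                    "name": "RunNotebook",
                    "type": "SynapseNotebook",
                    "typeProperties": {
                        "notebook": {
                            "referenceName": {
                                "value": "@item().name",
                                "type": "Expression"
                            },
                            "type": "NotebookReference"
                        },
                        "sparkPool": {
                            "referenceName": "notebookpool",
                            "type": "BigDataPoolReference"
                        }
                    },
                    "policy": {
                        "timeout": "0.02:00:00",
                        "retry": 2,
                        "retryIntervalInSeconds": 120
                    }
                }
            ]
        }
    }
    ```

    With "isSequential" set to false and "batchCount" set to 4, at most four notebooks are submitted to the pool at once, and the retry policy re-runs a failed notebook up to twice, two minutes apart, which helps with transient resource errors.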
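
    If you manage the Spark pool through an ARM template, auto-scale is controlled by the pool's "autoScale" properties. A minimal sketch, assuming a Medium-node pool that scales between 3 and 10 nodes (the workspace/pool names, region, and sizes are assumptions to replace with your own):

    ```json
    {
        "type": "Microsoft.Synapse/workspaces/bigDataPools",
        "apiVersion": "2021-06-01",
        "name": "myworkspace/notebookpool",
        "location": "westeurope",
        "properties": {
            "sparkVersion": "3.4",
            "nodeSize": "Medium",
            "nodeSizeFamily": "MemoryOptimized",
            "autoScale": {
                "enabled": true,
                "minNodeCount": 3,
                "maxNodeCount": 10
            }
        }
    }
    ```

    With this in place the pool can grow from three up to ten nodes as queued notebook sessions demand, within your workspace's vCore quota.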

    Implementing these strategies can help manage resource allocation and prevent failures due to exceeding cluster capacity.

    Please 'Upvote' (Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

    1 person found this answer helpful.

0 additional answers
