Node allocation in a Spark pool

Ravi Kumar 80 Reputation points
2023-12-07T19:43:50.4566667+00:00

Hello,

Can you explain how nodes are allocated in a Spark pool? If the Spark pool has 6 nodes and different notebooks each use 2 executors, how does the allocation work in this case?

Please also explain executor availability.

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Answer accepted by question author

Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
2023-12-07T20:45:32.0433333+00:00

Hello Ravi Kumar,

Welcome to the Microsoft Q&A forum.

When attaching a notebook to a Spark pool, we control how many executors, and which executor size, we want to allocate to that notebook. Spark instances are created based on node availability.

If we choose node size Small (4 vCores / 28 GB) and 5 nodes, then the total number of vCores = 4 * 5 = 20 vCores.

The maximum number of executors you can select in this case is 4 (when selecting "Dynamically allocate executors"), since one node is needed for the driver.

Running 2 different notebooks with 2 executors each:

In this case, executor size = Small (4 vCores, 28 GB).

The first notebook uses 2 executors * 4 vCores + 1 driver (4 vCores) = 12 vCores.

Out of 20 vCores, 12 are used by the first notebook, leaving 8 vCores.

So the first notebook uses 3 nodes (12 / 20 * 5 = 3).

Only 2 nodes are left for the second notebook.
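The arithmetic above can be checked with a short Python sketch. It only restates the numbers from this example (Small nodes, 5-node pool, 2 executors plus a driver of the same size); the function and variable names are made up for illustration and are not part of any Synapse API.

```python
# Figures taken from the example above: Small nodes (4 vCores each), 5 nodes.
VCORES_PER_NODE = 4
POOL_NODES = 5


def session_vcores(executors: int, vcores_per_node: int = VCORES_PER_NODE) -> int:
    """vCores for one session: the executors plus one driver of the same size."""
    return (executors + 1) * vcores_per_node


total_vcores = VCORES_PER_NODE * POOL_NODES    # 4 * 5
used_vcores = session_vcores(executors=2)      # 2 executors + 1 driver
nodes_used = used_vcores // VCORES_PER_NODE    # one executor or driver per node
nodes_left = POOL_NODES - nodes_used
vcores_left = total_vcores - used_vcores

print(total_vcores, used_vcores, nodes_used, nodes_left, vcores_left)
# prints: 20 12 3 2 8
```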

Now you submit the second notebook, which asks for the same resources as notebook 1 (3 nodes). Since the Spark pool does not have that much capacity left, the request is rejected if it comes from a notebook, or queued if it comes from a batch job.

If 3 nodes were still available for notebook 2, the pool would have enough capacity, and notebook 2 would be processed by the same Spark instance (the one that processed notebook 1).

To answer your question: if capacity is available for the second notebook, the same Spark instance is used.
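As a rough illustration of the behavior described above, here is a simplified Python model. It is not the actual Synapse scheduler: the `submit` function, its parameters, and the returned strings are hypothetical, and it assumes each executor and the driver occupy one node.

```python
def submit(pool_nodes: int, nodes_in_use: int, executors: int, kind: str) -> str:
    """Simplified outcome of a request, following the description above.

    kind is "notebook" or "batch" (hypothetical labels for this sketch).
    """
    requested_nodes = executors + 1           # executors plus one driver
    free_nodes = pool_nodes - nodes_in_use
    if requested_nodes <= free_nodes:
        return "runs on the existing Spark instance"
    # Not enough capacity: notebooks are rejected, batch jobs wait in a queue.
    return "rejected" if kind == "notebook" else "queued"


# 5-node pool, notebook 1 already holds 3 nodes, notebook 2 wants 2 executors.
print(submit(5, 3, 2, "notebook"))   # rejected
print(submit(5, 3, 2, "batch"))      # queued
# With 6 nodes in the pool, 3 nodes remain free, so the request fits.
print(submit(6, 3, 2, "notebook"))   # runs on the existing Spark instance
```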

Please see the document below, which clearly explains how Spark instances are used in Synapse.

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-concepts

Please note: the driver size is equal to the executor size.

Also, please check the thread below, where I explained the vCores concept:
https://learn.microsoft.com/en-us/answers/questions/1011305/parallel-synapse-spark-application-run?childToView=1020997#comment-1020997

I hope this clarifies your question.

If you have any further questions, please let me know.
