Query on Azure Synapse limits

Question

AzureUser-9588 151

I'm looking to understand the hard limits around Apache Spark in Azure Synapse Analytics. The documentation does not appear to list any: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-concepts
Are there any limits on the number of jobs in a Spark pool, or on the number of parallel jobs that can run in a Spark pool?

3 answers

Answer 1

Let me see if I can address these:

What will happen if another user U2 submits Job J3, that uses 30 nodes?
As long as J1 is running and using all 50 nodes, no other job can start. J3 will be rejected if it is submitted from a notebook, but queued until J1 completes if it is a batch job. The same applies to J2 in your example. Once J1 completes, J2 and J3 can run at the same time, since together they need no more than 50 nodes.
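
The admission rule described in this answer can be sketched as a small capacity check. This is only an illustration of the reasoning, not a Synapse API: the function name `admit` and its parameters are hypothetical, and it models nothing beyond what is stated here (a job runs if enough nodes are free; otherwise a notebook job is rejected and a batch job is queued).

```python
def admit(pool_size, nodes_in_use, requested_nodes, source):
    """Decide what happens to a new job on a fixed-size pool.

    Hypothetical helper: `source` is "notebook" or "batch".
    Returns "run", "queue", or "reject".
    """
    free_nodes = pool_size - nodes_in_use
    if requested_nodes <= free_nodes:
        return "run"
    # Not enough free nodes: batch jobs wait, notebook jobs are turned away.
    return "queue" if source == "batch" else "reject"


# J1 holds all 50 nodes, so J3 (30 nodes) cannot start yet.
print(admit(50, 50, 30, "notebook"))
print(admit(50, 50, 30, "batch"))
# After J1 completes, J3 (30 nodes) is running and J2 (20 nodes) still fits.
print(admit(50, 30, 20, "notebook"))
```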

Is there any limit on number of sessions that can be created in Spark Pool?
I would need to check with the product team to verify this, but more than likely you are going to run out of nodes before you hit a maximum number of concurrent sessions.

Will each session have its own fixed cluster created?
No. If the pool is a fixed cluster of 50 nodes, it will always be 50 nodes regardless of how many jobs are running or how many nodes each job requests. So you could have 1 job using all 50 nodes, 2 jobs that use 25 each, 5 that use 10 each, and so on. Not every job has to be the same size: your example of 20 and 30 at the same time is fine too.

Auto-scaling does this in a way: you could set your pool to run 20-100 nodes, and it will spin up as many nodes as the active jobs need, with a minimum of 20. J1, J2, and J3 could then run at the same time. If only J2 were running, the pool would scale down to the 20 nodes needed for it, and it can run any combination in between. If your maximum is 100 and J1, J2, and J3 are all running, you would not be able to start another job until at least one of them completes.
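
As a rough sketch of the autoscale arithmetic above: the pool size follows the total demand of the active jobs, clamped between the configured minimum and maximum. The helper names `target_nodes` and `can_start` are hypothetical and only model the behavior described in this answer, not the actual Synapse scaling logic.

```python
def target_nodes(min_nodes, max_nodes, active_job_nodes):
    """Nodes the pool would hold for the given active jobs (assumed model)."""
    demand = sum(active_job_nodes)
    return max(min_nodes, min(max_nodes, demand))


def can_start(max_nodes, active_job_nodes, new_job_nodes):
    """True if a new job still fits under the pool's maximum size."""
    return sum(active_job_nodes) + new_job_nodes <= max_nodes


# J1 = 50, J2 = 20, J3 = 30 nodes on a pool configured for 20-100 nodes.
print(target_nodes(20, 100, [50, 20, 30]))  # all three running
print(target_nodes(20, 100, [20]))          # only J2 running
print(can_start(100, [50, 20, 30], 1))      # pool is already at its maximum
```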

If the jobs you want to run concurrently need more than 200 nodes in total, you will need to create another pool and balance the jobs between them.

sohail sayed 1 Reputation point

2022-04-14T18:18:29.387+00:00

This doesn't fit with https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-concepts#example-3
As per the MSDN article, there are 3 concurrent jobs running: 2 from U1 using 20 nodes and 1 from U2 using 10 nodes. The pool size is only 20 nodes.
I am also trying to figure out the usage and limits here, and the MSDN article seems very confusing.

Answer 2

AzureUser-9588 151

This question refers to this article: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-concepts#example-3

Suppose I have a Spark pool with a fixed cluster of 50 nodes, and user U1 submits a job J1 that uses 50 nodes. What will happen if another user U2 submits job J3, which uses 30 nodes, and user U1 submits another job J2, which uses 20 nodes?

Is there any limit on the number of sessions that can be created in a Spark pool?
Will each session have its own fixed cluster created?

Answer 3

Samara Soucy - MSFT 5,141

Each pool has a maximum of 200 nodes, which limits the number of parallel jobs: each session requires 1 node plus the number of executor nodes selected (minimum 1). So, assuming the maximum pool size and the minimum resources per session, the pool would max out at 100 sessions. You will usually use more than one executor per job, so in most cases you won't be able to run that many. The number of nodes is going to drive how many jobs can run on a pool at a time more than anything else.
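
The arithmetic behind that figure can be written out as a quick calculation. This is a minimal sketch under the assumptions stated above (200 nodes per pool, 1 node plus the executor nodes per session); `max_sessions` is a hypothetical helper name, not part of any Synapse SDK.

```python
def max_sessions(pool_nodes, executors_per_session=1):
    """Upper bound on concurrent sessions if every session is the same size."""
    if executors_per_session < 1:
        raise ValueError("each session needs at least 1 executor node")
    nodes_per_session = 1 + executors_per_session
    return pool_nodes // nodes_per_session


print(max_sessions(200))      # smallest possible sessions on the largest pool
print(max_sessions(200, 4))   # 4 executors per session leaves room for fewer
```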
