Query on Azure Synapse limits

AzureUser-9588 151 Reputation points
2021-05-12T11:06:29.717+00:00

I'm looking to understand the hard limits for Apache Spark in Azure Synapse Analytics. This documentation does not seem to mention any: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-concepts
Is there a limit on the number of jobs in a Spark pool, or on the number of parallel jobs that can run in a Spark pool?

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

3 answers

  1. Samara Soucy - MSFT 5,141 Reputation points
    2021-05-13T21:54:03+00:00

    Let me see if I can address these:

    What will happen if another user U2 submits Job J3, that uses 30 nodes?
    As long as J1 is running, no other jobs will be processed. J3 will be rejected if it comes from a notebook, but if it is a batch job it will be queued until J1 completes. The same applies to J2 in your example. However, once J1 completes, J2 and J3 can run at the same time, since the two jobs total <= 50 nodes.

    Is there any limit on the number of sessions that can be created in a Spark pool?
    I would need to check with the product team to verify this, but more than likely you are going to run out of nodes before you hit a maximum number of concurrent sessions.

    Will each session have its own set of fixed clusters created?
    No. If the pool is a fixed cluster of 50 nodes, it will always be 50 nodes, regardless of how many jobs are running or how many nodes each job requests. So you could have 1 job using all 50, 2 that use 25 each, 5 that use 10 each, and so on. Not every job has to be the same size; your example of 20 and 30 at the same time is fine too.

    Auto-scaling does this in a way: you could set your pool to run 20-100 nodes, and it will spin up as many nodes as needed for active jobs, with a minimum of 20. J1, J2, and J3 could run at the same time. If only J2 were running, the pool would scale down to the 20 nodes needed for that job, and it can run any combination in between. If your max is 100 and J1, J2, and J3 are all running, you would not be able to start another job until at least one completes.

    If the jobs you want to run concurrently need more than 200 nodes in total, you will need to create another pool and balance the jobs between the two.
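
For illustration, the behaviour described in this answer can be modelled with a small Python sketch. This is a toy model under stated assumptions, not Synapse's actual scheduler: the `Pool` class, its method names, and the `'notebook'`/`'batch'` labels are made up for this example, and draining the queue in first-fit order is an assumption. A fixed pool is modelled as `min_nodes == max_nodes`; an auto-scaling pool has `min_nodes < max_nodes`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy model of a Spark pool (hypothetical; not a Synapse API)."""
    min_nodes: int
    max_nodes: int
    running: dict = field(default_factory=dict)   # job name -> nodes in use
    queued: deque = field(default_factory=deque)  # (job name, nodes) waiting

    @property
    def used(self) -> int:
        return sum(self.running.values())

    @property
    def active_nodes(self) -> int:
        # Auto-scale keeps at least min_nodes up and grows to cover running jobs.
        return max(self.min_nodes, self.used)

    def submit(self, name: str, nodes: int, kind: str) -> str:
        """kind is 'notebook' or 'batch' (labels chosen for this sketch)."""
        if self.used + nodes <= self.max_nodes:
            self.running[name] = nodes
            return "running"
        if kind == "batch":
            self.queued.append((name, nodes))
            return "queued"
        return "rejected"

    def complete(self, name: str) -> None:
        """Free a job's nodes, then start queued batch jobs that now fit."""
        del self.running[name]
        still_waiting = deque()
        while self.queued:
            job, nodes = self.queued.popleft()
            if self.used + nodes <= self.max_nodes:
                self.running[job] = nodes
            else:
                still_waiting.append((job, nodes))
        self.queued = still_waiting


fixed = Pool(min_nodes=50, max_nodes=50)
print(fixed.submit("J1", 50, "batch"))     # J1 takes the whole pool
print(fixed.submit("J2", 20, "batch"))     # no room: the batch job waits
print(fixed.submit("J3", 30, "notebook"))  # no room: the notebook job is refused
fixed.complete("J1")
print(sorted(fixed.running))               # J2 has started from the queue
```

With `Pool(min_nodes=20, max_nodes=100)`, the same three jobs fit at once (50 + 20 + 30 = 100), and a fourth submission has to wait or is refused until one of them completes.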

  2. AzureUser-9588 151 Reputation points
    2021-05-13T07:55:54.853+00:00

    This question refers to this article: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-concepts#example-3

    Suppose I have a Spark pool with a fixed cluster of 50 nodes, and user U1 submits a job J1 that uses 50 nodes. What will happen if another user U2 submits job J3, which uses 30 nodes, and user U1 submits another job J2, which uses 20 nodes?

    Is there any limit on the number of sessions that can be created in a Spark pool?
    Will each session have its own set of fixed clusters created?

  3. Samara Soucy - MSFT 5,141 Reputation points
    2021-05-13T02:21:18.53+00:00

    Each pool has a maximum of 200 nodes, which limits the number of parallel jobs: each session requires 1 + the number of executor nodes selected (minimum 1). So, assuming the maximum pool size and the minimum resources per session, the pool maxes out at 100 sessions. You will usually use more than one executor per job, so in most cases you won't be able to run that many. The number of nodes drives how many jobs can run on a pool at a time more than anything else.
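
The arithmetic in this answer can be written out as a short Python sketch. The function names are made up for illustration; the only inputs taken from the answer are the 200-node pool maximum and the rule that a session needs 1 node plus its executor nodes, with at least 1 executor.

```python
POOL_MAX_NODES = 200  # per-pool maximum stated in the answer above


def nodes_per_session(executors: int) -> int:
    """One extra node plus the executor nodes; at least 1 executor is required."""
    if executors < 1:
        raise ValueError("a session needs at least 1 executor")
    return 1 + executors


def max_concurrent_sessions(pool_nodes: int, executors: int) -> int:
    """How many equally sized sessions fit in the pool at the same time."""
    return pool_nodes // nodes_per_session(executors)


print(max_concurrent_sessions(POOL_MAX_NODES, 1))  # 100: smallest possible sessions
print(max_concurrent_sessions(POOL_MAX_NODES, 4))  # 40: 5 nodes per session
print(max_concurrent_sessions(50, 9))              # 5: a 50-node pool, 10 nodes each
```

This only covers the case where every session is the same size; with mixed sizes, the sum of nodes across sessions has to stay within the pool size.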
