Azure Synapse: Run more Spark applications on a cluster than the number of nodes

Bitchiko Tchelidze 6 Reputation points
2021-12-22T17:55:06.087+00:00

I have a Spark pool with the following configuration: "Medium (8 vCores / 64 GB) - 3 to 3 nodes".

From within a pipeline P1, I run multiple concurrent P2 pipelines (using ForEach). The P2 pipeline contains a single notebook, N1. When P1 runs, it spawns the P2 pipelines, and the "Pipeline Runs" tab in Synapse Studio shows multiple (say 10) P2s running. However, the "Apache Spark Applications" tab shows only 3 Apache Spark applications (each running the N1 notebook) running at any given time; the rest of the applications are in the "Queued" state.

Inside notebook N1, I have a custom configuration for the Spark session:

%%configure -f
{
    "driverMemory": "4g",
    "driverCores": 1,
    "executorMemory": "4g",
    "executorCores": 1,
    "numExecutors": 1
}

My reasoning is the following: each Spark application should take (4 + 4) = 8 GB of RAM and (1 + 1) = 2 cores. In total, my Spark pool has 3 * 8 = 24 cores and 3 * 64 = 192 GB of memory. That should accommodate 12 concurrent Spark applications (cores are the limiting factor: 2 * 12 = 24; memory should be more than enough). But I only see 3 concurrent applications.

Looking at the Spark UI, I see that each Spark application indeed has the resources I configured, so the configuration is taking effect.

Question: Why are only 3 Spark applications running, and can I run more than 3 Spark applications on my cluster?
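
The expectation above can be written out as a short Python sketch. The variable names are illustrative, and the calculation only restates the reasoning in this post; it does not model how Synapse actually schedules applications.

```python
# Pool and per-application figures taken from the question.
NODES = 3
VCORES_PER_NODE = 8
MEMORY_GB_PER_NODE = 64

# Values from the %%configure cell.
DRIVER_CORES, EXECUTOR_CORES, NUM_EXECUTORS = 1, 1, 1
DRIVER_MEMORY_GB, EXECUTOR_MEMORY_GB = 4, 4

# Total pool capacity.
pool_vcores = NODES * VCORES_PER_NODE
pool_memory_gb = NODES * MEMORY_GB_PER_NODE

# Resources one application asks for (driver plus executors).
app_vcores = DRIVER_CORES + NUM_EXECUTORS * EXECUTOR_CORES
app_memory_gb = DRIVER_MEMORY_GB + NUM_EXECUTORS * EXECUTOR_MEMORY_GB

# How many applications would fit if only the requested amounts counted.
max_by_cores = pool_vcores // app_vcores
max_by_memory = pool_memory_gb // app_memory_gb
expected_concurrent_apps = min(max_by_cores, max_by_memory)

print(f"pool: {pool_vcores} vCores, {pool_memory_gb} GB")
print(f"per app: {app_vcores} vCores, {app_memory_gb} GB")
print(f"expected concurrent apps: {expected_concurrent_apps}")
```

Under these assumptions the core count is the binding limit, which is where the figure of 12 comes from.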

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
.NET Machine learning
.NET: Microsoft technologies based on the .NET software framework. Machine learning: A type of artificial intelligence focused on enabling computers to use observed data to evolve new behaviors that have not been explicitly programmed.

2 answers

  1. PRADEEPCHEEKATLA-MSFT 76,586 Reputation points Microsoft Employee
    2021-12-23T04:29:33.583+00:00

    Hello @BitchikoTchelidze-5514,

    Thanks for the question and for using the MS Q&A platform.

    "Which does not seem to make sense to me, because my app is requesting 2 Cores (1 executor, 1 driver) not 8."

    A Spark pool can be defined with node sizes that range from a Small compute node with 4 vCores and 32 GB of memory up to an XXLarge compute node with 64 vCores and 512 GB of memory per node. Node sizes can be altered after pool creation, although the instance may need to be restarted.

    You might have created an Apache Spark pool requesting 1 executor and 1 driver of Small size, which means (4 vCores + 4 vCores) = 8 vCores.
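
    As a rough sketch of that arithmetic in Python: the assumption here, taken from the paragraph above and not confirmed as the actual Synapse allocation rule, is that the driver and each executor are each counted as 4 vCores (Small size) regardless of the cores requested in the notebook. The pool figures come from the question, and the function name is hypothetical.

```python
# Assumption (from the answer text, not a confirmed Synapse rule):
# the driver and every executor are each counted as a Small-sized
# 4 vCore slot, whatever driverCores/executorCores request.
SMALL_SLOT_VCORES = 4

# Pool from the question: 3 Medium nodes with 8 vCores each.
POOL_VCORES = 3 * 8


def vcores_counted(num_executors: int, vcores_per_slot: int = SMALL_SLOT_VCORES) -> int:
    """vCores counted for one application: one driver plus its executors."""
    return (1 + num_executors) * vcores_per_slot


per_app_vcores = vcores_counted(num_executors=1)
concurrent_apps = POOL_VCORES // per_app_vcores

print(f"vCores counted per application: {per_app_vcores}")
print(f"applications that fit in the pool: {concurrent_apps}")
```

    If that assumption holds, each application counts as 8 vCores rather than 2, and a 24 vCore pool fits 3 applications at a time, which matches the 3 running applications reported in the question.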

    Every Azure Synapse workspace comes with a default quota of vCores that can be used for Spark. The quota is split between the user quota and the dataflow quota so that neither usage pattern uses up all the vCores in the workspace. The quota differs depending on the type of your subscription but is symmetrical between user and dataflow. However, if you request more vCores than remain in the workspace, you will get an error.

    To resolve this issue, you need to request a capacity increase via the Azure portal by creating a new support ticket.

    Step 1: Create a new support ticket. Select "Service and subscription limits (quotas)" as the issue type and "Azure Synapse Analytics" as the quota type.

    Step 2: In the Details tab, click Enter details, choose "Apache Spark (vCore) per workspace" as the quota type, select the workspace, and enter the quota you are requesting.

    Step 3: Select a support method and create the ticket.

    For more details, refer to Apache Spark in Azure Synapse Analytics Core Concepts.

    Hope this helps. Please let us know if you have any further queries.

  2. Richard Luis Díaz 1 Reputation point
    2022-06-21T20:12:34.84+00:00

    Any update?
