Spark pools: dynamicExecutorAllocation parameter

Irene 71 Reputation points
2021-02-02T15:10:41.767+00:00

Hello,

I'm experimenting with Synapse Spark pools and I've noticed that there's a dynamicExecutorAllocation parameter that is available when creating a Spark pool via the REST API, but not via the UI.

When I set dynamicExecutorAllocation.enabled to true, Spark's behaviour seems slightly inconsistent. When I check the Spark configuration at job startup, I sometimes see spark.dynamicAllocation.enabled set to true and spark.dynamicAllocation.maxExecutors unset, which I guess is expected behaviour, but the parameter spark.dynamicAllocation.disableIfMinMaxNotSpecified.enabled=true is also present. The job still executes on the minimum number of executors, and I'm not sure whether that's because it's the optimal configuration or because of spark.dynamicAllocation.disableIfMinMaxNotSpecified.enabled. Is there any information about this?

Also, I tried doing the same on a different pool, and when I set dynamicExecutorAllocation.enabled=true there, the Spark configuration at job execution still shows "spark.dynamicAllocation.enabled=false". From the Spark pool JSON settings:

"dynamicExecutorAllocation": {
            "enabled": true
        },

Are there any additional settings? Or am I using the dynamicExecutorAllocation property incorrectly?
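For completeness, the pool-level fragment I would like to end up with looks something like this (I'm guessing at the minExecutors/maxExecutors companion fields from the REST API schema, so treat those two as unverified):

```json
"dynamicExecutorAllocation": {
    "enabled": true,
    "minExecutors": 2,
    "maxExecutors": 10
},
```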

Thanks!

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

3 answers

  1. Saurabh Sharma 23,821 Reputation points Microsoft Employee
    2021-02-11T00:24:15.357+00:00

    @Irene Sorry for the delay. I am still following up internally with the product team on this behavior; however, in the meantime, can you please set spark.dynamicAllocation.enabled in the session start payload, even though dynamic executor allocation is enabled at the pool level? You can set it in a notebook via a magic command (doc). If you are using another HTTP client, the payload should follow the Livy protocol: https://github.com/cloudera/livy#request-body. Please let me know how it goes.
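    For example, a session-start payload following the Livy request body format linked above might look like this (the specific values here are only illustrative):

    ```json
    {
      "name": "dynamic-allocation-test",
      "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "2",
        "spark.dynamicAllocation.maxExecutors": "8"
      }
    }
    ```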


  2. Martin B 96 Reputation points
    2021-03-26T09:36:09.423+00:00

    Hello,
    I noticed the same issue and tried the workaround by overriding the configuration via the first cell in the notebook:

    %%configure -f
    {
      "conf": {
        "spark.dynamicAllocation.disableIfMinMaxNotSpecified.enabled": true,
        "spark.dynamicAllocation.enabled": true,
        "spark.dynamicAllocation.minExecutors": 2,
        "spark.dynamicAllocation.maxExecutors": 5
      }
    }
    

    This works for executing the notebook within a Synapse Studio Develop tab.

    However, I noticed that when executing the notebook from a Synapse pipeline, the configuration change seems to have no effect: the Spark UI shows spark.dynamicAllocation.enabled=false in the Environment tab.


  3. Sumit Kumar 96 Reputation points
    2021-06-16T13:08:33.917+00:00

    @Saurabh Sharma I came upon this question while trying to solve the same problem - I'm unable to dynamically increase the number of executors beyond 2. I tried @Martin B 's suggestion. It worked initially, but now it's throwing an error when executing the notebook.
    Is there a final answer on how these parameters can be set for a pool?

