Spark pools: dynamicExecutorAllocation parameter

Irene 71 Reputation points
2021-02-02T15:10:41.767+00:00

Hello,

I'm experimenting with Synapse Spark pools and I've noticed that there's a dynamicExecutorAllocation parameter that is available when creating a Spark pool via the REST API, but not via the UI.

When I set dynamicExecutorAllocation.enabled to true, Spark's behaviour seems slightly inconsistent. When I check the Spark configuration at job startup, I sometimes see spark.dynamicAllocation.enabled set to true and spark.dynamicAllocation.maxExecutors unset, which I guess is expected behaviour, but the parameter spark.dynamicAllocation.disableIfMinMaxNotSpecified.enabled=true is also present. The job still executes on the minimum number of executors, and I'm not sure whether that's because it's the optimal configuration or because of spark.dynamicAllocation.disableIfMinMaxNotSpecified.enabled. Is there any information about this?

Also, I tried doing the same on a different pool, and when I set dynamicExecutorAllocation.enabled=true there, the Spark configuration at job execution still shows "spark.dynamicAllocation.enabled=false". From the Spark pool JSON settings:

"dynamicExecutorAllocation": {
            "enabled": true
        },

Are there any additional settings? Or am I using the dynamicExecutorAllocation property incorrectly?
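For completeness, the pool-level fragment I would like to end up with looks something like this (I'm guessing at the minExecutors/maxExecutors companion fields from the REST API schema, so treat those two as unverified):

```json
"dynamicExecutorAllocation": {
    "enabled": true,
    "minExecutors": 2,
    "maxExecutors": 10
},
```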

Thanks!

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

3 answers

  1. Saurabh Sharma 23,821 Reputation points Microsoft Employee
    2021-02-11T00:24:15.357+00:00

    @Irene Sorry for the delay. I am still following up internally with the product team on this behavior; however, in the meantime, can you please set spark.dynamicAllocation.enabled in the session start payload, even though dynamic executor allocation is enabled at the pool level? You can set it in a notebook via a magic command (doc). If you are using another HTTP client, the payload should follow the Livy protocol: https://github.com/cloudera/livy#request-body. Please let me know how it goes.
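    For example, a session-start payload following the Livy request body format linked above might look like this (the specific values here are only illustrative):

    ```json
    {
      "name": "dynamic-allocation-test",
      "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "2",
        "spark.dynamicAllocation.maxExecutors": "8"
      }
    }
    ```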


  2. Martin B 96 Reputation points
    2021-03-26T09:36:09.423+00:00

    Hello,
    I noticed the same issue and tried the workaround by overriding the configuration via the first cell in the notebook:

    %%configure -f
    {
      "conf": {
        "spark.dynamicAllocation.disableIfMinMaxNotSpecified.enabled": true,
        "spark.dynamicAllocation.enabled": true,
        "spark.dynamicAllocation.minExecutors": 2,
        "spark.dynamicAllocation.maxExecutors": 5
      }
    }
    

    This works for executing the notebook within a Synapse Studio Develop tab.

    However, I noticed that when executing the notebook from a Synapse pipeline, the configuration change seems to have no effect: the Spark UI shows spark.dynamicAllocation.enabled=false in the Environment tab.


  3. Sumit Kumar 96 Reputation points
    2021-06-16T13:08:33.917+00:00

    @Saurabh Sharma I came upon this question while trying to solve the same problem - I'm unable to dynamically increase the number of executors beyond 2. I tried @Martin B 's suggestion. It worked initially, but now it's throwing an error when executing the notebook.
    Is there a final answer on how these parameters can be set for a pool?

