Hello,
There is a 50 vCore limit on the workspace; that may be causing your issue. When you run all of your notebooks in parallel, the 50 vCores are consumed by roughly 20 notebooks, so the remaining notebooks are queued. They run once the running notebooks finish and vCores are freed.
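The queuing behaviour described above can be sanity-checked with back-of-envelope arithmetic. This is only a sketch: the node size (and therefore the vCores per job) is an assumption here, and the actual quota for a given workspace may differ, so check both in the Azure portal.

```python
# Back-of-envelope queuing arithmetic for the workspace vCore cap.
# All numbers below are illustrative assumptions -- verify your actual
# workspace quota and Spark pool node size before relying on them.
workspace_vcore_cap = 50      # example cap discussed in this thread
vcores_per_node = 4           # ASSUMPTION: a Small node size
nodes_per_job = 3             # 1 driver + 2 workers, per the question
total_jobs = 60

vcores_per_job = vcores_per_node * nodes_per_job      # 12
runnable_now = workspace_vcore_cap // vcores_per_job  # jobs admitted immediately
queued = total_jobs - runnable_now                    # jobs waiting for freed vCores

print(runnable_now, queued)  # 4 56
```

With these assumed numbers only 4 jobs fit under the cap; the exact concurrency you observe depends on the real node size and quota, which is why raising the workspace vCore quota (or shrinking the per-job footprint) changes how many notebooks run at once.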
Spark pool - multiple notebooks running parallel
We have an environment where we want to execute around 60 independent processes that load data via notebooks (they largely use the same notebook but work on different data). To parallelize as much as possible, we created a Spark pool with 130 nodes. None of the data sets are large, so we would not expect any one process to use more than 3 nodes (1 driver, 2 workers), and we have never noticed anything to the contrary. With a 130-node pool, we should therefore be able to run about 40 processes concurrently.
Control of the Spark process execution is done via Data Factory, which determines what needs to run and kicks off the notebooks.
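One way to reduce the per-notebook session overhead when many small, independent loads share the same notebook logic is to fan them out inside a single Spark session rather than launching one notebook run per data set. A minimal sketch, assuming a hypothetical `load_one` stand-in for the real per-data-set work (in Synapse this is roughly where a call such as `mssparkutils.notebook.run(...)` would go):

```python
# Sketch: fan out many small, independent loads inside ONE Spark session.
# `load_one` is a hypothetical placeholder for the real logic
# (read -> transform -> write); it is NOT part of any Synapse API.
from concurrent.futures import ThreadPoolExecutor

def load_one(dataset: str) -> str:
    # Placeholder for the actual load of one data set.
    return f"loaded {dataset}"

datasets = [f"dataset_{i:02d}" for i in range(60)]

# Throttle to what the driver can comfortably schedule
# (8 concurrent workers here is an assumption, not a recommendation).
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(load_one, datasets))

print(len(results))  # 60
```

This trades Data Factory-level fan-out (one Spark session per notebook run) for thread-level fan-out inside one session, which can sidestep per-session vCore allocation for very small jobs.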
What I find...
- Only around 20 of the processes kick off, with additional processes starting once others have finished
- If we run only one process, it takes around 3-5 minutes, but when executing multiple, the processing time extends so that those 20 processes take 15+ minutes each. If these are meant to be independent nodes, why do they appear to impact each other?
How do we get better performance so that we can run 40 processes concurrently without them impacting each other? Creating 40 Spark pools seems a ridiculous resolution; are there other settings we can adjust to get better performance?
Azure Synapse Analytics
1 answer
M Saad 36 Reputation points
2022-09-13T07:09:53.22+00:00