Synapse Analytics - Spark Session takes a lot of time to start when you run multiple notebooks at the same time.

Pablo Chinchilla Valverde 36 Reputation points
2023-02-23T23:10:53.7066667+00:00

When I run a batch of notebooks at almost the same time through pipelines, I consistently see the same behavior: the Spark session takes much longer than usual to start. As shown in the image below, the notebook executed at 10:50:07 AM normally takes 1-2 minutes to process, but here it took about 10 times longer, and that extra time was spent just starting the Spark session (it took 10 minutes to start).

User's image

This is not related to Spark pool capacity. We have already checked, and we are not running out of capacity when these notebooks run; we have also adjusted the Spark pool to handle these scenarios. Even with smaller batches it behaves the same way: if we execute 3 notebooks at the same time, 1 runs as expected and the other 2 start their Spark sessions only after the first notebook has finished. It looks as if they are waiting for it to complete, and there is no reason for that, since the Spark pool has the capacity and the autoscale option to handle the load.

User's image

So, I have two questions...

  1. Is there a way to resolve this? A Spark pool should be able to run several notebooks at the same time. For example, if the pool has a capacity of 100 nodes and we use only 10 nodes per notebook in a batch of 3 notebooks, there should be room for all of them. Yet something is not behaving correctly, because the Spark session takes a long time to start in these scenarios (we have already tested running the notebooks individually rather than at the same time, and they ran in the expected time).
  2. I couldn't find any information about this: does Synapse Analytics charge for the time during which the Spark session is starting, or does it charge only once the session has started?
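
To make the capacity arithmetic in question 1 concrete, here is a minimal sketch using only the example numbers quoted above (a 100-node pool, 10 nodes per notebook, a batch of 3). The function names are hypothetical and the code does not query Synapse; it only shows why I expect the batch to fit.

```python
def nodes_requested(batch_size: int, nodes_per_notebook: int) -> int:
    """Total nodes a batch asks for if every notebook starts at once."""
    return batch_size * nodes_per_notebook


def batch_fits(pool_nodes: int, batch_size: int, nodes_per_notebook: int) -> bool:
    """True when the whole batch fits in the pool without queueing (by node count alone)."""
    return nodes_requested(batch_size, nodes_per_notebook) <= pool_nodes


# Example numbers from the question: 100-node pool, 3 notebooks, 10 nodes each.
requested = nodes_requested(batch_size=3, nodes_per_notebook=10)
print(requested, batch_fits(pool_nodes=100, batch_size=3, nodes_per_notebook=10))
```

By node count alone, the batch asks for 30 of the 100 nodes, so I would not expect the second and third sessions to wait for the first one.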
Azure Synapse Analytics