
Synapse Spark consumption

Ryan Abbey 1,186 Reputation points
2022-05-05T03:28:50.25+00:00

We have a series of executions to carry out within Spark, for which we created a small Spark pool with a minimum of 3 and a maximum of 10 nodes. We observe that this configuration runs at most 3 notebooks at once (it rarely seems to be more than 2, but that is not part of this issue). Although all the notebook processes start at around the same time, all except the first two or three are effectively in a "waiting" state.

Each of our notebooks takes approx. 2 minutes to run, so the tail-end notebooks appear to have taken 20-30 minutes even though their actual execution time is under 2 minutes. In the "Consumption" box shown for each Spark execution, those tail-end notebooks have a very high "External activities" value.
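To illustrate why the tail-end notebooks look slow, here is a minimal Python sketch of a first-in, first-out queue. It assumes all notebooks are submitted at t=0, at most 3 run concurrently (the observed limit), each runs for exactly 2 minutes, and session startup time is ignored. The notebook count of 30 and the function name `simulate_fifo` are hypothetical; this is not how Synapse schedules sessions internally.

```python
import heapq


def simulate_fifo(num_notebooks: int, slots: int, run_minutes: float) -> list[tuple[float, float]]:
    """Return (start, finish) minutes for each notebook, all submitted at t=0.

    Hypothetical model: notebooks start in submission order whenever one of
    `slots` concurrent sessions is free, and each runs for exactly `run_minutes`.
    """
    free_at = [0.0] * slots  # min-heap of times at which each slot becomes free
    heapq.heapify(free_at)
    schedule = []
    for _ in range(num_notebooks):
        start = heapq.heappop(free_at)  # earliest available slot
        finish = start + run_minutes
        heapq.heappush(free_at, finish)
        schedule.append((start, finish))
    return schedule


if __name__ == "__main__":
    sched = simulate_fifo(num_notebooks=30, slots=3, run_minutes=2.0)
    for i, (start, finish) in enumerate(sched, 1):
        print(f"notebook {i:2d}: queued {start:4.1f} min, "
              f"elapsed {finish:4.1f} min, ran {finish - start:.1f} min")
```

Under these assumptions the last three notebooks wait 18 minutes and report 20 minutes of elapsed time, yet each one held pool capacity for only 2 minutes.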


So what I'm trying to understand is this: even though those notebooks are not actively running, are they generating cost? If they all use the one Spark pool, should we expect the cost to be purely the number of active nodes on that pool for the duration they are running, rather than a cost tied to each notebook's individual consumption values?

We don't want to sequentialise these notebooks, and we don't want a Spark pool with 50/100 nodes just to avoid waiting notebooks, so seeing all these notebooks with (relatively) high consumption values is a little disconcerting.

Azure Synapse Analytics



  1. Vahid Ghafarpour 23,605 Reputation points
    2023-08-03T17:29:25.47+00:00

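    The primary cost driver for your Spark pool is the number of active nodes and how long they are allocated. Notebooks in a "waiting" (queued) state have not yet been allocated nodes on the pool, so they should not add Spark compute cost during that time. The "External activities" figure in the consumption view reflects how long the notebook activity ran, including time spent queued; it is typically billed as pipeline activity time, separately from the Spark pool's vCore-hours. Even so, monitor your Spark pool's overall cost to confirm that it aligns with your budget and expected usage patterns.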
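A rough way to compare what pool compute charges track with what the per-notebook durations show is sketched below. It assumes, hypothetically, that each running notebook session holds a fixed number of nodes (3 here) of 4 vCores each, and that queued sessions hold none. `NotebookRun`, `billed_vcore_hours`, and `elapsed_hours` are illustrative names, not Synapse APIs, and real charges may differ (for example, nodes provisioned during autoscale or idle time before auto-pause may also be billed).

```python
from dataclasses import dataclass


@dataclass
class NotebookRun:
    queued_minutes: float   # time spent waiting for pool capacity
    running_minutes: float  # time the session actually held nodes


def billed_vcore_hours(runs: list[NotebookRun], nodes_per_session: int, vcores_per_node: int) -> float:
    """Hypothetical compute bill: only running time occupies nodes, so queued time adds nothing."""
    running_hours = sum(r.running_minutes for r in runs) / 60
    return running_hours * nodes_per_session * vcores_per_node


def elapsed_hours(runs: list[NotebookRun]) -> float:
    """Total activity duration as reported per notebook: queued plus running time."""
    return sum(r.queued_minutes + r.running_minutes for r in runs) / 60


if __name__ == "__main__":
    # Hypothetical: 6 notebooks of 2 minutes each, at most 3 running at once.
    runs = [NotebookRun(0, 2) for _ in range(3)] + [NotebookRun(2, 2) for _ in range(3)]
    print(f"reported elapsed time: {elapsed_hours(runs):.2f} h")
    print(f"billed pool compute:   "
          f"{billed_vcore_hours(runs, nodes_per_session=3, vcores_per_node=4):.2f} vCore-hours")
```

In this model, adding more queued notebooks raises the reported elapsed time, while the compute figure grows only with the minutes the sessions actually ran.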


