How does Synapse charge for notebooks?

Jia Zhang 60 Reputation points
2024-10-24T15:59:59.37+00:00

Hi Team,

May I check how Synapse charges for notebooks?

If we use Synapse to process streaming data from Event Hubs and need two processes to handle the data, will the cost differ between the options below?

  1. Using one pipeline that runs one notebook; this notebook runs two readStream/writeStream functions.
  2. Using one pipeline that contains two notebooks, both running on the same Spark pool.
  3. Building two pipelines, each containing one notebook, with both notebooks still using the same Spark pool.

Will these three options generate the same cost since they all use only one Spark pool? Or does Synapse charge not by the Spark pool itself but by notebook, in which case options 2 and 3 would be similar to each other but almost double option 1? Or is it something else? Thanks! Just wondering how Synapse charges for Spark notebooks.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

  Ganesh Gurram 1,110 Reputation points Microsoft Vendor
    2024-10-25T03:51:04.0733333+00:00

    @Jia Zhang - Thanks for the question and for using the MS Q&A forum.

    According to Plan to manage costs for Azure Synapse Analytics (Microsoft Learn), Azure Synapse Analytics bills Spark compute per vCore-hour: you pay for the nodes (vCores) that your Spark sessions allocate on the pool and for how long they stay allocated, not for the number of notebooks or pipelines you run. Creating a Spark pool is free; charges start only when a session or job actually runs on it.
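    The vCore-hour model can be sketched as a small calculation. This is a minimal illustration, not an official billing formula: the price per vCore-hour is a hypothetical placeholder (take the real rate for your region from the Azure pricing calculator), and the vCore counts are the Small/Medium/Large Synapse node sizes.

```python
# Minimal sketch of vCore-hour billing for a Synapse Spark pool.
# ASSUMPTION: PRICE_PER_VCORE_HOUR is a hypothetical placeholder, not an Azure price.
PRICE_PER_VCORE_HOUR = 0.10  # in your billing currency

# vCores per node for the Synapse node sizes.
VCORES_PER_NODE = {"Small": 4, "Medium": 8, "Large": 16}


def spark_cost(node_size: str, nodes: int, minutes: float,
               price_per_vcore_hour: float = PRICE_PER_VCORE_HOUR) -> float:
    """Cost = allocated vCores x hours allocated x price per vCore-hour."""
    vcores = VCORES_PER_NODE[node_size] * nodes
    hours = minutes / 60  # duration converted from minutes to hours
    return vcores * hours * price_per_vcore_hour


if __name__ == "__main__":
    # 3 Medium nodes (24 vCores) allocated for 2 hours: 24 * 2 * 0.10
    print(round(spark_cost("Medium", 3, 120), 2))  # 4.8
```

    The point of the sketch is that neither the notebook count nor the pipeline count appears in the formula; only allocated vCores and time do.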


    Let's analyze your three scenarios: 

    • Option 1: One Pipeline with One Notebook (Two ReadStream and WriteStream Functions): Both streaming queries share a single Spark session, so you pay for that session's nodes for as long as the notebook runs.
    • Option 2: One Pipeline with Two Notebooks (Both Using the Same Spark Pool): If the notebooks run sequentially, only one session is active at a time, so the cost is similar to Option 1. If they run concurrently, each notebook runs in its own Spark session, so you pay for both sessions' nodes for as long as they overlap.
    • Option 3: Two Pipelines, Each with One Notebook (Both Using the Same Spark Pool): If both pipelines run concurrently, you pay for both notebooks' sessions for the whole time they are running. This typically costs more than Option 1, because two sessions allocate separate driver and executor nodes on the pool instead of sharing one set.
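    To compare the three options, the sketch below assumes (hypothetically) that each notebook session reserves one driver node plus a fixed number of executor nodes for as long as it runs, that dynamic allocation is off, and that one session of that size can carry both streams in Option 1. The node size, executor count, run time, and price are illustrative values only, not recommendations or real prices.

```python
from dataclasses import dataclass

# Hypothetical figures for illustration only; not Azure list prices.
PRICE_PER_VCORE_HOUR = 0.10
VCORES_PER_NODE = 8  # Medium node size


@dataclass
class Session:
    """One notebook session = one Spark application on the pool.

    ASSUMPTION: the session holds 1 driver node + `executors` executor nodes
    for its whole run time (no dynamic allocation).
    """
    executors: int
    hours: float

    def vcore_hours(self) -> float:
        return (1 + self.executors) * VCORES_PER_NODE * self.hours


def total_cost(sessions: list[Session]) -> float:
    """Sum the vCore-hours of all sessions and apply the (hypothetical) price."""
    return sum(s.vcore_hours() for s in sessions) * PRICE_PER_VCORE_HOUR


HOURS = 24  # both streams run all day

# Option 1: one notebook, one session, two readStream/writeStream queries sharing it.
option1 = [Session(executors=2, hours=HOURS)]

# Options 2 and 3 run concurrently: two notebooks mean two sessions, each with
# its own driver and executors, whether they sit in one pipeline or two.
option2_or_3 = [Session(executors=2, hours=HOURS), Session(executors=2, hours=HOURS)]

print(f"Option 1:    {total_cost(option1):.2f}")
print(f"Options 2/3: {total_cost(option2_or_3):.2f}")
```

    With these assumptions Option 1 prints 57.60 and Options 2/3 print 115.20, because two concurrent sessions pay for two drivers and two sets of executors. If Option 1 needs a larger session to keep up with both streams, the gap shrinks, so size the session from your actual throughput before comparing.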

    In summary, the cost will not be significantly different between Options 1 and 2 if the notebooks run sequentially. However, if you run them concurrently, Options 2 and 3 could cost more, because two concurrent sessions allocate more nodes on the Spark pool than one shared session does. Always monitor your Spark pool usage and scale it appropriately for your workload to optimize costs.

    Use the Azure pricing calculator to estimate costs before you add Azure Synapse Analytics. 

    Refer to the similar thread: Azure Synapse Spark pool pricing (Microsoft Q&A).

    For more information, refer to: Plan to manage costs for Azure Synapse Analytics (Microsoft Learn).

    Hope this helps. Do let us know if you have any further queries.  


