Spark Cluster configurations and billing on Data Factory/Synapse

Jona 475 Reputation points
2023-09-07T19:47:21.6566667+00:00

Hi,

I'm a little confused. I'm budgeting a solution that uses Azure Data Factory and Synapse, so I use the Azure Price Calculator. These are my questions:

Is the cluster used in a debug session different from the one specified in the integration runtime? I ask because I see two cluster configurations:

ir-setup

dataflow activity

I get confused with these two configurations.

How is it billed?

When using Azure Calculator, I just got this:

calculator

At first I thought: "OK, here I can price the Spark cluster." But when I check the documentation, I see that the Azure Calculator expresses sizes in terms of vCores while the docs say "nodes".

Could you help me understand the different cluster configurations (debug, session) and how to locate them in the Azure Calculator?

Best regards

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Bhargava-MSFT 31,011 Reputation points Microsoft Employee
    2023-09-07T22:48:03.4266667+00:00

    Hello Jona,

    Welcome to the Microsoft Q&A forum.

    For your first question:
    Yes, the cluster (IR) used in the debug session can be different from the one specified as the data flow's integration runtime.

    Alternatively, you can use the same IR for the "data flow debug" session.

    While "data flow debug" is enabled, you are charged for that particular IR for the duration of the session.

    Ex:

    When you turn on "Data flow Debug" on the integration runtime drop-down, you can choose one of the existing IRs you already created.

    If you choose the debug time to live = 4 hours for a small cluster, you will be charged for the 4 hours on that IR.

    So your charge for the "data flow debug" session is calculated as follows:

    small cluster = 8 vCores

    total time = 4 hours

    1 debug session = 1 instance

    total cost = 1 instance * 8 vCores * 4 hours * price per vCore-hour

    Assuming your cluster uses Basic vCores, the cost would be $8.74 for the data flow debug session.
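The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the `implied_rate` value are assumptions, with the rate backed out of the $8.74 figure quoted above rather than taken from an official price list. Check the pricing page or the Azure Calculator for the actual rate in your region.

```python
def data_flow_cost(vcores: int, hours: float, rate_per_vcore_hour: float,
                   instances: int = 1) -> float:
    """Cost = instances * vCores * hours * price per vCore-hour."""
    return instances * vcores * hours * rate_per_vcore_hour


# Figures taken from the answer above.
VCORES = 8           # small cluster
HOURS = 4            # debug time to live
QUOTED_TOTAL = 8.74  # USD, as quoted for the debug session

vcore_hours = VCORES * HOURS
# Assumption: the per-vCore-hour rate is derived from the quoted total,
# not read from a price sheet.
implied_rate = QUOTED_TOTAL / vcore_hours

print(f"vCore-hours billed: {vcore_hours}")
print(f"Implied rate: ${implied_rate:.4f} per vCore-hour")
print(f"Debug session cost: ${data_flow_cost(VCORES, HOURS, implied_rate):.2f}")
```

To price a different cluster size or duration, change `VCORES` and `HOURS` and plug in the rate shown by the calculator.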

    My understanding is that Basic vCores are used on the auto IR, but this is not documented. The billing team can confirm this if you need more details.


    For your question on the document:

    "Nodes" in the document simply refers to the cluster size (small, medium, large, etc.).

    Small cluster = 4 driver vCores + 4 worker vCores = 8 vCores

    Coming to IR on the dataflow:

    You can also choose an existing IR from the drop-down or create a new IR here. An IR is simply the compute that runs the activities.

    The price varies depending on the IR size (small, medium, large, etc.). Small clusters have fewer vCores than medium or large clusters.

    The same formula applies to the data flow IR as well.

    Ex: If your data flow runs for 4 hours on a small cluster, the price is the same as above.

    So, the total price for the dataflow IR = cost of the IR on the "dataflow debug" + cost of the IR on the actual dataflow to run the activities.
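As a rough sketch of that sum, assuming two separate small IRs that each run for 4 hours: the `IrUsage` class is hypothetical, and `RATE` is an assumption derived from the $8.74 quoted earlier for 8 vCores over 4 hours, not an official price.

```python
from dataclasses import dataclass


@dataclass
class IrUsage:
    """One integration runtime's usage (hypothetical helper for illustration)."""
    label: str
    vcores: int
    hours: float

    def cost(self, rate_per_vcore_hour: float) -> float:
        return self.vcores * self.hours * rate_per_vcore_hour


# Assumption: rate backed out of the quoted $8.74 for 8 vCores x 4 hours.
RATE = 8.74 / (8 * 4)

usages = [
    IrUsage("data flow debug", vcores=8, hours=4),
    IrUsage("data flow activity run", vcores=8, hours=4),
]

# Total = cost of the debug IR + cost of the IR that runs the actual data flow.
total = sum(u.cost(RATE) for u in usages)

for u in usages:
    print(f"{u.label}: ${u.cost(RATE):.2f}")
print(f"total: ${total:.2f}")
```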

    Here, I am assuming different IRs for the "data flow debug" session and the actual data flow.

    If you use the same cluster, the compute is shared between the data flow debug session and the data flow that runs the activities.

    Please note: the minimum number of vCores for an IR is 8.

    Data Flows

    Data Flows are visually-designed components that enable data transformations at scale. You pay for the Data Flow cluster execution and debugging time per vCore-hour. The minimum cluster size to run a Data Flow is 8 vCores. Execution and debugging charges are prorated by the minute and rounded up.
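One possible reading of "prorated by the minute and rounded up" is that the run duration is rounded up to the next whole minute before it is converted to vCore-hours. The sketch below follows that reading, which is an assumption (the pricing page does not spell out the exact rounding), and the function name is hypothetical. It also enforces the 8 vCore minimum.

```python
import math

MIN_VCORES = 8  # minimum cluster size stated on the pricing page


def billed_vcore_hours(vcores: int, duration_seconds: float) -> float:
    """Billable vCore-hours for a single run.

    Assumption: the duration is rounded UP to the next whole minute,
    then prorated as a fraction of an hour.
    """
    if vcores < MIN_VCORES:
        raise ValueError(f"a data flow cluster needs at least {MIN_VCORES} vCores")
    billed_minutes = math.ceil(duration_seconds / 60)
    return vcores * billed_minutes / 60


# A run of 10 minutes 20 seconds is billed as 11 minutes under this reading.
print(billed_vcore_hours(8, 10 * 60 + 20))
```

Multiply the result by the per-vCore-hour price from the calculator to get the charge for the run.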

    Reference document:
    https://azure.microsoft.com/en-us/pricing/details/synapse-analytics/

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider selecting Accept answer and up-voting it, as this helps the community find answers to similar questions.

    1 person found this answer helpful.
