How does Azure Synapse on-demand calculate data processed?

mim 26 Reputation points
2020-07-26T03:53:31.203+00:00

For testing purposes I am using a small Parquet file. It is only 1.2 MB, but when I check the data processed metric, I see numbers like 12 MB.

My question is: does on-demand mode for Azure Synapse charge for the compressed data or the uncompressed data?
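
For reference, the test reads the file with a serverless query roughly like the one below (the storage account, container, and file names are placeholders):

    -- Read the small Parquet file with the serverless (on-demand) SQL pool.
    -- Storage account, container, and file names below are placeholders.
    SELECT COUNT(*) AS row_count
    FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/small_file.parquet',
        FORMAT = 'PARQUET'
    ) AS r;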

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-08-10T11:33:08.577+00:00

Hi @mim,

    Thanks for your patience. The product team confirmed that Synapse SQL serverless billing is based on data processed, where data processed is the amount of data handled internally while executing your query. It consists of the data read (compressed data plus metadata reads) and intermediate results (shuffled data, which is always in uncompressed format). Your query read all columns and all rows, so data processed = compressed data read + metadata reads + uncompressed data shuffled to your endpoint, plus a few smaller items such as auto-created statistics and read-ahead. For an aggregated query, data processed is roughly the compressed file size, because the only additions on top of the data read are the metadata reads and the shuffling of the SUM result (a single value), which add insignificant overhead compared to the actual data read.

    The product team is working on updating the pricing page with a better explanation and samples. Hope this helps!
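
    If you want to see the accumulated data-processed figures yourself, here is a minimal sketch, assuming the sys.dm_external_data_processed DMV is available in your serverless SQL pool (run it against the serverless endpoint):

        -- Show how much data the serverless pool has processed so far,
        -- aggregated per period (daily / weekly / monthly).
        SELECT type, data_processed_mb
        FROM sys.dm_external_data_processed;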

    1 person found this answer helpful.

  2. HarithaMaddi-MSFT 10,136 Reputation points
    2020-07-29T11:11:47.023+00:00

Hi @mim,

    Thanks for your valuable insights. I reproduced the issue and observed the discrepancy between the "Data Processed" metric in the Azure portal and the size of the file used in the Synapse query, as shown in the screenshots below. I have shared the details with the product team; they are investigating the root cause of the amplification in the metric and its impact on billing. However, the product team confirmed that the charge will be based only on the compressed data for a Parquet file, and they will fix it if the current product does not behave that way after further investigation. I will work closely with the product team and get back to you once I hear more updates.

    Stay tuned!

    (Screenshots attached: Parquet files in blob storage and the "Data Processed" metric in the Azure portal.)

