Share via

Data Processed when using aggregate query

Serverless SQL 216 Reputation points MVP
2021-06-05T18:21:50.833+00:00

Hi, I have ~3GB of CSV data in Azure Storage and am using Serverless SQL Pools to query this data using a View. If I do a query such as this I get a data processed total of around 3GB (which is to be expected as it's scanning all the data)

SELECT COUNT()
FROM
OPENROWSET
(
BULK 'sourcedatadesktopdataflow/salesorderline/OrderYear=
/OrderMonth=/OrderDatePartition=/*.csv',
DATA_SOURCE = 'ExternalDataSourceDataLake',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE,
FIELDTERMINATOR =','
)
WITH
(
OrderLineID INT,
OrderDate DATE
)
AS fct

However, if I group by a column in the CSV data, the data processed is doubled to 6GB and I can't work out why this would be the case. Does anyone have any suggestions?

SELECT OrderDate,COUNT()
FROM
OPENROWSET
(
BULK 'sourcedatadesktopdataflow/salesorderline/OrderYear=
/OrderMonth=/OrderDatePartition=/*.csv',
DATA_SOURCE = 'ExternalDataSourceDataLake',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE,
FIELDTERMINATOR =','
)
WITH
(
OrderLineID INT,
OrderDate DATE
)
AS fct
GROUP BY OrderDate

Azure Synapse Analytics
Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.


2 answers

Sort by: Most helpful
  1. Serverless SQL 216 Reputation points MVP
    2021-06-24T15:17:42.313+00:00

    Hi, this is serverless sql pools, is there an option to show the query plan?

    Was this answer helpful?

    0 comments No comments

  2. HimanshuSinha 19,637 Reputation points Microsoft Employee Moderator
    2021-06-08T20:28:28.65+00:00

    Hello @Serverless SQL ,
    Thanks for the ask and using the Microsoft Q&A platform .

    Please check the query plan in both cases and see what the difference . It may be sorting/spooling/hashing + spilling .

    Please do let me know how it goes .
    Thanks
    Himanshu
    Please do consider clicking on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members

    Was this answer helpful?

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.