Very poor performance selecting from table with 6 Billion rows

o19210 1 Reputation point
2022-04-25T22:12:12.997+00:00

Hi everyone,

I am currently having very poor performance while selecting from table with 6 billion rows in order to return around 30 million rows.
I could not find a right key to hash the table without it being skewed so left it with round robin and with a clustered columnstore index.
My Synapse is using DWU3000.

The query I am performing is a select from the table where the date is equal to a particular day/snapshot.
On the execution plan I see that it takes more than 1 hour to perform a PartitionMoveOperation. Don't even understand why since my table is not partitioned.

Can someone assist on best way to optimize this table?

Currently I am even trying CTAS for testing other scenarios and the query is running for more than 1 hour and nothing yet.

Thank you

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,252 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. AaronHughes 391 Reputation points
    2022-04-26T10:58:09.57+00:00

    it sounds like you have a badly designed table to be honest -

    I would have a look at redesign on the table - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

    Moving to a Hash and add in partition will work to your advantage on a query that is base on a date (day/snapshot) - something as simple as partitioning by the year would reduce the section your scanning and restricting from a lot