Best index option for tables with large data volumes containing column with nvarchar(max) datatype in Azure Synapse Analytics

Vyshnavi gv 21 Reputation points
2022-01-20T12:24:59.187+00:00

I want to create a daily transactional table in Azure Synapse Analytics dedicated sql pool. Data volume will be around 10 million records per month.
I have few columns like description which contains unicode characters and are of high length. So, I had to use nvarchar(max) datatype for these columns.

Since Synapse doesn't allow us to create clustered columnstore index on tables that contain max datatypes, I am not sure what other index to use.
I know heap is one option, but my table volume is definitely going to be more than 60 million rows, so I believe heap is not best to go with.
Also I read in documentation that clustered index gives best performance only we have highly selective queries on the column with clustered index.

So, not sure how to go forward with it.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,473 questions
0 comments No comments
{count} votes

Accepted answer
  1. ShaikMaheer-MSFT 38,201 Reputation points Microsoft Employee
    2022-01-21T14:37:08.347+00:00

    Hi @Vyshnavi gv ,

    Thank you for posting query in Microsoft Q&A Platform.

    Usually Clustered Column store indexes are best for overall query performance. But as you mentioned these indexes will not support varchar(max).

    If you can avoid having vcarchar(max), nvarchar(max) and varbinary(max) type columns in your table, then I will suggest to go with Clustered Column Store index.

    If you cannot avoid above column types on your table then consider heap or clustered index instead.

    If your table is small table with less than 60 million rows then considering heap is good option.

    Good thing about Clustered indexes is that, they may outperform clustered columnstore tables when a single row needs to be quickly retrieved. For queries where a single or very few row lookup is required to perform with extreme speed, consider a clustered index or nonclustered secondary index.

    The disadvantage to using a clustered index is that only queries that benefit are the ones that use a highly selective filter on the clustered index column. To improve filter on other columns, a nonclustered index can be added to other columns. However, each index that is added to a table adds both space and processing time to loads.

    So, I feel clustered index will be better choice in your case as data is going to be more than 60 millions and your table is going to contain varchar(max) columns too. And if need to improve performance on other columns too then consider having nonclustered index on other columns.

    Click here to know more about table indexes in Synapse.

    Hope this will help. Please let us know if any further queries.

    ----------------

    Please consider hitting Accept Answer button. Accepted answers helps community as well.


0 additional answers

Sort by: Most helpful