Sharding with HASH distribution using varchar field

JJ Kilduff 46 Reputation points

Hi there,

I have some documentation about which sharding pattern to use to distribute the data when creating a table:

This says specifically Do not distribute on varchar format.

For example - say I have a table with a [StoreId] field which would be an obvious key to use in a Hash except it is a varchar format. See rough example below:

create table [dbo].[InventoryData](
various fields
[StoreId] [varchar] (20)
more fields

clustered columnstore index,
distribution = HASH (StoreId)

This question appeared on a sample exam for Synapse. If StoreId had been a numeric or int type I would have picked for the hash but I didn't because it was a varchar. The question was marked wrong. So I'd like to know why you aren't supposed to use varchars as the answer to the question seems to contradict the advice.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
2,842 questions
No comments
{count} votes

Accepted answer
  1. David Browne - msft 3,756 Reputation points

    That guidance does not appear in the official documentation. Guidance for designing distributed tables using dedicated SQL pool in Azure Synapse Analytics. If the column is otherwise a good candidate (and StoreId is often good in retail, so long as there are lots of stores and they are roughly the same size), you shouldn't let the data type stop you.

1 additional answer

Sort by: Most helpful
  1. JJ Kilduff 46 Reputation points

    The link I provided was from Microsoft which was why I questioned it. But I think your answer is probably correct. I have posted a screenshot of the section that said this.