Horizontal scaling (Sharding)

Himadri Nath 20 Reputation points
2023-03-04T18:56:03.5733333+00:00

should I avoid Autoincremented value to generate primary key (that are not shard keys) for future scalability (Sharding). in this article https://cutt.ly/j8df6Ar point 4 saying so.

 

Autoincremented values in other fields that are not shard keys can also cause problems. For example, if you use autoincremented fields to generate unique IDs, then two different items located in different shards might be assigned the same ID.

Azure SQL Database
SQL Server Other
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2023-03-05T14:47:27.08+00:00

    So the Article table has no ShardKey, and that is alright, as that follows from the account. As long as the accounts stays in their shard, it may not matter if there are ten articles that all has ID 23. But then you decide to move one account from one shard to another...

    If you want to avoid the GUIDs, an alternative is to make sure that each shard has its own range of ids. Rather than using IDENTITY, I think you should use sequences instead.

    CREATE TABLE dbo.Article (
      ArticleId bigint NOT NULL CONSTRAINT default_Article_ArticleID DEFAULT NEXT VALUE FOR ArticleIdSequence,
      UserID uniqueidentifier NOT NULL,
      Body nvarchar(660) NULL,
      CONSTRAINT PK_Post PRIMARY KEY CLUSTERED (ArticleId)
    )
    
    

    Then you can define the sequence as

    CREATE SEQUENCE ArticleIdSequence AS bigint
       START WITH 1
       INCREMENT BY 50
    

    On the first shard. On the next shard, you start at 2 etc. You can also have different arrangements, but the point is that each shard has its set of numbers, so there cannot be any collisions in ids. Or, well, with the scheme above there is room for 50 shards. When you add the fifty-first shard you will need to rearrange.

    And that brings us to why sequences are a better choice than IDENTITY. With IDENTITY you cannot change the key values, nor can you easily explicitly insert id values. Since the sequence is its own object and not a property of the column, there are no such restrictions with sequences.


2 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2023-03-04T19:30:17.09+00:00

    Since we don''t know why you are considering auto-incrementing columns that are not shard keys, we can not say. If you plan to use the id only inside a single shard. just go for it. But you have been warned.


  2. Alberto Morillo 34,671 Reputation points MVP Volunteer Moderator
    2023-03-05T03:49:18.08+00:00

    You can use Globally Unique Identifiers (GUIDs) instead to completely avoid the issues outlined on the documentation. This way items will have unique Ids across all shards.

    Although there are some disadvantages you need to be aware about using GUIDs: #1 is they are bigger in size (16 bytes) compared to int (4 bytes) and have some selectivity problems. GUIDs may also be inefficient in join operations between tables as they store bigger values. Finally, they can create more fragmentation. Make some tests with using GUIDs as primary keys as an alternative to autoincremented values.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.