Duplicate inserted in synapse table Id column ( Identity 1,1 )

Srinivas K 11 Reputation points
2025-03-22T04:19:33.9266667+00:00

Hi All ,

I see Duplicate inserted in synapse table Id column even given that column given as Identity(1,1).

Is there any reson for this issue? generally it should not insert duplicates. Is there any root cause and solution for this is gr8 helpfull to us

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,380 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Erland Sommarskog 122K Reputation points MVP Volunteer Moderator
    2025-03-22T11:08:17.9366667+00:00

    First of all, the IDENTITY property on its own is never a guarantee for unique values, not even in regular SQL Server. However, in regular SQL Server you would only get duplicates if you reseed the identity value or similar.

    In Azure Synapse Analytics, it's even more intricate since there are parallel independent nodes. Each node has its set of values, but if you do manual updates things can go wrong. See further https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity


  2. phemanth 15,765 Reputation points Microsoft External Staff Moderator
    2025-03-26T06:48:17.8633333+00:00

    @Srinivas K

    You're welcome, Given that the issue is intermittent and not due to manual inserts, it could indeed be related to the distribution method used in your Synapse table. Here are a few points to consider:

    • HASH Distribution: This method distributes rows based on the hash value of a specified column. While it helps in evenly distributing data, it can sometimes lead to issues if the hash function doesn't distribute values uniformly.
    • Round Robin Distribution: This method distributes rows evenly across all distributions without considering the values in any particular column. It is simpler but might not be optimal for large tables with frequent joins.

    • Check Distribution Key: Ensure that the distribution key chosen for HASH distribution is appropriate and results in an even distribution of data.
    • Use Unique Constraints: Implement unique constraints or primary keys on the ID column to enforce uniqueness.
    • DBCC CHECKIDENT: Use the DBCC CHECKIDENT command to check and correct the identity value if it gets out of sync.
    • Review Insert Logic: Double-check the logic used for inserting data from multiple sources to ensure there are no conflicts or race conditions.
    • Logs and Monitoring: Enable detailed logging and monitoring to capture the exact scenarios when duplicates are inserted. This can help in identifying any patterns or specific conditions leading to the issue.

    If the issue persists, do let us know

    For more detailed guidance refer to the official documentation

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.