Dedicated SQL Pool - Strange Duplicate Check Error

VS29 246 Reputation points
2022-08-19T21:06:40.51+00:00

I need some expert advice here...

I have a table with primary key of INT Data type. Logically, it should not allow duplicates and when checked for duplicates using count(*) I see none.

However, when I check for a particular value, it returns multiple records for the same field.

Environment - Dedicated SQL Pool.

Some one please provide any insight that you might have.

Thank you!

Azure SQL Database
Developer technologies Transact-SQL
SQL Server Other
{count} votes

Accepted answer
  1. Ronen Ariely 15,206 Reputation points
    2022-08-19T23:20:10.917+00:00

    Hi,

    OK... now that you showed us what you did, it is simple to explain the issue :-)

    The confusing is related to the way GROUP BY behaves in Dedicated SQL Pool.

    GROUP BY Is not fully supported and when you use simple GROUP BY on a unique constraint then the server do not aggregate the data. The number of rows in the result SET will be the same as the number of rows you inserted and the value of the function COUNT(*) will be 1.

    Therefore, when you added the filter HAVING COUNT(*)>1 then no row retuned.

    The following document presents a simple example. Notice that before adding the unique constrain the GROPUP BY works and returns only 4 rows but after adding the constraint it simply do nothing.

    It is well documented that GROUP BY in Dedicated SQL Pool is only fully supported when using one of the three types of grouping: GROUP BY with ROLLUP, GROUPING SETS or GROUP BY with CUBE

    https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-group-by-options?WT.mc_id=DP-MVP-5001699

    The solution in your case is to use a query like bellow (for example using ROLLUP)

    SELECT a1, count(*) AS total FROM t2 GROUP BY ROLLUP (a1) HAVING not a1 is null  
    

    ----------

    14150-image.pngRonen Ariely
    Personal Site | Blog | Facebook | Linkedin


1 additional answer

Sort by: Most helpful
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2022-08-19T21:39:19.62+00:00

    From the topic Primary key, foreign key, and unique key using dedicated SQL pool in Azure Synapse Analytics:

    Table constraints

    Dedicated SQL pool supports these table constraints:

    -- PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.
    -- UNIQUE constraint is only supported when NOT ENFORCED is used.

    For syntax, check ALTER TABLE and CREATE TABLE.

    FOREIGN KEY constraint is not supported in dedicated SQL pool.

    Remarks

    Having primary key and/or unique key allows dedicated SQL pool engine to generate an optimal execution plan for a query. All values in a primary key column or a unique constraint column should be unique.

    Important

    After creating a table with primary key or unique constraint in dedicated SQL pool, users need to make sure all values in those columns are unique. A violation of that may cause the query to return inaccurate result.

    That is, Synapse does not enforce the uniqueness, that onus is on you. (As for why it does not enforce it - keep in mind that it is parallel architecture with many nodes.)


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.