Table keys in dedicated SQL pools

Question

Muruga MuthuKrishnan 26

Hi,

Tables created in a standalone dedicated SQL pool or in Synapse Analytics with a primary key or constraint are useful for query optimization. But here the primary keys do not maintain or check record uniqueness based on the defined primary keys/constraints, and they do not terminate insert operations that violate them. Please correct me if there is any gap in my understanding.

If we need to define a table with a primary key that performs a uniqueness check and keeps the records in the table unique, how can we achieve that?

Accepted answer

Answer 1

Bhargava-MSFT 31,261 Microsoft Employee Moderator

Hello @Muruga Muthukrishnan,

Welcome to the MS Q&A platform.

You are correct. Having a primary key and/or unique key allows the dedicated SQL pool engine to generate an optimal execution plan for a query, but it does not guarantee uniqueness or perform uniqueness checks during insert operations. Users need to make sure the values in a primary key column or a unique constraint column are unique.

In Synapse, PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.

UNIQUE constraint is only supported when NOT ENFORCED is used.

The creation of indexes with unique constraint is not supported in Synapse.

By default, dedicated SQL pool creates a clustered columnstore index when no index options are specified on a table.
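As a rough illustration of the rules above, the sketch below creates one table with no index option, one with the index option spelled out, and one heap, then adds the two supported constraint forms. The table and constraint names (demo_default, demo_cci, demo_heap, PK_demo_cci_id, UQ_demo_heap_id) are hypothetical and chosen only for this example.

```sql
-- Hypothetical table names, for illustration only.

-- No index option given: a clustered columnstore index is created by default.
CREATE TABLE demo_default (id INT NOT NULL, amount INT);

-- The same index choice written out explicitly.
CREATE TABLE demo_cci (id INT NOT NULL, amount INT)
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);

-- A heap instead of a clustered columnstore index.
CREATE TABLE demo_heap (id INT NOT NULL, amount INT)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- PRIMARY KEY requires both NONCLUSTERED and NOT ENFORCED.
ALTER TABLE demo_cci
ADD CONSTRAINT PK_demo_cci_id PRIMARY KEY NONCLUSTERED (id) NOT ENFORCED;

-- UNIQUE requires NOT ENFORCED.
ALTER TABLE demo_heap
ADD CONSTRAINT UQ_demo_heap_id UNIQUE (id) NOT ENFORCED;
```

Neither constraint stops a duplicate id from being inserted; they only tell the optimizer what it may assume.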

For your question:

To declare a key, you can use a UNIQUE constraint with NOT ENFORCED, or a PRIMARY KEY with NONCLUSTERED and NOT ENFORCED. Neither one makes the engine check uniqueness; keeping the values unique remains the user's responsibility.

Here is an example with a unique constraint:

Without a unique constraint:

CREATE TABLE t1 (a1 INT, b1 INT)

INSERT INTO t1 VALUES (1, 100)

INSERT INTO t1 VALUES (1, 1000)

SELECT a1, COUNT(*) AS total FROM t1 GROUP BY a1

result:

1 2

DROP TABLE t1

With a unique constraint:

CREATE TABLE t1 (a1 INT UNIQUE NOT ENFORCED, b1 INT)

INSERT INTO t1 VALUES (1, 100)

INSERT INTO t1 VALUES (1, 1000)

SELECT a1, COUNT(*) AS total FROM t1 GROUP BY a1

result:

1 1

Here, with the unique constraint in place, the engine treats every a1 value as unique, so it reports a count of 1 even though the table actually holds two rows with a1 = 1.

With a primary key:

DROP TABLE t1

CREATE TABLE t1 (a1 INT NOT NULL, b1 INT)

ALTER TABLE t1 ADD CONSTRAINT PK_t1_a1 PRIMARY KEY NONCLUSTERED (a1) NOT ENFORCED

INSERT INTO t1 VALUES (1, 100)

INSERT INTO t1 VALUES (1, 1000)

SELECT a1, COUNT(*) AS total FROM t1 GROUP BY a1

Result:

1 1

Please note: in both cases, users need to make sure all values in those columns are unique. A violation may cause queries to return inaccurate results.
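One way to keep the key column unique yourself is to land incoming rows in a staging table that has no constraints, check for duplicates there, and only then copy them into the constrained table. The sketch below assumes the t1 table from the primary key example; stage_t1 is a hypothetical name, and keeping the row with the lowest b1 per key is an arbitrary choice made only for illustration. The duplicate check runs against the staging table on purpose, because grouping on a column that carries a NOT ENFORCED constraint can report misleading counts, as the examples in this answer show.

```sql
-- Hypothetical staging table: deliberately no PRIMARY KEY or UNIQUE constraint,
-- so the optimizer makes no uniqueness assumption about a1.
CREATE TABLE stage_t1 (a1 INT NOT NULL, b1 INT);

INSERT INTO stage_t1 VALUES (1, 100);
INSERT INTO stage_t1 VALUES (1, 1000);
INSERT INTO stage_t1 VALUES (2, 200);

-- Step 1: report key values that occur more than once in the incoming batch.
SELECT a1, COUNT(*) AS total
FROM stage_t1
GROUP BY a1
HAVING COUNT(*) > 1;

-- Step 2: copy one row per key into the target, skipping keys already present.
-- ORDER BY b1 decides which duplicate survives (an assumption for this sketch).
INSERT INTO t1 (a1, b1)
SELECT s.a1, s.b1
FROM (
    SELECT a1, b1,
           ROW_NUMBER() OVER (PARTITION BY a1 ORDER BY b1) AS rn
    FROM stage_t1
) AS s
WHERE s.rn = 1
  AND NOT EXISTS (SELECT 1 FROM t1 AS t WHERE t.a1 = s.a1);
```

This only works as long as every load into t1 goes through the same check; a direct insert into t1 can still introduce duplicates.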

I hope this helps. Please let me know if you have any further questions.

Reference document: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-table-constraints

If this answers your question, please consider accepting the answer by hitting Accept Answer and up-voting it, as this helps the community find answers to similar questions.

Muruga MuthuKrishnan 26 Reputation points

2023-03-17T06:34:20.73+00:00

Thanks for your response. Is there any approach to check data uniqueness while inserting records into the tables? Assume the user can't guarantee the uniqueness of the data in all the columns.

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-03-17T16:41:10.9333333+00:00

Hello @Muruga Muthukrishnan,

Thanks for the reply. Unfortunately, Synapse can't enforce data uniqueness while inserting records into the tables.

The reason is that primary key and unique constraints are only supported when NOT ENFORCED is used.

These are called informational constraints (they exist to help the engine produce an optimal execution plan).

Informational constraints are not enforced by the database engine, and are not used for additional verification of data; rather, they are used to improve query performance.

But I can check with my internal team to see if there is any other mechanism to enforce uniqueness while inserting data into Synapse.

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-03-17T20:17:10.68+00:00

Hello @Muruga Muthukrishnan,

Here is an update.

By default, Synapse doesn't enforce data uniqueness while inserting records into the tables. But you can use other methods, such as hashing in a data flow, to detect duplicate rows before inserting them into Synapse.

Below is a video tutorial explaining hashing with ADF and Synapse.

https://www.youtube.com/watch?v=Id82NZo9hxM&ab_channel=AzureDataFactory
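The same hashing idea can also be sketched directly in T-SQL, as an alternative to the data flow approach shown in the video and not a reproduction of it. The example computes a SHA-256 hash over the columns that define a duplicate and inserts only rows whose hash is not yet in the target. The table names (stage_rows, target_rows), the row_hash column, and the '|' separator are all assumptions made for this illustration.

```sql
-- Hypothetical staging and target tables, for illustration only.
CREATE TABLE stage_rows (a1 INT NOT NULL, b1 INT);
CREATE TABLE target_rows (a1 INT NOT NULL, b1 INT, row_hash VARBINARY(32) NOT NULL);

INSERT INTO stage_rows VALUES (1, 100);
INSERT INTO stage_rows VALUES (1, 100);   -- exact duplicate of the first row
INSERT INTO stage_rows VALUES (1, 1000);

-- Hash the columns that define "the same row", keep one row per hash,
-- and skip hashes that already exist in the target.
INSERT INTO target_rows (a1, b1, row_hash)
SELECT s.a1, s.b1, s.row_hash
FROM (
    SELECT h.a1, h.b1, h.row_hash,
           ROW_NUMBER() OVER (PARTITION BY h.row_hash ORDER BY h.a1) AS rn
    FROM (
        SELECT a1, b1,
               -- The separator keeps (1, 23) and (12, 3) from hashing alike.
               -- CONCAT turns NULL into an empty string.
               HASHBYTES('SHA2_256',
                         CONCAT(CAST(a1 AS VARCHAR(20)), '|',
                                CAST(b1 AS VARCHAR(20)))) AS row_hash
        FROM stage_rows
    ) AS h
) AS s
WHERE s.rn = 1
  AND NOT EXISTS (SELECT 1 FROM target_rows AS t WHERE t.row_hash = s.row_hash);
```

Hashing every column detects fully identical rows; hashing only the key columns would instead detect repeated keys. Which one fits depends on what counts as a duplicate in your data.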

I hope this helps. Please let me know if you have any further questions.

Muruga MuthuKrishnan 26

Hi Bhargava,

Thanks for your response. I was implementing the hash-based check using SQL and now have clarity on how to achieve the same using a data flow. I appreciate your inputs.

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-03-20T17:22:44.7133333+00:00

Thank you, Muruga Muthukrishnan
