I will be answering question posted below which is similar to original question.
'''Hello,
I am trying to implement SCD Type 2 with Synapse SQL POOL using MERGE option as similar to above and getting the same error. Can some one share an example of how to implement the SCD Type 2 in the Synapse SQL Pool.
- Scenario:
- Source Table:
id name country
1 abc India
2 bcd US
- Expected Target Table:
id name country IsCurrent StartDate EndDate
1 abc Australia N 2021-11-12 2020-11-13
1 abc India Y 2021-11-13 9999-12-31
2 bcd US Y 2021-11-13 9999-12-31'''*
Answer - Let's first create the table and populate it with some values
create table test.stg
(
id int,
name varchar(20),
country varchar(30),
DimensionCheckSum int default -1
)
with
(
DISTRIBUTION = HASH ( [id] ),
CLUSTERED COLUMNSTORE INDEX
)
create table test.scd2
(
id int,
name varchar(20),
country varchar(30),
DimensionCheckSum int default -1,
ValidFrom datetime,
ValidTo date default '12/31/9999',
CurrentRecord int default 1
)
with
(
DISTRIBUTION = HASH ( [id] ),
CLUSTERED COLUMNSTORE INDEX
)
insert into test.stg
values (1,'abc','australia',-1)
insert into test.stg
values (2,'bcd','US',-1)
Now let's update checksum and then insert in target table(stg2 in our case) using INSERT command for updated records. We will then use MERGE statement to update the flag of updated records and insert altogether new records.
--Updates the Checksum of all the columns in a CheckSum field of Staging Table
UPDATE test.stg SET DimensionCheckSum=
BINARY_CHECKSUM
(
[id],
[name],
[country]
);
--Insert only those records which have an update by comparing Checksum against the record which was last active
INSERT INTO test.scd2
SELECT
stg_table.[id],
stg_table.[name],
stg_table.[country],
stg_table.[DimensionCheckSum],
GETDATE(),
'12/31/9999',
1
FROM test.stg stg_table
JOIN test.scd2 main_table
ON stg_table.id = main_table.id
WHERE (stg_table.DimensionCheckSum<> main_table.DimensionCheckSum) AND main_table.CurrentRecord<>0
MERGE test.scd2 AS target
USING test.stg AS source
ON target.Id = source.Id
--Updates validto and flag of the record which was last active
WHEN MATCHED and target.DimensionCheckSum <> source.DimensionCheckSum and target.CurrentRecord=1 THEN UPDATE SET
ValidTo=getdate(),
CurrentRecord=0
--Inserts an altogether new record which appeared for first time in source
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
[Id],
[name],
[country],
[DimensionCheckSum],
[ValidFrom]
)
VALUES
(
source.[id],
source.[name],
source.[country],
source.[DimensionCheckSum],
getdate()
);
Let's check output at this stage
Now let's update one of the record in source table(stg in our case).
update test.stg
set country='india'
where id = 1
Now rerun the checksum , insert and merge query given above.
The output of target table (scd2) is as follows -
Hurray! We implemented SCD2.
Note - Both the tables are hash-distributed and the database is Azure Synapse Analytics Dedicated SQL Pool