SCD Type 2 Implementation in dedicated SQL pool of Azure Synapse Analytics

Sonali Singh8 6 Reputation points
2021-10-20T11:48:25.493+00:00

Hi,

I am trying to implement SCD Type 2 in Azure synapse using MERGE statement but it is throwing me the following error:

Incorrect syntax near 'MERGE'.

Is there something from the below code which is not supported in Synapse or is there some mistake in the code itself?

Code is as follows:

INSERT INTO dbo.DimSCDType2
(
CustomerKey
,CustomerName
,DOB
,FromDate
,ToDate
,ActiveFlag
)
SELECT
CustomerKey
,CustomerName
,DOB
,FromDate
,ToDate
,ActiveFlag
FROM
(
MERGE INTO dbo.DimSCDType2 as TARGET
using dbo.source as SOURCE ON
(SOURCE.CustomerKey = TARGET.CustomerKey) AND TARGET.ActiveFlag = 'Y'

WHEN MATCHED AND 
TARGET.DOB <> SOURCE.DOB OR 
TARGET.CustomerName <> SOURCE.CustomerName AND
TARGET.ActiveFlag = 'Y'

THEN UPDATE SET
    TARGET.ToDate = getdate()-1,
    TARGET.ActiveFlag = 'N'

WHEN NOT MATCHED THEN
INSERT 
(
    CustomerKey
    ,CustomerName
    ,DOB
    ,FromDate
    ,ActiveFlag
)
VALUES 
(
    SOURCE.CustomerKey
    ,SOURCE.CustomerName
    ,SOURCE.DOB
    ,GETDATE()
    ,'Y'
)
OUTPUT $action,
    SOURCE.CustomerKey
    ,SOURCE.CustomerName
    ,SOURCE.DOB
    ,GETDATE()
    ,'Y'

) AS Changes
(
ACTION,
CustomerKey,
CustomerName,
DOB,
FromDate,
ActiveFlag
)
WHERE action='UPDATE';

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
{count} vote

7 answers

Sort by: Most helpful
  1. MADHUSUDAN PANWAR 101 Reputation points
    2023-01-20T17:18:46.06+00:00

    I will be answering question posted below which is similar to original question.

    '''Hello,

    I am trying to implement SCD Type 2 with Synapse SQL POOL using MERGE option as similar to above and getting the same error. Can some one share an example of how to implement the SCD Type 2 in the Synapse SQL Pool.

    • Scenario:
    • Source Table:

    id name country
    1 abc India
    2 bcd US


    • Expected Target Table:

    id name country IsCurrent StartDate EndDate
    1 abc Australia N 2021-11-12 2020-11-13
    1 abc India Y 2021-11-13 9999-12-31
    2 bcd US Y 2021-11-13 9999-12-31
    '''*

    Answer - Let's first create the table and populate it with some values

    create table test.stg
    (
        id int,
        name varchar(20),
        country varchar(30),
        DimensionCheckSum int default -1
    
    )
    with
    (
        DISTRIBUTION = HASH ( [id] ),
    	CLUSTERED COLUMNSTORE INDEX
    )
    
    create table test.scd2
    (
        id int,
        name varchar(20),
        country varchar(30),
        DimensionCheckSum int default -1,
        ValidFrom datetime,
        ValidTo date default '12/31/9999',
        CurrentRecord int default 1
    
    )
    with
    (
        DISTRIBUTION = HASH ( [id] ),
    	CLUSTERED COLUMNSTORE INDEX
    )
    
    insert into test.stg
    values (1,'abc','australia',-1)
    insert into test.stg
    values (2,'bcd','US',-1)
    

    User's image

    Now let's update checksum and then insert in target table(stg2 in our case) using INSERT command for updated records. We will then use MERGE statement to update the flag of updated records and insert altogether new records.

    --Updates the Checksum of all the columns in a CheckSum field of Staging Table
    UPDATE test.stg SET DimensionCheckSum=
    BINARY_CHECKSUM
    (
        [id],
        [name],
        [country]
    );
    
    --Insert only those records which have an update by comparing Checksum against the record which was last active
    INSERT INTO test.scd2
        SELECT 
        stg_table.[id],
        stg_table.[name],
        stg_table.[country],
        stg_table.[DimensionCheckSum],
        GETDATE(), 
        '12/31/9999', 
        1
        FROM test.stg stg_table
        JOIN test.scd2 main_table 
        ON stg_table.id = main_table.id
        WHERE (stg_table.DimensionCheckSum<> main_table.DimensionCheckSum) AND main_table.CurrentRecord<>0
    
    
    
    MERGE test.scd2 AS target
    USING test.stg AS source 
    ON target.Id = source.Id
    
    --Updates validto and flag of the record which was last active
    WHEN MATCHED and target.DimensionCheckSum <> source.DimensionCheckSum and target.CurrentRecord=1 THEN UPDATE SET 
        ValidTo=getdate(),
        CurrentRecord=0
    
    --Inserts an altogether new record which appeared for first time in source
    WHEN NOT MATCHED BY TARGET THEN  
        INSERT 
        (
        [Id],
        [name],
        [country],
        [DimensionCheckSum],
        [ValidFrom]
        )
        VALUES 
        (
        source.[id],
        source.[name],
        source.[country],
        source.[DimensionCheckSum],
        getdate()
        );
    
    

    Let's check output at this stage

    User's image

    Now let's update one of the record in source table(stg in our case).

    update test.stg
    set country='india'
    where id = 1
    
    

    Now rerun the checksum , insert and merge query given above.

    The output of target table (scd2) is as follows -

    User's image

    Hurray! We implemented SCD2.

    Note - Both the tables are hash-distributed and the database is Azure Synapse Analytics Dedicated SQL Pool

    2 people found this answer helpful.
    0 comments No comments

  2. HimanshuSinha-msft 19,486 Reputation points Microsoft Employee Moderator
    2021-10-21T16:56:34.927+00:00

    Hello @Sonali Singh8 ,
    Thanks for the ask and using Microsoft Q&A platform .

    On looking the code snippet shared , I am confused as I think we need to have a more clarity on the logic whcih you are tryibng to put . The MERGE command is conbination of UPDATE & INSERT , depeneding on the conditions set . I see that you are already setting the INSERT condition with the beklow logic

    WHEN NOT MATCHED THEN
    INSERT
    (
    CustomerKey
    ,CustomerName
    ,DOB
    ,FromDate
    ,ActiveFlag
    )
    VALUES
    (
    SOURCE.CustomerKey
    ,SOURCE.CustomerName
    ,SOURCE.DOB
    ,GETDATE()
    ,'Y'
    )

    I am not sure as to why you are adding the INSERT command in the staring of the query .

    INSERT INTO dbo.DimSCDType2
    (
    CustomerKey
    ,CustomerName
    ,DOB
    ,FromDate
    ,ToDate
    ,ActiveFlag
    )
    SELECT
    CustomerKey
    ,CustomerName
    ,DOB
    ,FromDate
    ,ToDate
    ,ActiveFlag

    Do me it looks like the above is not required . If the issue is still not resolved , please share the dummy data for the tables in questions and I will try to work on the same on my side .

    Please do let me know how it goes .
    Thanks
    Himanshu

    -------------------------------------------------------------------------------------------------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    0 comments No comments

  3. Sathya Narayanan 1 Reputation point
    2021-11-12T04:25:54.48+00:00

    Hello,

    I am trying to implement SCD Type 2 with Synapse SQL POOL using MERGE option as similar to above and getting the same error. Can some one share an example of how to implement the SCD Type 2 in the Synapse SQL Pool.

    • Scenario:
    • Source Table:

    id name country
    1 abc India
    2 bcd US

    **

    • Expected Target Table:

    **
    id name country IsCurrent StartDate EndDate
    1 abc Australia N 2021-11-12 2020-11-13
    1 abc India Y 2021-11-13 9999-12-31
    2 bcd US Y 2021-11-13 9999-12-31


  4. Kokila 16 Reputation points
    2022-03-03T07:19:27.76+00:00

    Hi @Sathya Narayanan

    I am trying to solve the same problem .. Did you manage to figure our the solution for SCD-type 2 in Syanpse Dedicated pool.

    Merge is not working as still in preview
    Trying to create Dataflow to implement SCD type2

    Any help much appriciated.Thanks


  5. AaronHughes 396 Reputation points
    2022-03-03T11:29:47.267+00:00

    https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15
    is how you correctly use the merge syntax

    in terms of the SQLPool if you cant get merge to correctly work then you can do it using proc that does seperate INSERT and UPDATE operations based on the no match / match conditions using joins from the source to dest table

    for example
    step 1 update existing records where matches
    --update into b (table2) -

    from table1 as a

    inner join table2 as b
    on a.pk = b.pk

    where a.cols <> b.cols ----(use OR to expand) repeated statement to ensure all changeable columns are compared and then updated when not the same
    --OR a.col1 <> b.col1 --- etc

    step2 insert missing records
    --insert into b (table2) -

    from table1 as a

    left outer join table2 as b
    on a.pk = b.pk

    where b.pk is null

    step3 optional - flag deletes or delete records no longer in source table
    --deletes -- I tend to do this with a is_deleted flag and a UPDATE, however you can use delete from

    from table1 as a

    right outer join table2 as b
    on a.pk = b.pk

    where a.pk is null


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.