Handle Duplicate Primary Key in Source Data

jn93 651 Reputation points
2022-10-26T03:11:09.783+00:00

Hi All, Lets say I have the datasets like below in my source.
The primary key of the dataset is combination of CHDRNUM,TRANNO and SEQNOEN.
However, I do see some of duplicate primary key with such combination as per highlighted in yellow below in source data.

How can I do a proper insert and update using SSIS if we facing such problem in source data. As per shown below, I should get the output same with source when doing insert/update.
As per shown in diagram below, I have two SSIS package to execute this. First SSIS package, which is naming as Sample_tday extract data from source where datime>last transaction date from main table (Note:datime is timestamp).
and keep in T_Sample.Second SSIS package, which is naming as Sample extract from T_Sample and lookup combination of CHDRNUM,TRANNO and SEQNOEN to check for insert/update in main Table.
Since the table is very huge, I need to do incremental insert/update in SSIS package. I have provided DDL and sample data population. Appreciate it if anyone can help.

DDL and Sample Data Population
-- DDL and sample data population, start
DECLARE @Bluemchen TABLE (CHDRNUM varchar(8), TRANNO int, SEQNOEN int, DTEEFF int, ENDNLNE varchar(60), USER_PROFILE varchar(10),DATIME datetime2(7) )
INSERT INTO @Bluemchen (CHDRNUM, TRANNO, SEQNOEN, DTEEFF,ENDNLNE,USER_PROFILE,DATIME) VALUES
('Z0012725','1','1','20200814','IT IS HEREBY DECLARED AND AGREED THAT WITH EFFECT FROM','TYM03465','2020-08-03 14:45:02.7549260'),
('Z0012725','1','2','20200814','IT IS HEREBY DECLARED AND AGREED THAT WITH EFFECT FROM','TYM03465','2020-08-03 14:37:59.7549260'),
('Z0012725','1','3','20200814',' ,THE INSUREDS NAME IS AMENDED TO READ AS ','TYM03465','2020-08-03 14:45:02.7549260'),
('Z0012725','1','3','20200814','14/08/2020, THE POLICY INSURED SHOULD READ CORRECTLY ','TYM03465','2020-08-03 14:37:59.7549260'),
('Z0012725','2','1','20200814','FOLLOWS:','TYM03465','2020-08-03 14:45:02.7549260'),
('Z0012725','2','2','20200814','AS BELOW:-','TYM03465','2020-08-03 14:37:59.7549260'),
('Z0012725','2','3','20200814','J','TYM03465','2020-08-03 14:45:02.7549260'),
('Z0012725','2','3','20200814','N','TYM03465','2020-08-03 14:45:02.7549260');
-- DDL and sample data population, end

254115-image.png254116-image.png

SQL Server Integration Services
SQL Server Integration Services
A Microsoft platform for building enterprise-level data integration and data transformations solutions.
2,493 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,583 questions
{count} votes

Accepted answer
  1. NikoXu-msft 1,911 Reputation points
    2022-10-26T07:47:32.05+00:00

    Hi @jn93 ,

    In the yellow data, since the data in all three columns is the same, if you want to perform an update operation, SQL server cannot determine which row to modify, it is recommended to add a column to distinguish.

    DECLARE @tbl TABLE (CHDRNUM varchar(8), TRANNO int, SEQNOEN int, DTEEFF int, ENDNLNE varchar(60), USER_PROFILE varchar(10),DATIME datetime2(7) )  
    INSERT INTO @tbl (CHDRNUM, TRANNO, SEQNOEN, DTEEFF,ENDNLNE,USER_PROFILE,DATIME) VALUES  
    ('Z0012725','1','1','20200814','IT IS HEREBY DECLARED AND AGREED THAT WITH EFFECT FROM','TYM03465','2020-08-03 14:45:02.7549260'),  
    ('Z0012725','1','2','20200814','IT IS HEREBY DECLARED AND AGREED THAT WITH EFFECT FROM','TYM03465','2020-08-03 14:37:59.7549260'),  
    ('Z0012725','1','3','20200814',' ,THE INSUREDS NAME IS AMENDED TO READ AS ','TYM03465','2020-08-03 14:45:02.7549260'),  
    ('Z0012725','1','3','20200814','14/08/2020, THE POLICY INSURED SHOULD READ CORRECTLY ','TYM03465','2020-08-03 14:37:59.7549260'),  
    ('Z0012725','2','1','20200814','FOLLOWS:','TYM03465','2020-08-03 14:45:02.7549260'),  
    ('Z0012725','2','2','20200814','AS BELOW:-','TYM03465','2020-08-03 14:37:59.7549260'),  
    ('Z0012725','2','3','20200814','J','TYM03465','2020-08-03 14:45:02.7549260'),  
    ('Z0012725','2','3','20200814','N','TYM03465','2020-08-03 14:45:02.7549260');  
      
    select CHDRNUM  
    ,TRANNO  
    ,SEQNOEN  
    ,ROW_NUMBER() over(partition by CHDRNUM,TRANNO,SEQNOEN order by ENDNLNE) as od   
    ,DTEEFF  
    ,ENDNLNE  
    ,USER_PROFILE  
    ,DATIME  
    from @tbl  
    

    This allows you to combine the first four columns as a primary key and then modify it.

    Best regards
    Niko

    ----------

    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".


0 additional answers

Sort by: Most helpful