Delete duplicated lines based on 2 primary key

Rami Frikha 91 Reputation points
2020-08-17T10:57:41.437+00:00

Greeting guys

i want to delete duplicated primary keys in a table based on update date :

Ex i have this table :

17900-image.png

i want to delete the duplicated lines based on the primary key (ID1 and ID 2 are my primary key ) . i want to keep the line with the last Updated date

the results of this example will be :

17899-image.png

Thanks in advance all

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} vote

Accepted answer
  1. Yitzhak Khabinsky 24,941 Reputation points
    2020-08-17T12:28:52.777+00:00

    You can try the following classic approach.
    The ROW_NUMBER() is a window function that assigns a sequential integer to each row within the partition of a result set. The row number starts with 1 for the first row in each partition. Eventually, we delete duplicate rows where seq(uence) > 1

    SQL

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (ID1 INT, ID2 INT, Updated DATE);
    INSERT INTO @tbl (ID1, ID2, Updated)
    VALUES
    ( 1, 1, '2020-01-08'),
    ( 1, 1, '2020-01-07'),
    ( 1, 1, '2020-01-07'),
    ( 1, 2, '2020-01-08'),
    ( 1, 3, '2020-01-08');
    -- DDL and sample data population, end
    
    ;WITH rs AS
    (
     SELECT * 
     , ROW_NUMBER() OVER (PARTITION BY ID1, ID2 ORDER BY Updated DESC) AS seq
     FROM @tbl
    )
    DELETE FROM rs
    WHERE seq >1;
    
    -- test
    SELECT * FROM @tbl;
    

    Output

    +-----+-----+------------+
    | ID1 | ID2 |  Updated   |
    +-----+-----+------------+
    |   1 |   1 | 2020-01-08 |
    |   1 |   2 | 2020-01-08 |
    |   1 |   3 | 2020-01-08 |
    +-----+-----+------------+
    
    2 people found this answer helpful.
    0 comments No comments

3 additional answers

Sort by: Most helpful
  1. Viorel 112.1K Reputation points
    2020-08-17T14:12:21.95+00:00

    Check an intuitive approach too:

    delete from MyTable
    from MyTable as t
    where exists ( select * from MyTable where id1 = t.ID1 and ID2 = t.ID2 and [Updated Date] > t.[Updated Date] )
    

    Although, it does not remove duplicate rows.

    1 person found this answer helpful.
    0 comments No comments

  2. MelissaMa-MSFT 24,176 Reputation points
    2020-08-18T05:51:42.93+00:00

    Hi RamiFrikha,

    Please refer below two methods:

    Method One :

    DELETE A 
    FROM @tbl a
    JOIN (SELECT ID1,ID2, MAX(Updated) AS MaxTime FROM @tbl GROUP BY ID1,ID2) B
    ON A.ID1 = B.ID2 and A.ID2=B.ID2
    WHERE A.Updated<B.MaxTime
    
    SELECT * FROM @tbl
    

    Method Two:

    ;WITH rs AS
     (
      SELECT * 
      , RANK() OVER (PARTITION BY ID1, ID2 ORDER BY Updated DESC) AS RN
      FROM @tbl
     )
     DELETE FROM rs
     WHERE RN >1;
    
     SELECT * FROM @tbl;
    

    Output:

    ID1 ID2 Updated
    1 1 2020-01-08
    1 2 2020-01-08
    1 3 2020-01-08

    If the response helped, do "Accept Answer" and upvote it.
    Best regards
    Melissa

    1 person found this answer helpful.
    0 comments No comments

  3. Guoxiong 8,126 Reputation points
    2020-08-17T18:35:44.977+00:00

    Based on your sample data, the columns ID1 and ID2 are not a primary key in the table since you cannot insert the duplicate key values into the table. For example:

    CREATE TABLE [dbo].[tbl](
        [ID1] [int] NOT NULL,
        [ID2] [int] NOT NULL,
        [UpdatedDate] [date] NOT NULL,
        CONSTRAINT [PK_tbl] PRIMARY KEY CLUSTERED 
        (
            [ID1] ASC, [ID2] ASC
        )
    ) ON [PRIMARY];
    GO
    

    If you want to insert your sample values:

    INSERT INTO tbl (ID1, ID2, UpdatedDate) VALUES
    (1, 1, '2020-01-08'),
    (1, 1, '2020-01-07'),
    (1, 1, '2020-01-07'),
    (1, 2, '2020-01-08'),
    (1, 3, '2020-01-08');
    

    you will get the following error message:

    Msg 2627, Level 14, State 1, Line 22
    Violation of PRIMARY KEY constraint 'PK_tbl'. Cannot insert duplicate key in object 'dbo.tbl'. The duplicate key value is (1, 1).

    I guess you want to add a primary key on the columns ID1 and ID2 after you clean up the duplicates. As YitzhakKhabinsky-0887 mentioned, you can use a window function ROW_NUMBER() to remove the duplicates and then add a primary key:

    ALTER TABLE [dbo].[tbl] 
    ADD CONSTRAINT [PK_tbl] PRIMARY KEY CLUSTERED 
    (
        [ID1] ASC,
        [ID2] ASC
    );
    GO
    
    0 comments No comments