Merge statement inserting duplicate rows on matching key

Nick Ryan 221 Reputation points
2022-03-09T21:19:16.12+00:00

I think I know the answer to this one but I just would like to confirm. There are rows in my source data-set that have identical values for the key used to match source to target but have differences in non-key columns. These are both being inserted because currently the table does not use the matching/business key but does use a surrogate IDENTITY column.

I think this works because determining whether a row in the source has a match in the target is done once at the start of the operation so that neither row has a match at that point.

Is that what is happening?

And it will break if I drop the surrogate PK and create one on the business key that is being used to do the match because it will end up with a duplicate. Is that also correct?

So basically the matching key is incomplete.

ADDITION
As Erland points out below, I didn't include an example so I'll modify his to explain my point as the actual table has too many columns.

CREATE TABLE #target(id int IDENTITY(1,1) NOT NULL,
  a int NOT NULL,
                     b int NOT NULL,
  c varchar(20) NOT NULL,
  d decimal(10,3) NULL,
                     PRIMARY KEY (id))
CREATE TABLE #source(a int NOT NULL,
                     b int NOT NULL,
  c varchar(20) NOT NULL,
  d decimal(10,3) NULL)
INSERT #target (a, b, c, d) VALUES (1, 1, 'OHAUB', NULL)
INSERT #source (a, b, c, d) VALUES (2, 2, 'MANAPOURI', NULL), (2, 2, 'OHAUA', NULL)
go
MERGE #target t
USING #source s ON t.a = s.a AND t.b = s.b
WHEN NOT MATCHED BY TARGET THEN
   INSERT (a, b, c, d) VALUES(s.a, s.b, s.c, s.d)
WHEN MATCHED THEN
   UPDATE
   (complicated match code around updating c and d depending on whether or not source and or target are NULL or not)
;

This is the situation at the moment but the integer surrogate key is meaningless and I am to replace it with the business key which was supposed to be the merge key columns. What I found was, this did not make a unique match and that column c should also have been included. However, the merge had been working up till now and although it is a rare occurrence when a and b do not make a unique row, there are 74 rows out of 2.5 million where that is not the case.

So, I need to modify the merge and the business key to include column c.

What I was really unsure of is how the earlier inserts that placed those 74 pairs of rows that are duplicate on columns a and b did not cause the error Erland mentions. That has been answered as the problem comes when 2 rows in the source try to update the same target row. It is not a problem when 2 rows in the source match on the merge key but have no match in the target.

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,657 questions
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 112.7K Reputation points MVP
    2022-03-09T22:10:48.777+00:00

    I think this works because determining whether a row in the source has a match in the target is done once at the start of the operation so that neither row has a match at that point.

    >

    Is that what is happening?

    I don't see your tables or query, but this example illustrates:

    CREATE TABLE #target(a int NOT NULL,
                         b int NOT NULL,
                         PRIMARY KEY (a,b))
    CREATE TABLE #source(a int NOT NULL,
                         b int NOT NULL,
                         PRIMARY KEY (a,b))
    
    INSERT #target (a, b) VALUES (1, 1)
    INSERT #source (a, b) VALUES (2, 1), (2, 2)
    
    go
    MERGE #target t
    USING #source s ON t.a = s.a
    WHEN NOT MATCHED BY TARGET THEN
       INSERT (a, b) VALUES(s.a, s.b)
    WHEN MATCHED THEN
       UPDATE
       SET b = s.b
    ;
    SELECT a, b FROM #target
    

    As you see both rows from #source are inserted., since none of them are matching.

    Observe now what happens if we re-run the MERGE statement:

    Msg 8672, Level 16, State 1, Line 12
    The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

    This is different from an UPDATE where a value will be arbitrarily chosen from the source.

    And it will break if I drop the surrogate PK and create one on the business key that is being used to do the match because it will end up with a duplicate. Is that also correct?

    Again, I don't see your tables, so I'm uncertain what you have in mind. If you slap on a PK on the source table, you can not do that only on the columns you use in the MERGE, since, as I understand it, you have duplicate values for this key. If you add a PK on the full key on the source, but keep the MERGE statement, nothing is going to change. The example above illustrates.

    1 person found this answer helpful.
    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Tom Phillips 17,741 Reputation points
    2022-03-10T19:40:57.807+00:00

    Your code is matching on :
    ON t.a = s.a AND t.b = s.b

    If you have more than 1 row matching that criteria, it will update all the matching rows.

    If you have no rows matching, it will insert all rows which do not match.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.