Azure Data Factory Upsert not working properly

Clifford Gentiles 21 Reputation points
2023-01-05T13:23:18.91+00:00

Hello,

I am using a Copy activity to copy a Parquet file into a SQL data warehouse. The sink is set to upsert with key columns identified, but I am still getting duplicates based on the concatenated key columns. Please see the images below.

Sink set to upsert with key columns
276544-image.png

But the duplicate is still there after running the pipeline; the value of dup should be '1' for every key to confirm there are no duplicates.
276509-image.png
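The duplicate check described above is equivalent to grouping rows by the concatenated key columns and counting occurrences; every count should be 1. A minimal sketch in Python (column names are hypothetical, not from the pipeline):

```python
# Group rows by the key columns and count occurrences; a count > 1 means
# the upsert produced duplicates. Column names here are hypothetical.
from collections import Counter

rows = [
    {"region": "EU", "product": "A"},
    {"region": "EU", "product": "A"},   # duplicate on the key columns
    {"region": "US", "product": "B"},
]

key_cols = ["region", "product"]
counts = Counter(tuple(r[c] for c in key_cols) for r in rows)
dups = {k: n for k, n in counts.items() if n > 1}
print(dups)  # a non-empty dict means duplicates exist
```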

Your response would be very much appreciated.

Thanks

Azure Synapse Analytics
Azure Data Factory

1 answer

  1. Rejas, Jose 20 Reputation points
    2023-02-28T15:20:02.89+00:00

    Hi everyone.

    I think I know what's going on here.

    I have the same problem in my project, and I'm going crazy, but I have a clue:

When I end up with duplicates in the target but NOT in the source, it is because of the key columns: one of them may be NULL, so the Copy Data activity does not match rows on the key columns correctly. This can be very common, for example, with data produced by Spark SQL.

Please let me know if this solution works for you.
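A minimal sketch of the NULL-key behavior described above, assuming the sink matches rows with SQL three-valued logic (where NULL = NULL is not true, so a row with a NULL key never matches and gets inserted again). The function and column names are hypothetical, for illustration only:

```python
# Sketch: an upsert that compares key columns the way SQL does.
# In SQL, any comparison involving NULL (None here) is not true, so a
# NULL key column means no match is found and a duplicate is inserted.

def sql_equal(a, b):
    # Mimic SQL comparison semantics: NULL never equals anything.
    if a is None or b is None:
        return False
    return a == b

def upsert(table, row, key_cols):
    # Update the first row whose key columns all match; otherwise insert.
    for existing in table:
        if all(sql_equal(existing[k], row[k]) for k in key_cols):
            existing.update(row)
            return "updated"
    table.append(dict(row))
    return "inserted"

table = []
keys = ["region", "product"]  # hypothetical key columns
upsert(table, {"region": "EU", "product": None, "qty": 1}, keys)
upsert(table, {"region": "EU", "product": None, "qty": 2}, keys)

print(len(table))  # 2 -> duplicate rows: None never matched None
```

Replacing NULLs in the key columns with a sentinel value before the copy (or filtering them out) makes the rows match and the upsert behave as expected.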

    1 person found this answer helpful.
