Azure Synapse incremental pipeline produces duplicates at sink

Yvette Epingo
2023-03-31T06:39:22.8766667+00:00

Hello,

I have built a simple incremental pipeline that extracts each file name, picks the latest file, and does a simple upsert with the expression true(). I noticed that the sink table ends up with duplicate rows. I made sure my sink skips duplicates, but I still get duplicated rows, even though the raw file contains only unique rows.
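The "take the latest file" step can be sketched in Python as below. This is only an illustration, not the pipeline itself. It assumes file names follow the pattern seen in the example (a yyyymmdd date prefix, an underscore, then the distributionId, as in 20230331_1001), and latest_file is a hypothetical helper name. If several files share the latest date, max() returns the first one it finds, so in that case a real pipeline would need to process all of them or pick a tiebreaker explicitly.

```python
from datetime import datetime


def latest_file(file_names: list) -> str:
    """Return the file whose yyyymmdd prefix is the most recent.

    Assumes (hypothetically) names shaped like '20230331_1001':
    a date prefix, an underscore, then the distributionId.
    """
    def file_date(name: str) -> datetime:
        # Only the part before the first underscore is treated as the date.
        return datetime.strptime(name.split("_", 1)[0], "%Y%m%d")

    return max(file_names, key=file_date)


files = ["20230329_1001", "20230331_1001", "20230330_1001"]
print(latest_file(files))  # 20230331_1001
```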

For instance:

The raw file has one record: 0155, distributionId = 1001, PeriodMonthYear = 202303, SupplierID = 2155, in file 20230331_1001.

Normally, when I run my change data capture to upsert only from the new file, I would expect the sink table to gain exactly one record for 0155.
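The expected upsert behaviour can be modelled with a dictionary keyed on the key columns: a row whose key already exists overwrites the stored row, and only a new key adds a row, so reprocessing the same file still leaves a single record. This is a minimal pure-Python sketch of the semantics, not the Synapse sink. The column name RecordID and the choice of key columns are hypothetical, since the post does not say which columns identify a row.

```python
# Hypothetical key columns; the real key depends on the table's grain.
KEY_COLS = ("RecordID", "DistributionId", "PeriodMonthYear")


def upsert(sink: dict, rows: list, key_cols: tuple) -> None:
    """Update rows whose key already exists in the sink; insert the rest."""
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        sink[key] = row  # overwrites on a key match, so no second copy is added


new_file = [
    {"RecordID": "0155", "DistributionId": "1001",
     "PeriodMonthYear": "202303", "SupplierID": "2155"},
]

sink: dict = {}
upsert(sink, new_file, KEY_COLS)
upsert(sink, new_file, KEY_COLS)  # reprocessing the same file
print(len(sink))  # 1
```

In this model, duplicates can only appear if the key changes between loads: if the key columns include something that differs on each run, the same logical record gets a new key and is inserted again.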

But instead I am getting multiple lines for record 0155.

Can someone help me with a proper solution for a true upsert without these duplicates? I have built other pipelines with the same logic and they work, but for some reason this one does not.

Azure Synapse Analytics

1 answer

  1. HimanshuSinha-msft, Microsoft Employee
    2023-03-31T21:54:34.22+00:00

    Hello @Yvette Epingo, thanks for the question and for using the MS Q&A platform.

    But instead I am getting multiple lines for record 0155.

    Since you are getting multiple records for 0155, please confirm that the correct key columns are set in the sink and that the Upsert option is selected.
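    One way to act on this is to check whether the chosen key columns really identify one row: count each key value and report any that appear more than once. The Python sketch below is illustrative only; duplicate_keys, the RecordID column name and the key columns are hypothetical assumptions. The same check can be run against the source file, to confirm the key is unique there, and against rows read back from the sink, to find records that were appended instead of updated.

```python
from collections import Counter


def duplicate_keys(rows: list, key_cols: tuple) -> dict:
    """Return {key: count} for every key value that occurs more than once."""
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical key columns and sample row, mirroring record 0155 from the question.
KEY_COLS = ("RecordID", "DistributionId", "PeriodMonthYear")
row = {"RecordID": "0155", "DistributionId": "1001",
       "PeriodMonthYear": "202303", "SupplierID": "2155"}

# A sink where the same record landed twice, e.g. because it was inserted
# rather than matched on its key.
sink_rows = [row, dict(row)]
print(duplicate_keys(sink_rows, KEY_COLS))  # {('0155', '1001', '202303'): 2}
print(duplicate_keys([row], KEY_COLS))      # {}
```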


    Thanks, Himanshu

    Please accept as "Yes" if the answer provided is useful, so that you can help others in the community looking for remediation of similar issues.
