alterrow - causing duplicates

arkiboys 9,711 Reputation points
2022-05-25T08:07:58.52+00:00

Hello,
In an ADF data flow I am using Alter Row together with an inline Delta sink to load Delta Parquet files into Storage Gen2 daily.
The Delta Parquet has a lot of columns, i.e. column1, column2, ..., column20.
Settings are as follows:
alterrow --> upsertIf --> true()
sink inline delta:
allow upsert is ticked
keycolumns:
column1, column2, column4, column7

Question:
I do not see why I sometimes get duplicate rows.

Do you see what I am doing wrong?
Thank you

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author

ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator
2022-05-27T09:46:19.82+00:00

Hi @arkiboys ,

Thank you for posting your query on the Microsoft Q&A platform.

Alter Row's Upsert If only updates records that already exist and inserts records that do not exist as new rows. Upsert does not guarantee uniqueness or the absence of duplicates.

There is no primary key concept in Delta, so there is no direct way to make the system prevent duplicates. Starting from the very first load, you should check the data for duplicates and remove them before loading into Delta. Alternatively, run a regular cleanup process on the Delta table that checks for duplicates and removes them.
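
To illustrate the two options above (check before load, or clean up afterwards), here is a minimal sketch in plain Python. It works on in-memory dictionaries and does not use any ADF or Delta API; the function names are hypothetical, the key columns are the ones from the question, and keeping the last row per key is an assumption you may need to change (for example, to keep the row with the latest timestamp).

```python
from collections import Counter

# Key columns taken from the question; adjust to your own sink settings.
KEY_COLUMNS = ["column1", "column2", "column4", "column7"]


def key_of(row, key_columns=KEY_COLUMNS):
    """Build the composite key of a row as a tuple of its key column values."""
    return tuple(row[c] for c in key_columns)


def find_duplicate_keys(rows, key_columns=KEY_COLUMNS):
    """Return {key: count} for every composite key that occurs more than once."""
    counts = Counter(key_of(r, key_columns) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


def dedupe_keep_last(rows, key_columns=KEY_COLUMNS):
    """Keep one row per composite key. Assumption: the last row seen wins."""
    latest = {}
    for row in rows:
        latest[key_of(row, key_columns)] = row
    return list(latest.values())


if __name__ == "__main__":
    batch = [
        {"column1": 1, "column2": "a", "column4": "x", "column7": 10, "column3": "old"},
        {"column1": 1, "column2": "a", "column4": "x", "column7": 10, "column3": "new"},
        {"column1": 2, "column2": "b", "column4": "y", "column7": 20, "column3": "only"},
    ]
    print(find_duplicate_keys(batch))    # keys that appear more than once
    print(len(dedupe_keep_last(batch)))  # rows left after de-duplication
```

The same check can be run against the incoming batch before the upsert, or against rows read back from the Delta table as part of a periodic cleanup; the equivalent logic would have to be expressed in whatever tool performs the load.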

Hope this helps. Please let us know if you have any further queries.

--------------

Please consider hitting the Accept Answer button. Accepted answers help the community as well.
