arkiboys asked, ShaikMaheer-MSFT answered

Alter Row causing duplicates

Hello,
In an ADF data flow I am using an Alter Row transformation together with an inline Delta sink to load Delta (Parquet) files into Storage Gen2 daily.
The Delta table has a lot of columns, e.g. column1, column2, ..., column20.
settings as follows:
alterrow --> upsertIf --> true()
sink (inline, Delta):
allow upsert is ticked
keycolumns:
column1, column2, column4, column7

Question:
I do not understand how I sometimes end up with duplicate rows.

Do you see what I am doing wrong?
Thank you

azure-data-factory

1 Answer

ShaikMaheer-MSFT answered

Hi @arkiboys ,

Thank you for posting your query on the Microsoft Q&A platform.

With Alter Row "Upsert If", rows whose key columns (as configured in the sink) match an existing record are updated, and rows that do not match are inserted as new records. Upsert does not guarantee uniqueness, so it will not prevent duplicates on its own.
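To illustrate why upsert alone does not prevent duplicates, here is a simplified, in-memory Python model of an upsert keyed on the four key columns from the question. It is not how ADF or Delta Lake actually implement the operation. It only assumes that each incoming row is matched against the keys already in the target, not against other rows in the same batch. The `value` column and the helpers (`simplified_upsert`, `duplicate_keys`, `make_row`) are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Key columns taken from the question's sink settings.
KEY_COLUMNS = ("column1", "column2", "column4", "column7")

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def key_of(row: Row) -> Key:
    """Return the tuple of key-column values for one row."""
    return tuple(row[c] for c in KEY_COLUMNS)


def simplified_upsert(target: List[Row], batch: List[Row]) -> List[Row]:
    """Hypothetical, simplified model of an upsert keyed on KEY_COLUMNS.

    Each batch row is compared only with the keys already in the target:
    matching rows replace the target row(s) with that key, and non-matching
    rows are appended as inserts. The batch is never checked against itself.
    """
    existing_keys = {key_of(r) for r in target}
    updates: Dict[Key, Row] = {}
    inserts: List[Row] = []
    for row in batch:
        k = key_of(row)
        if k in existing_keys:
            updates[k] = row  # in this sketch, the last matching batch row wins
        else:
            inserts.append(row)  # every unmatched row is inserted, even repeats
    merged = [updates.get(key_of(r), r) for r in target]
    return merged + inserts


def duplicate_keys(rows: List[Row]) -> Dict[Key, int]:
    """Map each key that appears more than once to its row count."""
    counts = Counter(key_of(r) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


def make_row(c1: Any, c2: Any, c4: Any, c7: Any, value: str) -> Row:
    """Build a row containing only the key columns plus one payload column."""
    return {"column1": c1, "column2": c2, "column4": c4, "column7": c7, "value": value}


target = [make_row(1, "a", "x", 10, "old")]
batch = [
    make_row(1, "a", "x", 10, "updated"),  # matches the target -> update
    make_row(2, "b", "y", 20, "v1"),       # new key -> insert
    make_row(2, "b", "y", 20, "v2"),       # same new key again -> also inserted
]
result = simplified_upsert(target, batch)
print(duplicate_keys(result))  # prints {(2, 'b', 'y', 20): 2}
```

In this model, a key that appears twice in one incoming batch, and has no match in the target yet, ends up twice in the target. Any duplicates that already exist in the target are also left in place, because both copies simply get updated.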

There is no primary key concept in Delta, so there is no direct way to make the system reject duplicates. You need to either make sure the data is free of duplicates from the first load onwards (check for duplicates, remove them, and then load to Delta), or run a cleanup process regularly on the Delta table that checks for duplicates and removes them.
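A minimal Python sketch of both options, working on in-memory rows: `find_duplicate_keys` is the kind of check a regular cleanup job could run, and `dedupe_by_key` keeps one row per key before the data reaches the sink. Both function names are hypothetical. The `load_ts` column used to pick which duplicate survives is also an assumption; use whatever column identifies the most recent version in your data. Inside an ADF data flow, this step would usually be an Aggregate or Window transformation placed before Alter Row, and the sketch only illustrates the logic.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def find_duplicate_keys(rows: Iterable[Row], key_columns: Sequence[str]) -> Dict[Key, int]:
    """Cleanup check: return every key that occurs more than once, with its count."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


def dedupe_by_key(rows: Iterable[Row], key_columns: Sequence[str],
                  order_column: Optional[str] = None) -> List[Row]:
    """Keep exactly one row per key.

    With order_column, keep the row with the largest value in that column
    (on a tie, the later row wins). Without it, keep the last row seen.
    """
    chosen: Dict[Key, Row] = {}
    for row in rows:
        k = tuple(row[c] for c in key_columns)
        current = chosen.get(k)
        if (current is None or order_column is None
                or row[order_column] >= current[order_column]):
            chosen[k] = row
    return list(chosen.values())


KEYS = ["column1", "column2", "column4", "column7"]
rows = [
    {"column1": 1, "column2": "a", "column4": "x", "column7": 10, "load_ts": 2, "value": "newer"},
    {"column1": 1, "column2": "a", "column4": "x", "column7": 10, "load_ts": 1, "value": "older"},
    {"column1": 2, "column2": "b", "column4": "y", "column7": 20, "load_ts": 1, "value": "only"},
]
print(find_duplicate_keys(rows, KEYS))  # prints {(1, 'a', 'x', 10): 2}
clean = dedupe_by_key(rows, KEYS, order_column="load_ts")
print([r["value"] for r in clean])  # prints ['newer', 'only']
```

Deduplicating before the sink keeps each incoming batch unique per key. The duplicate check is still useful as a periodic safeguard for rows that slipped into the table earlier.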

Hope this helps. Please let us know if you have any further queries.
