Duplicating rows in parquet file (Delta sink)

arkiboys 9,621 Reputation points
2021-10-13T15:04:15.27+00:00

Hi,
The pipeline loads data from the source into a Delta parquet sink dynamically. One of the parameters passed to the data flow is KeyColumn.
If the object being loaded has only one key column, everything works fine, i.e. update/insert/delete into the parquet Delta sink behave as expected.
The problem is when KeyColumn contains more than one column: instead of updating existing rows, the parquet file keeps having new rows inserted on every load.
It looks like the key columns are not handled correctly in the data flow when there is more than one, e.g. "ProductID, LocationID".

The data flow has:
delta sink setting --> update
delta sink setting --> delete
delta sink setting --> insert

I think the issue is with the update operation.
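
For reference, the sink's key columns are set dynamically from the parameter. A minimal sketch of the expression, assuming the parameter is named $KeyColumn as above:

    split($KeyColumn, ',')

With $KeyColumn = "ProductID, LocationID" this is expected to produce the array of key column names for the sink to match on.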

Any suggestions?


Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2021-10-14T10:45:20.773+00:00

    Hi @arkiboys ,

    Your column-names string has a space after the comma. So I suspect that when you use the split() function to split on ',', the second element of the resulting array keeps that leading space (' LocationID' instead of 'LocationID'). The sink then has no column matching that name, so existing rows are never matched for update and every load falls through to insert, which creates the duplicate rows.

    Could you please try removing the space after the comma and see if that helps? Please let us know how it goes.
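
    Alternatively, if you want the data flow to tolerate spaces in the parameter value, you could trim each element after splitting. A minimal sketch, assuming your key-columns expression currently looks like split($KeyColumn, ',') (adjust the parameter name to whatever you actually use):

        map(split($KeyColumn, ','), trim(#item))

    This splits the string on commas and then trims leading/trailing whitespace from every element, so both "ProductID,LocationID" and "ProductID, LocationID" resolve to the same clean array of column names.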


0 additional answers