Data flow Exists transformation behaves differently in debug than at run time

arkiboys 9,711 Reputation points
2022-06-01T17:58:23.47+00:00

Hi,
I have created a data flow that uses the Exists transformation.
There are two Exists transformations. One acts like a left join and routes rows to the insert Alter Row; the other acts like an inner join and routes rows to the update.
In debug it works as expected, but at run time the rows seem to go only to the insert branch, so running the pipeline several times creates duplicate rows in the sink.
I have looked closely and checked the parameters, etc., but I am not sure what is causing this difference in behavior.
Any suggestions?
Here is a screenshot in case it helps.
I can send more screenshots if you prefer.
As you can see in the screenshots, the update works correctly if I re-run a load that has already been run, but when I run the pipeline it only does the insert for each run of the same load, which means I get duplicated rows.
Thank you

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author
  1. ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator
    2022-06-03T15:58:44.187+00:00

    Hi @arkiboys ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Your implementation looks like SCD Type 2 to me: you are trying to identify new rows and existing rows, then update the existing rows and insert the new ones.
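
To make the intended routing concrete, here is a minimal Python sketch of the exists / doesn't-exist split described above. It is only an in-memory illustration, not Data Factory code: the function name `split_by_exists`, the `id` key column, and the sample rows are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_exists(source: List[Row], sink: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Split source rows into (new_rows, existing_rows) by key lookup in the sink."""
    sink_keys = {row[key] for row in sink}
    # "Doesn't exist" branch: rows whose key is not in the sink yet -> insert.
    new_rows = [row for row in source if row[key] not in sink_keys]
    # "Exists" branch: rows whose key is already in the sink -> update.
    existing_rows = [row for row in source if row[key] in sink_keys]
    return new_rows, existing_rows


# Hypothetical sample data: id 1 is already loaded, id 2 is new.
sink_rows = [{"id": 1, "name": "a"}]
source_rows = [{"id": 1, "name": "a2"}, {"id": 2, "name": "b"}]

new_rows, existing_rows = split_by_exists(source_rows, sink_rows, "id")
print(len(new_rows), len(existing_rows))
```

If a re-run of the same load sends every row to the insert branch, the lookup against the sink is effectively returning no matches at run time, which is the symptom reported in the question.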

    In this kind of implementation, the data flow setting that controls the order of the sinks plays a key role in avoiding this kind of issue.

    Try ordering the sinks based on your requirement. For example, if doing the updates first and then the inserts works for you, set that order in the data flow settings and see if it helps. The reverse order may also be appropriate. Consider all the scenarios for your data and implementation and choose the sink order carefully.
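
As a rough illustration of writing the two branches in a fixed order, the following Python sketch applies the update branch first and the insert branch second against an in-memory list. It is a simplified model under stated assumptions, not the Data Factory implementation: `run_load`, the `order` parameter, and the `id` key are hypothetical names, and both branches are computed from a single snapshot of the sink keys taken before any write.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def run_load(sink: List[Row], source: List[Row], key: str,
             order: Sequence[str] = ("update", "insert")) -> List[Row]:
    """Apply one load to the sink, writing the branches in the given order."""
    # Snapshot of the keys present before this load writes anything.
    sink_keys = {row[key] for row in sink}
    branches = {
        "update": [row for row in source if row[key] in sink_keys],
        "insert": [row for row in source if row[key] not in sink_keys],
    }
    for name in order:
        rows = branches[name]
        if name == "update":
            # Replace matching sink rows in place; the row count is unchanged.
            by_key = {row[key]: dict(row) for row in rows}
            sink[:] = [by_key.get(row[key], row) for row in sink]
        else:
            # Append rows that were not present in the snapshot.
            sink.extend(dict(row) for row in rows)
    return sink


# Hypothetical data: running the same load twice should not add rows.
target: List[Row] = [{"id": 1, "name": "a"}]
load = [{"id": 1, "name": "a2"}, {"id": 2, "name": "b"}]

run_load(target, load, "id")
run_load(target, load, "id")
print(len(target))
```

In this model the second run routes both rows to the update branch, so the sink keeps two rows. A run that keeps growing the sink points to the exists check, rather than the write itself, failing to match.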

    For example, here I have chosen the update to happen first and then the insert.
    [Screenshot: 208277-image.png]

    Please check the following video on SCD Type 2 to get an idea:
    Slowly Changing Dimension(SCD) Type 2 Using Mapping Data Flow in Azure Data Factory

    Hope this helps. Please let us know if you have any further queries.

    --------
    Please consider clicking Accept Answer. Accepted answers help the community as well.
