Track data quality issues in Azure Data Factory

Degenocrat 27 Reputation points
2021-12-20T19:55:48.683+00:00

Hello! I'm ingesting two "Account" tables, one from each of two Salesforce instances in different countries, and joining them together on an ID key.
A number of columns are supposed to match across the two records because Salesforce is supposed to synchronise them on their system.

I have built a data flow that filters the joined account table on the columns that do not equal each other (i.e. column1 != column2 OR column3 != column4, etc.).
Currently I am writing the results into a sink "tracking file", but I would like this data flow to run daily. At the moment it overwrites the data file on each run, and it's possible the previous day's mismatch has already been fixed. However, I want to track how often this data quality issue arises.
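For reference, the filter condition in my data flow looks something like this (ADF mapping data flow expression syntax; the column names are just placeholders for my real fields):

```
column1 != column2 || column3 != column4
```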

My question therefore: how do I append new rows to my tracking file whenever there is a new record with a mismatch, while keeping the previous cases where there was a mismatch (even if it isn't an issue anymore)?

I'm thinking of parameterising the sink to write to a new file on each run, then building another data flow that unions the existing tracking file with each new day's file and writes back to the tracking file to update it.

Is there a better way? Perhaps the filter on "not equal" columns is not the right approach? Another thought was to use a Lookup transformation, but the sink would still overwrite the file.

Thanks in advance!

P.S. Using ADF2, Salesforce linked service, and Data Lake Storage Gen2.


Accepted answer
  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2021-12-21T16:19:58.77+00:00

    Hi @Anonymous ,

    Thank you for posting query in Microsoft Q&A Platform.

    Files in Data Lake Storage always get overwritten. Hence you can't update an existing file to append a row.

    I can see you are already on the right track with your workaround. That is, storing each run's results in a new file and then doing a Union, filtering on not-equal columns, or using Lookup or Exists transformations should help. I encourage you to proceed with implementing your idea, since Data Lake files will always be overwritten.
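    To illustrate the append logic outside of ADF (a minimal Python sketch of the union step, not an ADF feature; the field names `id`, `field`, and `detected_on` are just examples): read the existing tracking rows, union them with today's mismatches tagged with the run date, and write the whole result back out, since the sink always overwrites the file.

    ```python
    def update_tracking(existing_rows, todays_mismatches, run_date):
        """Union historical tracking rows with today's mismatches.

        Rows are never removed, so mismatches that have since been fixed
        remain in the history. A (record id, run date) pair is used as a
        key so a re-run on the same day does not duplicate rows.
        """
        seen = {(r["id"], r["detected_on"]) for r in existing_rows}
        merged = list(existing_rows)
        for row in todays_mismatches:
            key = (row["id"], run_date)
            if key not in seen:
                merged.append({**row, "detected_on": run_date})
                seen.add(key)
        return merged

    # Example: one historical mismatch plus one newly detected one.
    history = [{"id": "001", "field": "Name", "detected_on": "2021-12-20"}]
    today = [{"id": "002", "field": "Phone"}]
    print(update_tracking(history, today, "2021-12-21"))
    ```

    In a data flow, the same effect comes from a Union of the two sources followed by an Aggregate or Exists transformation to drop duplicates before writing back to the tracking file.
    
    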

    Hope this helps. Please let us know if you have any further queries.

    ----------------

    Please consider hitting Accept Answer. Accepted answers help the community as well.

    2 people found this answer helpful.

0 additional answers
