Filter out duplicate records with azure data factory before importing into Dynamics 365 user table

Andrew Macinnes 6 Reputation points
2022-06-29T14:39:32.763+00:00

I am looking to use Azure Data Factory to import a number of users from a third party source CSV file into the D365 user table/entity. This is usually straight forward but on this occasion I have a complication. The D365 users table/entity is already populated. The source CSV user file will have a mixture of users that are already in the D365 table/entity and others that are not.

What I would like to do is ensure the users in my source file that are already in the D365 table are not copied over as this would create duplicates.

Source CSV FILE

216100-image.png

Existing D365 User table (subset of fields just to illustrate) enter image description here

216181-image.png

Updated D365 table with two new record added from source csv enter image description here

216137-image.png

From what I can see there are two possible solutions in Azure Data Factory

1) Configure the D365 sink to do it. e.g configure th sink in order to ignore records that match on a certain column? Is it possible to configure the sink in some way to accomplish this?

2) Pull in the D365 table/entity as a source and use it to filter my source CSV to remove user records that already exist in D365 perhaps by using a common field such as fullname to identify such records. This would ensure I only try to import new users.

I have had a look into both methods but have been struggling to find a way to implement them.

I'd like to think the scenario I have outlined above is not uncommon and there are tried and tested methods to filter out records from a source CSV that already exists in the target D365 table?

I'd apprecate any help/suggestion to help me achieve this. Thanks

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,206 questions
{count} votes

2 answers

Sort by: Most helpful
  1. KranthiPakala-MSFT 46,602 Reputation points Microsoft Employee
    2022-06-29T21:54:42.847+00:00

    Hello @Andrew Macinnes ,

    Thanks for the question and using MS Q&A platform.

    *1) Configure the D365 sink to do it. e.g configure th sink in order to ignore records that match on a certain column? Is it possible to configure the sink in some way to accomplish this? *

    Copy Activity: Yes, for this you can use Copy activity and use writeBehavior properties as Upsert in your sink settings. Please refer to below section of the docs

    Ref Doc: Dynamics as a sink type

    216292-image.png

    Mapping Data flow: If you would like to use Mapping Data flow, you will have to use alter transformation and then in sink transformation you can use Update method as upsert (please note by default it will be insert)

    Ref doc: Mapping data flow Sink transformation

    216286-image.png

    Here are few demo videos by community volunteers that will be helpful:

    *2) Pull in the D365 table/entity as a source and use it to filter my source CSV to remove user records that already exist in D365 perhaps by using a common field such as fullname to identify such records. This would ensure I only try to import new users. *

    For this implementation, you can use the Change data capture logic using Mapping data flow as described in this demo by Mark Krome from ADF Product team: Data Flows: How to capture changed data

    216311-image.png

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    0 comments No comments

  2. Andrew Macinnes 6 Reputation points
    2022-07-04T13:02:54.483+00:00

    Hi @KranthiPakala-MSFT .

    Thanks for your feedback.

    I was having trouble implementing the changes to the D365 to upsert which would have been my preferred option. It was complaining the a record with the username already existed which of course it does but I was hoping it would update that record. I have read that it needs to update on the primary key (systemuserid?) but my source csv file does not have that column/field. I'm using an alter row transformation and D365 sync upsert config to try and achieve this. I've tried various combinations of what I use for the alter row conditions and the D365 sync config but have been unable to get it to work.

    I've tried to upload some screenshots to help explain what I mean but it won't let me.

    I am using a dataflow rather than a copy activity as I need to do some transformations.

    For my alter row transformation I set the alter row conditions to :- Upsert if, isNull(Email)==false()

    I configure my D365 sync settings as follows....I select the checkbox for Allow insert and Allow Upsert. I leave the Alternate Key name blank (although have tried populating this with various fields but had no luck). If you have any advice how I can configure these setting to make it work that would be appreciated.

    In the meantime I went for another solution where I do a left join on the two sources and filter out the rows with nulls for the email field filed from the D365 entity source thereby leaving users that do not exits in D365. I then insert those records. This does not let me update existing record though which I may need to do later.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.