How to sync deleted status of items using Azure Data Factory?

osharena 86 Reputation points
2022-12-26T05:47:17.963+00:00

Hi,

I've composed "data copy" of adf pipeline to sync database items from CosmosDBNoSql account "A" to samely CosmosDBNoSql account "B".
I mean, I'm using adf pipeline as sync feature between two CosmosDB containers.
It works well with "upsert" writing and I can see items in account "A" copied well to account "B".

The problem is that items in a container of account "B" is still remained even though I delete them in container of account "A".
Is there any setting I can change to make the items in "B" to be automatically deleted when I delete the same items in "A"?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,655 questions
{count} votes

5 answers

Sort by: Most helpful
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2022-12-29T05:58:06.543+00:00

    Hi @osharena ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your question here.

    As I understand your issue, your query is regarding deletion of records from sink if the record is no longer present in the source. Please let me know if that is not the correct understanding.

    There is no direct way to achieve 'Delete from source' using ADF pipeline till now. Upsert only takes care of updating the existing record in sink wherever the keycolumn matches with the source , or inserting the new record.

    • If you want to delete the data from sink (say table_sink) , which no more in source, then he might need to load the existing source data into a dummy table (say table_dummy) in sink db and write some stored procedure or build a pipeline to compare table_sink and table_dummy data and delete the records from table_sink which are present in table_sink but not in table_dummy .
    • You can also add two sources (source table and sink table) use Alter row transformation on top of sink table in Mapping dataflow to write the condition to delete data from sink .

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.
    0 comments No comments

  2. PRADEEPCHEEKATLA 90,661 Reputation points Moderator
    2022-12-27T07:35:58.107+00:00

    Hello @KwangjinNoh-5189,

    Thanks for the question and using MS Q&A platform.

    In order to delete items in the destination container when you delete them in the source container, you can set the "Sink" settings of your ADF pipeline to "Delete".

    Here is how you can do it:

    • Go to the Azure portal and navigate to your ADF pipeline.
    • In the pipeline designer, click on the sink dataset that corresponds to the destination container in Cosmos DB account B.
    • In the Properties panel on the right side, find the "Sink" setting and set it to "Delete".
    • Save the changes to your pipeline.
    • With this setting, when you delete an item in the source container in Cosmos DB account A, the corresponding item in the destination container in Cosmos DB account B will also be deleted.

    Note that the "Delete" setting will delete all items in the destination container that do not have a corresponding item in the source container. So if you have added any additional items to the destination container that are not present in the source container, they will also be deleted. If you do not want these additional items to be deleted, you may need to consider a different approach, such as using the "Upsert" or "Merge" sink settings.

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is jhow you can be part of Q&A Volunteer Moderators

  3. osharena 86 Reputation points
    2022-12-28T07:02:12.07+00:00

    274398-adf1.png

    Thank you for answering.

    I can't find "Delete" in the setting of "Sink" like the attachment.
    It's only showing "Add dynamic content", "Insert", "Upsert".

    Could you also check why this is happening?

    0 comments No comments

  4. osharena 86 Reputation points
    2022-12-29T08:41:23.957+00:00

    274744-adf2.png

    Thank you for your useful information.

    Now, I'm trying to set Alter row transformation accepting your second recommendation.
    But, I have no idea how to set Alter row conditions exactly.

    The real conditions I actually want to set is like following.
    (1) If the firmcode(column) value of an item of source1(CosmosDbNoSqlContainer1) is not in any item of sink1(CosmosDbNoSqlContainer2), then insert to sync1.
    (2) If the same firmcode(column) value of an item of source1(CosmosDbNoSqlContainer1) is already in an item of sink1(CosmosDbNoSqlContainer2), then just upsert to sync1.
    (3) If the firmcode(column) value of an item of sink1(CosmosDbNoSqlContainer2) is not in any item of source1(CosmosDbNoSqlContainer1), then delete the item from sync1.

    For those conditions, I've set like attachment and also set "Allow insert/delete/upsert" in sync setting, but [Insert if], [Upsert if] conditions isn't working well(it always works as like "Insert"), and I have no idea how to set for [Delete if].

    Could you give me more advices?

    0 comments No comments

  5. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-01-13T09:11:56.49+00:00

    Hi @osharena ,

    I understand your approach of achieving the requirement . I would suggest you to add your source dataset as source1 and sink dataset as source2 and use two join transformations to get

    1. the data which are not present in source and not in sink and perform upsert using alter row transformation with condition as update if firmcode1=firmcode2 . Since by default the condition is 'Insert' for all records, so no need to explicitly write condition for insert. Also, In sink settings, unselect 'Allow insert' and select 'Allow upsert'

    276027-leftjoin-upsert.gif

    276020-image.png

    2. the data which are present in sink and not in source and perform delete using alter row transformation with condition as delete if isNull(firmcode1) . In sink settings, select 'Allow delete'

    276026-rightjoin-deletefromsink.gif

    276005-image.png

    ----------

    Hope it helps. Please consider accepting answer by hitting on Accept answer button.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.