Azure Data Factory: comparing Cosmos DB JSON documents

Jeff Copeland 1 Reputation point
2021-01-19T14:17:33.91+00:00

I am still looking for a way, in a Data Factory pipeline, to compare two different Cosmos DB documents (JSON) to see whether anything has changed.
I am going to be moving data from a 'Staging' Cosmos DB to a 'Production' Cosmos DB. Before I move a document to production, I want to check whether it already exists there; if it does, I want to update it ONLY if it has changed.

I posted an earlier question about this and was never able to get a solution.
Thanks

https://learn.microsoft.com/en-us/answers/questions/223886/azure-data-factory-cosmos-db-updating-changed-docu.html#answer-224352

Azure Data Factory

2 answers

  1. MartinJaffer-MSFT 26,036 Reputation points
    2021-01-20T05:41:26.367+00:00

    Hello @Jeff Copeland and welcome back to Microsoft Q&A. I'm sorry you had a negative experience.

    The task described will be very difficult to implement in Azure Data Factory. There is no feature that fits all the requirements.
    The difficulty is partly due to the limited compatibility of ADF activities with Cosmos DB.

    Tools for checking existence:

    • validation activity (only available for structured stream)
    • get metadata activity (not available for cosmos)

    Tools for comparing content:

    • get metadata activity (only available for structured stream)
    • lookup activity (available for SQL API and for structured stream)

    While the lookup activity is available for Cosmos DB, it is limited in how much it returns: it stops at 4 MB or 5000 rows. I do not know the size of your documents, so I cannot say whether this will work for you.

    Given all these limitations, I need to get a little creative in a work-around.

    One idea is to copy the two documents to blob storage and compare their MD5 hashes. This is not an efficient approach.
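The hash comparison itself can be sketched outside ADF. This is only an illustration, not an ADF feature: it assumes both documents are already available as Python dicts, and the helper name `content_hash` and the sample documents are hypothetical. Cosmos DB adds system properties such as `_etag` and `_ts` to every stored document, and these differ between containers even when the user data is identical, so the sketch strips the top-level system properties before hashing.

```python
import hashlib
import json

# System properties that Cosmos DB maintains on every stored document.
# They differ between containers even when the user data is identical.
SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def content_hash(document: dict) -> str:
    """Return an MD5 hex digest of the document's user-defined content."""
    user_data = {k: v for k, v in document.items() if k not in SYSTEM_PROPERTIES}
    # sort_keys plus fixed separators give a canonical serialization, so
    # key order and whitespace do not affect the digest.
    canonical = json.dumps(
        user_data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Hypothetical sample documents: same user data, different key order
# and different system properties.
staging_doc = {"id": "42", "name": "widget", "price": 9.99, "_etag": "aaa", "_ts": 1611000000}
production_doc = {"price": 9.99, "name": "widget", "id": "42", "_etag": "bbb", "_ts": 1610000000}

print(content_hash(staging_doc) == content_hash(production_doc))  # True
```

Hashing raw blob bytes without this normalization would likely report a difference for every document, because the system properties alone make the two copies differ.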

    Another idea is to write custom code for the job and run it as a custom activity. Custom code could also run entirely outside ADF if you prefer.
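As a rough idea of what such custom code might do, here is a minimal sketch of the insert / update / skip decision. Everything in it is an assumption made for illustration: the function names are hypothetical, and plain dicts keyed by document id stand in for the staging and production containers. In a real job the reads and writes would go through a Cosmos DB SDK (for example, `read_item` and `upsert_item` in the `azure-cosmos` Python package) rather than dict access.

```python
from typing import Dict, Optional

# System properties that Cosmos DB sets itself; ignored when comparing.
SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def user_content(document: dict) -> dict:
    """Return the document without top-level Cosmos DB system properties."""
    return {k: v for k, v in document.items() if k not in SYSTEM_PROPERTIES}


def decide_action(staging_doc: dict, production_doc: Optional[dict]) -> str:
    """Classify a staging document as 'insert', 'update', or 'skip'."""
    if production_doc is None:
        return "insert"  # not in production yet
    if user_content(staging_doc) != user_content(production_doc):
        return "update"  # exists, but the content differs
    return "skip"  # exists and is unchanged


def sync(staging: Dict[str, dict], production: Dict[str, dict]) -> Dict[str, str]:
    """Copy new or changed documents to production; report what was done."""
    actions = {}
    for doc_id, doc in staging.items():
        action = decide_action(doc, production.get(doc_id))
        if action != "skip":
            # Stand-in for an SDK upsert call against the production container.
            production[doc_id] = user_content(doc)
        actions[doc_id] = action
    return actions


# Hypothetical in-memory stand-ins for the two containers.
staging = {
    "1": {"id": "1", "name": "alpha", "_etag": "s1"},
    "2": {"id": "2", "name": "beta v2", "_etag": "s2"},
    "3": {"id": "3", "name": "gamma", "_etag": "s3"},
}
production = {
    "1": {"id": "1", "name": "alpha", "_etag": "p1"},
    "2": {"id": "2", "name": "beta", "_etag": "p2"},
}

print(sync(staging, production))  # {'1': 'skip', '2': 'update', '3': 'insert'}
```

Comparing the parsed documents directly avoids the 4 MB / 5000 row lookup limit, since the code reads one document at a time instead of returning everything through a single activity.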

    My colleague suggested using Cosmos change feed. I need to research this a little more and discuss with you whether the change feed can be used as part of the solution.


  2. MartinJaffer-MSFT 26,036 Reputation points
    2021-01-28T00:02:18.017+00:00

    @Jeff Copeland
    The change feed is a record of all inserts and updates to documents in CosmosDB. Each event can be used to trigger something else to happen. The idea I had was for these changes to be committed to a blob store. ADF would then use an Event Trigger to get notified of the change, and copy it to the destination Cosmos DB.

    See Change Feed Event Sourcing Pattern

    This would be ideal if the only changes to the destination Cosmos DB come from this source Cosmos DB. If anything else can change the data in the destination, this pattern would not help: it only forwards updates as they happen in the source and does not do active comparisons.
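The limitation can be seen in a small simulation. This is a sketch under stated assumptions, not the real change feed API: a plain list stands in for the stream of changed documents, a dict keyed by id stands in for the destination container, and `apply_change_feed` is a hypothetical name.

```python
from typing import Dict, Iterable


def apply_change_feed(changes: Iterable[dict], destination: Dict[str, dict]) -> int:
    """Upsert every changed document into the destination, keyed by id."""
    applied = 0
    for doc in changes:
        # Blind upsert: the destination's current content is never inspected.
        destination[doc["id"]] = doc
        applied += 1
    return applied


# Document "2" was modified in the destination by some other process.
destination = {
    "1": {"id": "1", "status": "old"},
    "2": {"id": "2", "status": "edited elsewhere"},
}

# Only documents inserted or updated in the source show up as changes.
changes = [
    {"id": "1", "status": "new"},
    {"id": "3", "status": "created"},
]

applied = apply_change_feed(changes, destination)
print(applied)                       # 2
print(destination["2"]["status"])    # edited elsewhere
```

Document "2" keeps its out-of-band edit because the source never emitted a change for it, which is the case where source and destination can drift apart without this pattern noticing.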

    I am not seeing any good way to do comparisons in ADF.
