Cosmos DB API for MongoDB: how to ignore existing JSON or merge/update the same JSON

Jack 46 Reputation points
2022-01-25T09:06:55.687+00:00

The first pipeline in ADF calls an external REST API to get a JSON file and stores it in Blob Storage. The second pipeline then pushes that JSON from Blob Storage into Cosmos DB.

The overall process works, but we found that Cosmos DB always creates a new JSON document in the collection with a different "ObjectID", even when it is the same JSON in Blob Storage. As a result, many duplicate documents pile up every time the second pipeline's scheduled trigger pushes the JSON. The second pipeline already uses upsert as its write behavior.

  1. Is there any way to sync the document content when the JSON in Blob Storage is updated, without creating more documents?
  2. How can we skip an unchanged JSON file in Blob Storage when the scheduled trigger pushes it into Cosmos DB?

We just want to keep a single copy of each JSON document in Cosmos DB instead of creating duplicates.
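For question 2, one possible approach is to fingerprint each JSON file's content and skip files whose fingerprint was already pushed. This is a minimal sketch, not an ADF or Cosmos DB feature: it assumes a small custom step (for example, a script run before the copy) and a hypothetical store of previously seen fingerprints, shown here as an in-memory set. The function names content_fingerprint and filter_changed are illustrative only.

```python
import hashlib
import json


def content_fingerprint(doc: dict) -> str:
    """Return a SHA-256 hex digest of a JSON document's canonical form.

    Keys are sorted and insignificant whitespace is removed, so two documents
    with the same content but a different key order get the same digest.
    """
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_changed(docs, seen_fingerprints: set):
    """Yield only documents whose fingerprint has not been seen before.

    `seen_fingerprints` is a hypothetical set of digests from earlier runs;
    a real pipeline would have to persist it between scheduled runs.
    """
    for doc in docs:
        fp = content_fingerprint(doc)
        if fp in seen_fingerprints:
            continue  # identical content was already pushed: skip it
        seen_fingerprints.add(fp)
        yield doc


if __name__ == "__main__":
    seen = set()
    first_run = [{"id": 1, "status": "open"}]
    second_run = [{"status": "open", "id": 1}, {"id": 1, "status": "closed"}]
    print(len(list(filter_changed(first_run, seen))))   # 1
    print(len(list(filter_changed(second_run, seen))))  # 1: only the changed document
```

Note that skipping unchanged files only avoids redundant writes; it does not stop duplicates on its own. Changed content still needs a stable document ID so that upsert replaces the old version (question 1).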

Azure Cosmos DB
An Azure NoSQL database service for app development.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  MartinJaffer-MSFT 26,011 Reputation points
    2022-01-25T21:21:18.647+00:00

    Hello @Jack and welcome to Microsoft Q&A.

    It sounds like you are having trouble with making your upsert operation into Cosmos behave like an upsert rather than an insert.

    To update an existing document instead of creating a new one, you must specify the ObjectID (the _id field in the API for MongoDB). Otherwise, the ObjectID is auto-generated and a new item is inserted.
    If your source JSON has a property you would like to use as the ID, map it to the ID in the mapping section of your copy activity, as in the picture below.

    [Image: mapping section of the copy activity]
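Conceptually, that mapping gives every document an _id taken from a property of the source record, so the same record gets the same _id on every run. The sketch below illustrates the idea in plain Python; it is not what ADF runs internally. The dotted-path syntax and the example property device.id are hypothetical, and the chosen property must be unique per record for this to work.

```python
import copy
from functools import reduce


def get_path(doc: dict, dotted_path: str):
    """Follow a dotted path such as "device.id" into nested JSON objects."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), doc)


def with_stable_id(doc: dict, key_path: str) -> dict:
    """Return a copy of `doc` whose _id is the value found at `key_path`.

    The same source record yields the same _id on every run, so an upsert
    replaces the stored document instead of inserting a new one.
    """
    try:
        key_value = get_path(doc, key_path)
    except (KeyError, TypeError):
        raise KeyError(f"no value at {key_path!r} to use as _id") from None
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    result["_id"] = key_value
    return result


if __name__ == "__main__":
    record = {"device": {"id": "dev-42"}, "temp": 21}
    print(with_stable_id(record, "device.id")["_id"])  # dev-42
```

Raising an error when the key is missing is a deliberate choice here: silently falling back to a generated ID would bring back the duplicate-document problem.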

    The connector-azure-cosmos-db documentation calls this out in its section on Azure Cosmos DB as a sink:

    Describes how to write data to Azure Cosmos DB. Allowed values: insert and upsert.

    The behavior of upsert is to replace the document if a document with the same ID already exists; otherwise, insert the document.

    Note: The service automatically generates an ID for a document if an ID isn't specified either in the original document or by column mapping. This means that you must ensure that, for upsert to work as expected, your document has an ID.
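The quoted behavior explains the duplicates. The following in-memory simulation (a hypothetical sketch, not Cosmos DB or ADF code) shows upsert keyed by _id, which is the identifying field in the API for MongoDB. A document without _id gets a freshly generated one (uuid4 stands in for an auto-generated ObjectId here), so every scheduled run inserts a new copy. A document with a stable _id is replaced in place, which also keeps its content in sync when the source JSON changes.

```python
import copy
import uuid


def upsert(collection: dict, doc: dict) -> str:
    """Simulate upsert write behavior against an in-memory collection.

    If the document has no _id, a new one is generated, so the write is
    effectively an insert. Otherwise an existing document with the same _id
    is replaced, or the document is inserted if none exists yet.
    """
    stored = copy.deepcopy(doc)
    stored.setdefault("_id", uuid.uuid4().hex)  # only set when _id is missing
    collection[stored["_id"]] = stored  # replace-or-insert keyed by _id
    return stored["_id"]


if __name__ == "__main__":
    payload = {"deviceId": "dev-42", "temp": 21}

    without_id = {}
    for _ in range(3):  # three scheduled runs pushing the same JSON
        upsert(without_id, payload)
    print(len(without_id))  # 3: every run got a fresh _id

    with_id = {}
    for temp in (21, 21, 22):  # same record, updated on the last run
        upsert(with_id, {"_id": "dev-42", "deviceId": "dev-42", "temp": temp})
    print(len(with_id), with_id["dev-42"]["temp"])  # 1 22
```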

