Azure Synapse Pipeline - Copy Data to CosmosDB w Upsert - How to check if items missing in new extract are in CosmosDB and then delete them?

Richards, Sam (DG-STL-HQ) 151 Reputation points
2022-05-13T13:07:58.073+00:00

I have an Azure Synapse pipeline that moves data from a .csv into a .parquet file which is then copied into two location, Azure SQL table and Azure CosmosDB. When moving into the Azure SQL table, I can delete existing records and the insert the new dataset. The CosmosDB portion uses upsert to update existing ones and insert new records. However, I am at a loss as how to check if any records in the CosmosDB are not in the new .parquet file and then delete them if they do not exist.

Is this better achieved on the CosmosDB side with a stored procedure or is there a method of using dynamic content on the write behavior of the copy data activity in Azure Synapse?201824-2022-05-13-08-06-22-bwdgsynapsedev01-azure-synapse.png

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,902 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2022-05-15T18:58:31.813+00:00

    Hi @Richards, Sam (DG-STL-HQ) ,
    Welcome to Microsoft Q&A platform and thankyou for posting your query.
    As I understand your question, it seems you are trying to figure out how to delete missing source rows from your target database . Please correct me if my understanding is incorrect. I don't have cosmosDB provisioned to reproduce your exact scenario. However , you can refer to the below solution and try to implement.

    You can utilize Mapping dataflow inside Synapse workspace to achieve this requirement:
    1. Add two sources: One for the sourceFile and other for Target DB(CosmosDB in your case)
    2. Add Exist transformation and select Exist Type as 'Doesn't exist' and specify the PrimaryKey Column

    202085-image.png
    3. Add Alter Row Transformation and add condition as 'Delete if'='true()'

    201980-image.png
    4. Add the Sink as CosmosDB dataset.

    Please refer to the following video for more details: https://www.youtube.com/watch?v=9i7qf1vczUw

    Alternatively, you can try to load the entire source data into a temporary table / dummy table and perform delete from target DB based on data not present in the temp table.

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.