How to duplicate / clone a Cosmos DB?

Jason Frank 31 Reputation points
2021-04-22T21:41:49.323+00:00

My goal is to create a duplicate of my entire production Cosmos DB in order to create a staging database.

After that initial copy, I would want to modify the staging database without worrying about keeping it in sync with the production version (thus I don't think that the "Live Data Migrator" is applicable to me). Then from time to time I'd want to "refresh" the staging database to be a fresh copy of the production version. But right now I'm not too concerned about that.

At this point I'm just getting started and do not have anything substantial for the database. So I really was hoping there would just be a way to duplicate the Cosmos DB in the portal. But I can't find a way.

I've also tried the "Data Migration Tool" but that failed with the error message saying that it does not work for serverless accounts.

Also note that I just want to copy the entire Cosmos database. So working at the individual collection/container level is not as convenient.

Finally, I think we may need to talk about this in terms of a Cosmos DB account, rather than just a database within a DB account. I say that because, from what I can tell, you can only get a different connection string at the DB account level, not at the database level (within a DB account). And I think I'll need distinct connection strings in order to work with a separate staging database in my code.

How can I achieve this duplication for a Cosmos DB simply?

Accepted answer
  1. Mark Brown - MSFT 2,771 Reputation points Microsoft Employee
    2021-04-23T16:07:59.097+00:00

    Hi Jason. I'm going to paste the answers I sent you on Twitter here and also answer your follow-up questions. Better to reply here as this may help others.

    The simplest way to do this is with the change feed. Live Data Migrator is certainly an option, but in my opinion it may be a bit of overkill, as it is intended for live migrations of production workloads.

    You will first need to write scripts using PowerShell or the Azure CLI, or use an ARM template, to provision your Cosmos resources in your staging environment.
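
    For example, here is a rough Azure CLI sketch of that provisioning step. Every name below (resource group, account, database, container) and the partition key path is a placeholder; the EnableServerless capability matches the serverless account you mentioned, and you would mirror whatever your production containers actually use:

        # Staging account, database, and container -- all names are placeholders.
        az cosmosdb create \
            --name my-staging-account \
            --resource-group my-staging-rg \
            --capabilities EnableServerless

        az cosmosdb sql database create \
            --account-name my-staging-account \
            --resource-group my-staging-rg \
            --name appdb

        az cosmosdb sql container create \
            --account-name my-staging-account \
            --resource-group my-staging-rg \
            --database-name appdb \
            --name items \
            --partition-key-path "/partitionKey"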

    Once provisioned, you would write a change feed routine in a console application using the Change Feed Processor, with the start time set far enough back that it copies everything in your production container to your staging container. Be sure the Cosmos client connects using bulk mode to better saturate throughput on writes. Once it has caught up, you can stop the console app.
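
    To make that concrete, here is a rough sketch of such a console routine using the Java SDK v4 (the .NET SDK has the same Change Feed Processor pattern, and there bulk mode is the AllowBulkExecution option on the client; the Java SDK has no single bulk switch, so this sketch simply upserts one document at a time). Endpoints, keys, and the database/container names are placeholders, and the processor needs a small "leases" container (partition key /id) for its bookkeeping:

        import com.azure.cosmos.ChangeFeedProcessor;
        import com.azure.cosmos.ChangeFeedProcessorBuilder;
        import com.azure.cosmos.CosmosAsyncClient;
        import com.azure.cosmos.CosmosAsyncContainer;
        import com.azure.cosmos.CosmosClientBuilder;
        import com.azure.cosmos.models.ChangeFeedProcessorOptions;
        import com.fasterxml.jackson.databind.JsonNode;

        import java.time.Duration;
        import java.util.List;

        public class CopyProdToStaging {
            public static void main(String[] args) throws InterruptedException {
                // Production (source) and staging (destination) accounts.
                // Endpoints, keys, and database/container names are placeholders.
                CosmosAsyncClient prodClient = new CosmosClientBuilder()
                        .endpoint("https://<prod-account>.documents.azure.com:443/")
                        .key("<prod-key>")
                        .buildAsyncClient();
                CosmosAsyncClient stagingClient = new CosmosClientBuilder()
                        .endpoint("https://<staging-account>.documents.azure.com:443/")
                        .key("<staging-key>")
                        .buildAsyncClient();

                CosmosAsyncContainer feedContainer =
                        prodClient.getDatabase("appdb").getContainer("items");
                // Bookkeeping container for the processor; create it up front with
                // partition key /id (it can live in either account).
                CosmosAsyncContainer leaseContainer =
                        prodClient.getDatabase("appdb").getContainer("leases");
                CosmosAsyncContainer stagingContainer =
                        stagingClient.getDatabase("appdb").getContainer("items");

                // Read the feed from the very beginning so every existing document
                // is copied (same effect as a start time set far in the past).
                ChangeFeedProcessorOptions options = new ChangeFeedProcessorOptions();
                options.setStartFromBeginning(true);

                ChangeFeedProcessor processor = new ChangeFeedProcessorBuilder()
                        .hostName("prod-to-staging-copy")
                        .feedContainer(feedContainer)
                        .leaseContainer(leaseContainer)
                        .options(options)
                        .handleChanges((List<JsonNode> docs) -> {
                            // Upsert each batch of changed documents into staging.
                            // (Blocking per document keeps the sketch simple; a real
                            // copy job would batch these writes for throughput.)
                            for (JsonNode doc : docs) {
                                stagingContainer.upsertItem(doc).block();
                            }
                        })
                        .buildChangeFeedProcessor();

                processor.start().block();
                // Give the processor time to catch up, then stop it. A fixed sleep
                // keeps the sketch short; in practice you would watch the estimated lag.
                Thread.sleep(Duration.ofMinutes(10).toMillis());
                processor.stop().block();

                prodClient.close();
                stagingClient.close();
            }
        }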

    Depending on what you're doing with the data in your staging deployment, you could either turn the change feed on again to copy any new or updated data from production into staging, or, if you're destroying data in staging, drop and recreate the container with your scripts and rehydrate it with the change feed, as sketched below. You can keep the account and database resources, though, so you will not need to modify the connection strings in your test apps.
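
    The reset itself is small. Here is a sketch with the same Java SDK, again with placeholder names; note that you would also clear or delete the "leases" container from the copy job, otherwise the processor resumes from where it left off instead of copying everything again:

        import com.azure.cosmos.CosmosAsyncClient;
        import com.azure.cosmos.CosmosAsyncDatabase;
        import com.azure.cosmos.CosmosClientBuilder;

        public class ResetStagingContainer {
            public static void main(String[] args) {
                // Placeholder endpoint, key, and names -- substitute your own.
                CosmosAsyncClient stagingClient = new CosmosClientBuilder()
                        .endpoint("https://<staging-account>.documents.azure.com:443/")
                        .key("<staging-key>")
                        .buildAsyncClient();

                CosmosAsyncDatabase db = stagingClient.getDatabase("appdb");

                // Drop the staging container and recreate it empty with the same id
                // and partition key; the account and database (and therefore the
                // connection string) stay untouched.
                db.getContainer("items").delete().block();
                db.createContainerIfNotExists("items", "/partitionKey").block();

                stagingClient.close();
            }
        }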

    Other follow-up questions:

    Q: What's the difference between a container and a collection?
    A: They are synonymous to us.

    Q: Can I use continuous backup to do this?
    A: Yes, that is certainly a possibility and easier than what I've described. However, there are a fair number of limitations on continuous backup, so you may want to explore those before deciding on it as an option. You will also have to modify the connection information for any clients used for testing in your staging environment, since a restore goes to a new account. This may or may not be an issue for you.

