Options to migrate your on-premises or cloud data to Azure Cosmos DB

APPLIES TO: NoSQL MongoDB Cassandra Gremlin Table

You can load data from various data sources to Azure Cosmos DB. Since Azure Cosmos DB supports multiple APIs, the targets can be any of the existing APIs. The following are some scenarios where you migrate data to Azure Cosmos DB:

  • Move data from one Azure Cosmos DB container to another container within the Azure Cosmos DB account (could be in the same database or a different database).
  • Move data from one Azure Cosmos DB account to another Azure Cosmos DB account (could be in the same region or a different region, same subscription or a different one).
  • Move data from a source such as Azure Blob Storage, a JSON file, an Oracle database, Couchbase, or DynamoDB to Azure Cosmos DB.

In order to support migration paths from the various sources to the different Azure Cosmos DB APIs, there are multiple solutions that provide specialized handling for each migration path. This document lists the available solutions and describes their advantages and limitations.

Factors affecting the choice of migration tool

The following factors determine the choice of the migration tool:

  • Online vs offline migration: Many migration tools provide a path to do a one-time migration only. This means that the applications accessing the database might experience a period of downtime. Some migration solutions provide a way to do a live migration where there's a replication pipeline set up between the source and the target.

  • Data source: The existing data can be in various data sources such as Oracle, DB2, DataStax Cassandra, Azure SQL Database, or PostgreSQL. The data can also be in an existing Azure Cosmos DB account, where the intent of the migration is to change the data model or to repartition the data in a container with a different partition key.

  • Azure Cosmos DB API: For the API for NoSQL in Azure Cosmos DB, there are a variety of tools developed by the Azure Cosmos DB team which aid in the different migration scenarios. All of the other APIs have their own specialized set of tools developed and maintained by the community. Since Azure Cosmos DB supports these APIs at a wire protocol level, these tools should work as-is while migrating data into Azure Cosmos DB too. However, they might require custom handling for throttles as this concept is specific to Azure Cosmos DB.

  • Size of data: Most migration tools work very well for smaller datasets. When the data set exceeds a few hundred gigabytes, the choices of migration tools are limited.

  • Expected migration duration: Migrations can be configured to take place at a slow, incremental pace that consumes less throughput or can consume the entire throughput provisioned on the target Azure Cosmos DB container and complete the migration in less time.
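The note above about custom handling for throttles can be illustrated with a retry wrapper. This is a minimal sketch, not part of any Azure Cosmos DB SDK: `ThrottledError`, `write_with_backoff`, and the `write` callable are hypothetical names standing in for whatever rate-limit error and write call your migration tool exposes.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429 style) error from a client."""

    def __init__(self, retry_after_s=None):
        super().__init__("request rate too large")
        self.retry_after_s = retry_after_s


def write_with_backoff(write, item, max_attempts=5, base_delay_s=0.1, sleep=time.sleep):
    """Call write(item), retrying with exponential backoff when throttled.

    Returns the number of attempts used. Re-raises ThrottledError once
    max_attempts is exhausted so the caller can decide what to do next.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write(item)
            return attempt
        except ThrottledError as err:
            if attempt == max_attempts:
                raise
            # Prefer the server's retry hint when present; otherwise back off
            # exponentially with jitter so parallel workers do not retry in lockstep.
            delay = err.retry_after_s
            if delay is None:
                delay = base_delay_s * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)
```

The `sleep` parameter is injectable only so the wrapper can be exercised without real waiting; a migration job would leave it at the default.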

Azure Cosmos DB API for NoSQL

If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.
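For a rough feel of how provisioned throughput relates to migration duration, the arithmetic can be sketched as below. The function name and the per-write RU cost are assumptions for illustration only; the real cost of a write depends on item size and indexing, so treat the capacity planner as the source of actual figures.

```python
def estimate_migration_hours(item_count, ru_per_write, throughput_ru_per_s, utilization=1.0):
    """Estimate hours needed to write item_count items.

    ru_per_write is an assumed average request charge per item.
    utilization is the fraction of provisioned throughput the migration
    is allowed to consume (lower values leave headroom for live traffic).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_ru = item_count * ru_per_write
    seconds = total_ru / (throughput_ru_per_s * utilization)
    return seconds / 3600


# Example with assumed numbers: 36 million items at 10 RU each against 10,000 RU/s.
full_speed = estimate_migration_hours(36_000_000, 10, 10_000)        # 10.0 hours
half_speed = estimate_migration_hours(36_000_000, 10, 10_000, 0.5)   # 20.0 hours
```

The estimate ignores retries, network latency, and source read speed, so it is a lower bound rather than a prediction.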

Migration type: Offline
Solution: Intra-account container copy
Supported sources: Azure Cosmos DB for NoSQL
Supported targets: Azure Cosmos DB for NoSQL
Considerations:
  • CLI-based; no setup needed.
  • Supports large datasets.

Migration type: Offline
Solution: Azure Cosmos DB desktop data migration tool
Supported sources: Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, Azure Cosmos DB for Table, Azure Table storage, JSON files, MongoDB, SQL Server
Supported targets: Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, Azure Cosmos DB for Table, Azure Table storage, JSON files, MongoDB, SQL Server
Considerations:
  • Command-line tool.
  • Open-source.

Migration type: Offline
Solution: Azure Data Factory
Supported sources: JSON/CSV files, Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, MongoDB, SQL Server, Table Storage, Azure Blob Storage. See the Azure Data Factory article for other supported sources.
Supported targets: Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, JSON files. See the Azure Data Factory article for other supported targets.
Considerations:
  • Easy to set up and supports multiple sources.
  • Makes use of the Azure Cosmos DB bulk executor library.
  • Suitable for large datasets.
  • Lack of checkpointing: if an issue occurs during the migration, you need to restart the whole migration process.
  • Lack of a dead letter queue: a few erroneous files can stop the entire migration process.

Migration type: Offline
Solution: Azure Cosmos DB Spark connector
Supported sources: Azure Cosmos DB for NoSQL. You can use other sources with additional connectors from the Spark ecosystem.
Supported targets: Azure Cosmos DB for NoSQL. You can use other targets with additional connectors from the Spark ecosystem.
Considerations:
  • Makes use of the Azure Cosmos DB bulk executor library.
  • Suitable for large datasets.
  • Needs a custom Spark setup.
  • Spark is sensitive to schema inconsistencies, which can be a problem during migration.

Migration type: Online
Solution: Azure Cosmos DB Spark connector + Change Feed sample
Supported sources: Azure Cosmos DB for NoSQL. Uses Azure Cosmos DB Change Feed to stream all historic data as well as live updates.
Supported targets: Azure Cosmos DB for NoSQL. You can use other targets with additional connectors from the Spark ecosystem.
Considerations:
  • Makes use of the Azure Cosmos DB bulk executor library.
  • Suitable for large datasets.
  • Needs a custom Spark setup.
  • Spark is sensitive to schema inconsistencies, which can be a problem during migration.

Migration type: Offline
Solution: Custom tool with Azure Cosmos DB bulk executor library
Supported sources: The source depends on your custom code
Supported targets: Azure Cosmos DB for NoSQL
Considerations:
  • Provides checkpointing and dead-lettering capabilities, which increase migration resiliency.
  • Suitable for very large datasets (10 TB+).
  • Requires custom setup of this tool running as an App Service.

Migration type: Online
Solution: Azure Cosmos DB Functions + ChangeFeed API
Supported sources: Azure Cosmos DB for NoSQL
Supported targets: Azure Cosmos DB for NoSQL
Considerations:
  • Easy to set up.
  • Works only if the source is an Azure Cosmos DB container.
  • Not suitable for large datasets.
  • Doesn't capture deletes from the source container.

Migration type: Online
Solution: Striim
Supported sources: Oracle, Apache Cassandra. See the Striim website for other supported sources.
Supported targets: Azure Cosmos DB for NoSQL, Azure Cosmos DB for Cassandra. See the Striim website for other supported targets.
Considerations:
  • Works with a large variety of sources like Oracle, DB2, SQL Server.
  • Easy to build ETL pipelines and provides a dashboard for monitoring.
  • Supports larger datasets.
  • Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment.
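Several of the considerations above turn on checkpointing and dead-lettering. The sketch below shows the general idea only; it is not the bulk executor library or any Microsoft tool. The `migrate` function, the file-based checkpoint, and the choice of `ValueError` as the marker for a permanently bad item are all assumptions made for illustration.

```python
import json
import os


def migrate(items, write, checkpoint_path, dead_letter_path):
    """Copy items in order, resuming after the last checkpointed index.

    write(item) is a caller-supplied function. A ValueError is treated as a
    permanently bad item and dead-lettered; any other exception aborts the
    run, and the next run resumes at the item that failed.
    Returns the number of items processed in this run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(items)):
        try:
            write(items[index])
        except ValueError as err:
            # Dead letter: record the bad item and keep going instead of stopping.
            with open(dead_letter_path, "a") as f:
                f.write(json.dumps({"index": index, "error": str(err)}) + "\n")
        # Checkpoint after every item for simplicity; a real tool would
        # more likely checkpoint once per batch.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": index + 1}, f)
    return len(items) - start
```

Without the checkpoint file, an outage midway would force the whole copy to start again; without the dead-letter file, one malformed item would halt it.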

Azure Cosmos DB API for MongoDB

Follow the pre-migration guide to plan your migration.

When you're ready to migrate, you can find detailed guidance on migration tools below.

Then, follow our post-migration guide to optimize your Azure Cosmos DB data estate once you've migrated.

A summary of migration pathways from your current solution to Azure Cosmos DB for MongoDB is provided below:

Migration type: Offline
Solution: Intra-account container copy
Supported sources: Azure Cosmos DB for MongoDB
Supported targets: Azure Cosmos DB for MongoDB
Considerations:
  • Command-line tool; no setup needed.
  • Suitable for large datasets.

Migration type: Offline
Solution: Azure Cosmos DB desktop data migration tool
Supported sources: Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, Azure Cosmos DB for Table, Azure Table storage, JSON files, MongoDB, SQL Server
Supported targets: Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, Azure Cosmos DB for Table, Azure Table storage, JSON files, MongoDB, SQL Server
Considerations:
  • Command-line tool.
  • Open-source.

Migration type: Online
Solution: Azure Database Migration Service
Supported sources: MongoDB
Supported targets: Azure Cosmos DB for MongoDB
Considerations:
  • Makes use of the Azure Cosmos DB bulk executor library.
  • Suitable for large datasets and takes care of replicating live changes.
  • Works only with other MongoDB sources.

Migration type: Offline
Solution: Azure Database Migration Service
Supported sources: MongoDB
Supported targets: Azure Cosmos DB for MongoDB
Considerations:
  • Makes use of the Azure Cosmos DB bulk executor library.
  • Suitable for large datasets and takes care of replicating live changes.
  • Works only with other MongoDB sources.

Migration type: Offline
Solution: Azure Data Factory
Supported sources: JSON/CSV files, Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, MongoDB, SQL Server, Table Storage, Azure Blob Storage. See the Azure Data Factory article for other supported sources.
Supported targets: Azure Cosmos DB for NoSQL, Azure Cosmos DB for MongoDB, JSON files. See the Azure Data Factory article for other supported targets.
Considerations:
  • Easy to set up and supports multiple sources.
  • Makes use of the Azure Cosmos DB bulk executor library.
  • Suitable for large datasets.
  • Lack of checkpointing: any issue during the migration requires a restart of the whole migration process.
  • Lack of a dead letter queue: a few erroneous files can stop the entire migration process.
  • Needs custom code to increase read throughput for certain data sources.

Migration type: Offline
Solution: Existing Mongo tools (mongodump, mongorestore, Studio3T)
Supported sources: MongoDB, Azure Cosmos DB for MongoDB
Supported targets: Azure Cosmos DB for MongoDB
Considerations:
  • Easy to set up and integrate.
  • Needs custom handling for throttles.
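As a rough illustration of the mongodump and mongorestore path, the sketch below only builds the two command lines as argument lists. The helper names, connection strings, and dump directory are placeholders, and the worker count is an assumed starting point: keeping it low reduces the request rate against the target and so the chance of being throttled.

```python
def build_dump_command(source_uri, dump_dir):
    """Argument list for dumping the source database to dump_dir."""
    return ["mongodump", f"--uri={source_uri}", f"--out={dump_dir}"]


def build_restore_command(target_uri, dump_dir, insertion_workers=1):
    """Argument list for restoring dump_dir into the target account.

    insertion_workers controls parallel inserts per collection; a low value
    trades speed for a lower request rate on the target.
    """
    return [
        "mongorestore",
        f"--uri={target_uri}",
        f"--numInsertionWorkersPerCollection={insertion_workers}",
        dump_dir,
    ]


# Placeholder values; substitute your own connection strings and path.
dump_cmd = build_dump_command("mongodb://source-host:27017/mydb", "dump")
restore_cmd = build_restore_command("mongodb://target-host:10255/mydb", "dump")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced. The tools do not retry on rate limiting by themselves, which is the throttle handling the entry above refers to.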

Azure Cosmos DB API for Cassandra

If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.

Migration type: Offline
Solution: Intra-account container copy
Supported sources: Azure Cosmos DB API for Cassandra
Supported targets: Azure Cosmos DB API for Cassandra
Considerations:
  • CLI-based; No set up needed.
  • Supports large datasets.

Migration type: Offline
Solution: cqlsh COPY command
Supported sources: CSV Files
Supported targets: Azure Cosmos DB API for Cassandra
Considerations:
  • Easy to set up.
  • Not suitable for large datasets.
  • Works only when the source is a Cassandra table.

Migration type: Offline
Solution: Copy table with Spark
Supported sources: Apache Cassandra
Supported targets: Azure Cosmos DB API for Cassandra
Considerations:
  • Can make use of Spark capabilities to parallelize transformation and ingestion.
  • Needs configuration with a custom retry policy to handle throttles.

Migration type: Online
Solution: Dual-write proxy + Spark
Supported sources: Apache Cassandra
Supported targets: Azure Cosmos DB API for Cassandra
Considerations:
  • Supports larger datasets, but careful attention required for setup and validation.
  • Open-source tools, no purchase required.

Migration type: Online
Solution: Striim (from Oracle DB/Apache Cassandra)
Supported sources: Oracle, Apache Cassandra. See the Striim website for other supported sources.
Supported targets: Azure Cosmos DB API for NoSQL, Azure Cosmos DB API for Cassandra. See the Striim website for other supported targets.
Considerations:
  • Works with a large variety of sources like Oracle, DB2, SQL Server.
  • Easy to build ETL pipelines and provides a dashboard for monitoring.
  • Supports larger datasets.
  • Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment.

Migration type: Online
Solution: Arcion (from Oracle DB/Apache Cassandra)
Supported sources: Oracle, Apache Cassandra. See the Arcion website for other supported sources.
Supported targets: Azure Cosmos DB API for Cassandra. See the Arcion website for other supported targets.
Considerations:
  • Supports larger datasets.
  • Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment.
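The dual-write approach listed above pairs live write forwarding with a bulk copy of historic data. The sketch below shows that idea with in-memory dictionaries standing in for the source and target tables. It is a conceptual illustration under those assumptions, not the actual proxy or a Spark job, and `DualWriter` and `backfill` are hypothetical names.

```python
class DualWriter:
    """Forward each live write to both stores; the source stays authoritative."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.failed_keys = []  # target writes to reconcile later

    def write(self, key, value):
        self.source[key] = value
        try:
            self.target[key] = value
        except Exception:
            # A target failure must not fail the application's request.
            self.failed_keys.append(key)


def backfill(source, target):
    """Copy historic rows, skipping keys the live path has already written."""
    copied = 0
    for key, value in source.items():
        if key not in target:
            target[key] = value
            copied += 1
    return copied
```

Skipping keys already present keeps the historic copy from overwriting newer live data. A real Cassandra migration would need to compare write timestamps rather than rely on key presence, which is part of why this path needs careful validation.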

Other APIs

For APIs other than the API for NoSQL, API for MongoDB and the API for Cassandra, there are various tools supported by each of the API's existing ecosystems.

API for Gremlin

API for Table

Next steps

  • Trying to do capacity planning for a migration to Azure Cosmos DB?
  • Learn more by trying out the sample applications consuming the bulk executor library in .NET and Java.
  • The bulk executor library is integrated into the Azure Cosmos DB Spark connector. To learn more, see the Azure Cosmos DB Spark connector article.
  • Contact the Azure Cosmos DB product team by opening a support ticket under the "General Advisory" problem type and "Large (TB+) migrations" problem subtype for additional help with large scale migrations.