Migrate data from Cassandra to Azure Cosmos DB for Apache Cassandra account using Arcion
APPLIES TO: Cassandra
API for Cassandra in Azure Cosmos DB has become a great choice for enterprise workloads running on Apache Cassandra for many reasons such as:
No overhead of managing and monitoring: It eliminates the overhead of managing and monitoring a myriad of settings across OS, JVM, and yaml files and their interactions.
Significant cost savings: You can save on costs with Azure Cosmos DB, including the cost of VMs, bandwidth, and any applicable licenses. Additionally, you don’t have to manage data centers, servers, SSD storage, networking, or electricity costs.
Ability to use existing code and tools: Azure Cosmos DB provides wire protocol level compatibility with existing Cassandra SDKs and tools. This compatibility ensures you can use your existing codebase with Azure Cosmos DB for Apache Cassandra with trivial changes.
There are various ways to migrate database workloads from one platform to another. Arcion is a tool that offers a secure and reliable way to perform zero-downtime migration from other databases to Azure Cosmos DB. This article describes the steps required to migrate data from an Apache Cassandra database to Azure Cosmos DB for Apache Cassandra by using Arcion.
This offering from Arcion is currently in beta. For more information, contact Arcion Support.
Benefits of using Arcion for migration
Arcion’s migration solution follows a step-by-step approach to migrate complex operational workloads. The following are some of the key aspects of Arcion’s zero-downtime migration plan:
It offers automatic migration of business logic (tables, indexes, views) from an Apache Cassandra database to Azure Cosmos DB. You don’t have to create schemas manually.
Arcion offers high-volume, parallel database replication. It keeps the source and target platforms in sync during the migration by using a technique called change data capture (CDC). With CDC, Arcion continuously pulls a stream of changes from the source database (Apache Cassandra) and applies them to the destination database (Azure Cosmos DB).
It's fault-tolerant and provides exactly-once delivery of data, even during a hardware or software failure in the system.
It secures data in transit by using security methods such as TLS and encryption.
Steps to migrate data
This section describes the steps required to set up Arcion and migrate data from an Apache Cassandra database to Azure Cosmos DB.
From the computer where you plan to install the Arcion replicant, add a security certificate. This certificate is required by the Arcion replicant to establish a TLS connection with the specified Azure Cosmos DB account. You can add the certificate with the following steps:
wget https://cacert.omniroot.com/bc2025.crt
mv bc2025.crt bc2025.cer
keytool -keystore $JAVA_HOME/lib/security/cacerts -importcert -alias bc2025ca -file bc2025.cer
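After the import, it can be worth confirming that the alias actually landed in the trust store before starting the replicant. The following is a minimal sketch, not part of Arcion: the function name `cert_alias_present` is hypothetical, and the default keystore password `changeit` is an assumption that holds only if the JDK's default `cacerts` password was never changed.

```shell
# Hypothetical helper: succeed if the given alias exists in the JVM trust store.
#   $1 = alias to look for (bc2025ca in the steps above)
#   $2 = keystore password (assumed to be the JDK default, "changeit")
cert_alias_present() {
  local alias=$1 storepass=${2:-changeit}
  # keytool -list exits with a non-zero status when the alias does not exist.
  keytool -list -keystore "$JAVA_HOME/lib/security/cacerts" \
    -storepass "$storepass" -alias "$alias" > /dev/null 2>&1
}

# Example usage:
#   cert_alias_present bc2025ca && echo "certificate found" || echo "certificate missing"
```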
From the CLI terminal, set up the source database configuration. Open the configuration file by using the following command:
vi conf/conn/cassandra.yaml
Then add a comma-separated list of IP addresses of the Cassandra nodes, the port number, username, password, and any other required details. The following is an example of the contents of the configuration file:
type: CASSANDRA
host: 172.17.0.2
port: 9042
username: 'cassandra'
password: 'cassandra'
max-connections: 30
After filling out the configuration details, save and close the file.
Optionally, you can set up the source database filter file. The filter file specifies which schemas or tables to migrate. Open the filter file by using the following command:
vi filter/cassandra_filter.yaml
Then enter the following configuration details:
allow:
- schema: "io_arcion"
  types: [TABLE]
After filling out the database filter details, save and close the file.
Next, set up the destination database configuration. Before you define the configuration, create an Azure Cosmos DB for Apache Cassandra account, and then create a keyspace and a table to store the migrated data. Because you're migrating from Apache Cassandra to API for Cassandra in Azure Cosmos DB, you can use the same partition key that you used with Apache Cassandra.
Before migrating the data, increase the container throughput to the amount required for your application to migrate quickly. For example, you can increase the throughput to 100,000 RU/s. Scaling up the throughput before you start the migration helps you migrate your data in less time.
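If you manage the account from the command line, the throughput change can be scripted with the Azure CLI. The sketch below wraps the `az cosmosdb cassandra table throughput update` command; the function name and every resource name in the usage comments are placeholders, and it assumes the table uses manually provisioned (not autoscale) throughput set at the table level.

```shell
# Hypothetical helper around the Azure CLI: set manual throughput on one table.
#   $1 = resource group, $2 = account name, $3 = keyspace, $4 = table, $5 = RU/s
set_table_throughput() {
  az cosmosdb cassandra table throughput update \
    --resource-group "$1" \
    --account-name "$2" \
    --keyspace-name "$3" \
    --name "$4" \
    --throughput "$5"
}

# Example usage with placeholder names (replace them with your own):
#   set_table_throughput my-rg my-cosmos-account my_keyspace my_table 100000  # before migrating
#   set_table_throughput my-rg my-cosmos-account my_keyspace my_table 4000    # after migrating
```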
Decrease the throughput after the migration is complete. Based on the amount of data stored and the RUs required for each operation, you can estimate the throughput required after data migration. To learn more about how to estimate the RUs required, see the Provision throughput on containers and databases and Estimate RU/s using the Azure Cosmos DB capacity planner articles.
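To get a feel for how provisioned throughput relates to migration time, you can divide the total write cost by the provisioned RU/s. The following sketch does only that arithmetic. The function name and the per-write RU cost are assumptions for illustration; measure the real request charge for your own rows, and treat the result as a lower bound because it ignores throttling, retries, and network overhead.

```shell
# Hypothetical helper: rough lower bound on migration time, in seconds.
#   $1 = number of rows to migrate
#   $2 = assumed RU cost of one write (measure this for your own data)
#   $3 = provisioned throughput in RU/s
estimate_migration_seconds() {
  local rows=$1 ru_per_write=$2 provisioned_rus=$3
  # Total RUs needed divided by RUs available per second, rounded up.
  echo $(( (rows * ru_per_write + provisioned_rus - 1) / provisioned_rus ))
}

# Example: 500 million rows, an assumed 10 RUs per write, 100000 RU/s provisioned.
estimate_migration_seconds 500000000 10 100000   # prints 50000
```

Under those assumed inputs the estimate is 50,000 seconds, or roughly 13.9 hours, which shows why scaling up before the migration matters.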
Get the Contact Point, Port, Username, and Primary Password of your Azure Cosmos DB account from the Connection String pane. You'll use these values in the configuration file.
From the CLI terminal, set up the destination database configuration. Open the configuration file by using the following command:
vi conf/conn/cosmosdb.yaml
Then add the host URI, port number, username, password, and other required parameters. The following example shows the contents of the configuration file:
type: COSMOSDB
host: '<Azure Cosmos DB account’s Contact point>'
port: 10350
username: 'arciondemo'
password: '<Your Azure Cosmos DB account’s primary password>'
max-connections: 30
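If you'd rather not type the primary password into an editor, one option is to generate the file from environment variables. The sketch below is a convenience under stated assumptions, not something Arcion provides: the function name and the `COSMOS_*` variable names are hypothetical, while the keys and the port mirror the example above. It also assumes the values contain no single quotes, which would break the quoted YAML strings.

```shell
# Hypothetical helper: write the destination connection file from environment
# variables. COSMOS_CONTACT_POINT, COSMOS_USERNAME, and COSMOS_PASSWORD are
# illustrative names, not variables that Arcion itself reads.
#   $1 = path of the file to write
write_cosmos_conn() {
  local out=$1
  # Abort with a clear message if a required variable is unset or empty.
  : "${COSMOS_CONTACT_POINT:?set COSMOS_CONTACT_POINT}"
  : "${COSMOS_USERNAME:?set COSMOS_USERNAME}"
  : "${COSMOS_PASSWORD:?set COSMOS_PASSWORD}"
  cat > "$out" <<EOF
type: COSMOSDB
host: '${COSMOS_CONTACT_POINT}'
port: 10350
username: '${COSMOS_USERNAME}'
password: '${COSMOS_PASSWORD}'
max-connections: 30
EOF
  chmod 600 "$out"   # the file holds a secret, so restrict it to the current user
}

# Example usage: pass the same path that you later give to the replicant command.
#   write_cosmos_conn conf/conn/cosmosdb.yaml
```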
Next, migrate the data by using Arcion. You can run the Arcion replicant in full or snapshot mode:
Full mode – In this mode, the replicant continues to run after migration and it listens for any changes on the source Apache Cassandra system. If it detects any changes, they're replicated on the target Azure Cosmos DB account in real time.
Snapshot mode – In this mode, you can perform schema migration and one-time data replication. Real-time replication isn’t supported with this option.
By using the above two modes, migration can be performed with zero downtime.
To migrate data, from the Arcion replicant CLI terminal, run the following command:
./bin/replicant full conf/conn/cassandra.yaml conf/conn/cosmosdb.yaml --filter filter/cassandra_filter.yaml --replace-existing
The replicant UI shows the replication progress. Once the schema migration and snapshot operation are done, the progress shows 100%. After the migration is complete, you can validate the data on the target Azure Cosmos DB database.
Because you used full mode for the migration, you can perform operations such as inserting, updating, or deleting data on the source Apache Cassandra database, and then validate that they're replicated in real time on the target Azure Cosmos DB database. After the migration, make sure to decrease the throughput configured for your Azure Cosmos DB container.
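One simple way to validate the result is to compare per-table row counts between source and target. The following sketch covers only the comparison step. The function name and the input file format are assumptions; how you collect the counts is up to you (for example, by running `SELECT COUNT(*)` through `cqlsh -e` against each side).

```shell
# Hypothetical helper: compare per-table row counts from source and target.
# Each input file holds lines of the form "<keyspace.table> <count>".
#   $1 = file with source counts, $2 = file with target counts
# Returns 0 if every source table has the same count on the target, 1 otherwise.
compare_counts() {
  local src_file=$1 dst_file=$2 status=0 table src_count dst_count
  while read -r table src_count; do
    [ -n "$table" ] || continue   # skip blank lines
    # Look up the same table in the target file; empty if it is missing.
    dst_count=$(awk -v t="$table" '$1 == t { print $2 }' "$dst_file")
    if [ "$src_count" = "$dst_count" ]; then
      echo "OK $table $src_count"
    else
      echo "MISMATCH $table source=$src_count target=${dst_count:-missing}"
      status=1
    fi
  done < "$src_file"
  return "$status"
}

# Example usage:
#   compare_counts source_counts.txt target_counts.txt
```

Counts can legitimately differ for a short time in full mode while changes are still in flight, so compare them after writes on the source have paused. On very large tables a full count can be slow or time out, in which case spot-checking a sample of partitions may be more practical.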
You can stop the replicant at any point and restart it with the --resume switch. Replication resumes from the point where it stopped, without compromising data consistency. The following command shows how to use the resume switch:
./bin/replicant full conf/conn/cassandra.yaml conf/conn/cosmosdb.yaml --filter filter/cassandra_filter.yaml --replace-existing --resume
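The initial run and the resumed run differ only in the trailing --resume switch, so the command line can be assembled by a small wrapper. The sketch below is hypothetical and prints the command instead of running it, so that you can review it first. The `full` invocation and its switches come from this article; accepting `snapshot` as the first argument is an assumption about the replicant CLI based on the mode names described earlier.

```shell
# Hypothetical wrapper: assemble the replicant command line used in this article.
#   $1 = mode: "full" (shown above) or "snapshot" (subcommand name assumed)
#   $2 = optional literal "resume" to append the --resume switch
# The command is printed rather than executed.
build_replicant_cmd() {
  local mode=$1 resume=${2:-}
  case "$mode" in
    full|snapshot) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  local cmd="./bin/replicant $mode conf/conn/cassandra.yaml conf/conn/cosmosdb.yaml"
  cmd="$cmd --filter filter/cassandra_filter.yaml --replace-existing"
  if [ "$resume" = "resume" ]; then
    cmd="$cmd --resume"
  fi
  echo "$cmd"
}

# Example usage:
#   build_replicant_cmd full          # first run
#   build_replicant_cmd full resume   # restart after a stop
```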
To learn more about data migration to the destination and real-time migration, see the Arcion replicant demo.