Cassandra to Cosmos DB migration planning

To migrate to Azure smoothly, organizations need to plan carefully. One essential piece of information is an estimate of the workload that the migrated database might experience.

At your video camera manufacturer, you've decided to migrate your Cassandra database to Azure and have begun planning the migration. You want to match the capacity of the migrated system to the load generated by users sharing and viewing their videos. You have detailed data describing that load over the last three years, and you expect it to grow quickly over the next few months during two major new product launches.

Here, you'll learn how to estimate the size and throughput you need and how to create a database that satisfies those requirements.

Estimate data size

Before you create the Cosmos DB database, calculate the requirements of your existing workload.

Start by noting the existing data size. If the migrated application will hold more or less data than the current one, instead multiply the average row size by the expected number of rows. The value from either approach is the minimum size of the new database.
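
The multiplication described above can be sketched in a few lines of Python. This is only an illustration: the function name and the figures (an average row of 2,048 bytes and 50 million rows) are hypothetical and should be replaced with measurements from your own Cassandra cluster.

```python
def estimate_data_size_bytes(avg_row_bytes: int, row_count: int) -> int:
    """Return the estimated minimum data size in bytes (average row size x rows)."""
    if avg_row_bytes < 0 or row_count < 0:
        raise ValueError("row size and row count must be non-negative")
    return avg_row_bytes * row_count


# Hypothetical figures, not taken from any real workload.
estimated_bytes = estimate_data_size_bytes(avg_row_bytes=2_048, row_count=50_000_000)

# Convert to GiB (1 GiB = 1024**3 bytes) for easier reading.
estimated_gib = estimated_bytes / 1024**3
print(f"Estimated minimum database size: {estimated_gib:.1f} GiB")
```

Treat the result as a lower bound; it does not account for growth such as the expected product launches.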

Estimate existing throughput

Estimate the existing read rate, from query and get operations. Estimate the existing write rate, from insert, update, and delete operations.
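
As a rough sketch of this step, the following Python snippet turns operation counts into per-second read and write rates. The one-hour observation window and all of the counts are hypothetical; in practice you would take them from your own monitoring data, ideally for a peak period.

```python
def ops_per_second(operation_count: int, window_seconds: float) -> float:
    """Return the average number of operations per second over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return operation_count / window_seconds


# Hypothetical counts observed during a one-hour window.
WINDOW_SECONDS = 3600
read_counts = {"query": 5_400_000, "get": 1_800_000}
write_counts = {"insert": 540_000, "update": 144_000, "delete": 36_000}

# Reads come from query and get operations; writes from insert, update, and delete.
read_rate = ops_per_second(sum(read_counts.values()), WINDOW_SECONDS)
write_rate = ops_per_second(sum(write_counts.values()), WINDOW_SECONDS)

print(f"Reads per second:  {read_rate:.0f}")
print(f"Writes per second: {write_rate:.0f}")
```

These two rates, together with the data size, are the kind of inputs a capacity estimate needs.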

Create the Cosmos DB database

Once you have the estimated data size and throughput, you can create your Azure Cosmos DB account and tables.

Azure Cosmos DB can elastically scale storage and throughput, so the estimates are only a starting point: you can change storage and throughput at any time.

To estimate request units (RUs) and storage, you can use this online calculator:

Estimate Request Units and Data Storage

Create the database

To create the database, use the following steps:

  1. Create a new resource in the Azure portal and choose a Cosmos DB account. Specify Cassandra as the API.
  2. Create a new virtual network during the creation process, or use a pre-existing virtual network, and configure the firewall to allow access.

Create the required tables

To create the tables, use CQLSH, or create the tables in the Azure portal, in Data Explorer. Specify the estimated throughput in RUs at this point. To create the tables in Data Explorer, use the following steps:

  1. Once your Cosmos DB account is created, select your Cosmos DB account and click Data Explorer.
  2. In Data Explorer, click New Table.
  3. Specify a schema name for the Keyspace name.
  4. Specify a table name for tableid.
  5. Specify a list of columns for CREATE TABLE. For example, (customerid int, firstname text, lastname text, email text, stateprovince text, PRIMARY KEY ((stateprovince), customerid))
  6. Specify a Throughput.
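
If you prefer CQLSH to Data Explorer, the same table can be described in a single CQL statement. The Python sketch below only assembles that statement as text so you can review it before running it. The keyspace name `sales`, the table name `customers`, and the throughput of 400 RUs are hypothetical, and the `cosmosdb_provisioned_throughput` table option is assumed here to be how the Cassandra API accepts a throughput value; check the current Azure Cosmos DB documentation before relying on it.

```python
def build_create_table_cql(keyspace: str, table: str, columns: str, throughput_rus: int) -> str:
    """Assemble a CREATE TABLE statement that also requests provisioned throughput."""
    if throughput_rus <= 0:
        raise ValueError("throughput_rus must be positive")
    # Assumption: throughput is passed through the cosmosdb_provisioned_throughput option.
    return (
        f"CREATE TABLE {keyspace}.{table} {columns} "
        f"WITH cosmosdb_provisioned_throughput={throughput_rus};"
    )


# Column list taken from the Data Explorer example in step 5.
COLUMNS = (
    "(customerid int, firstname text, lastname text, email text, "
    "stateprovince text, PRIMARY KEY ((stateprovince), customerid))"
)

# Hypothetical keyspace, table name, and throughput value.
statement = build_create_table_cql("sales", "customers", COLUMNS, throughput_rus=400)
print(statement)
```

The keyspace must already exist before the printed statement is run in CQLSH.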