Need help with throughput configuration in Cosmos DB

Shreekumar Suggamad 136 Reputation points

We are planning to switch from Azure SQL Server to Cosmos DB. We read around 27–30 million records every day for processing. Here's how we plan to execute things:

  • Reading data from Kafka and storing it in CosmosDB throughout the day
  • Read data from Cosmos, perform some arithmetic calculations and save the calculated data back in different containers.

Basically, we have 2 types of JSON files (reading from Kafka)

  1. Json1 - size is 70 B (Kafka sends this throughout the day, 1–6 times per day)
  2. Json2 - size is 1 KB (Kafka sends this once per day)

We need help in understanding the required throughput & throughput mode selection for this scenario.
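For context, a back-of-envelope estimate of the write-path throughput might look like the sketch below. The RU-per-write figure is an assumption for illustration only; the actual charge per operation depends on document size, indexing policy, and consistency level.

```python
# Rough RU/s estimate for the write path described above.
# Assumptions (not from the thread): ~5 RU per ~1 KB write and an
# even arrival rate across the day.

DOCS_PER_DAY = 30_000_000
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
RU_PER_WRITE = 5                 # assumed figure for a ~1 KB document

avg_writes_per_sec = DOCS_PER_DAY / SECONDS_PER_DAY
avg_write_ru = avg_writes_per_sec * RU_PER_WRITE

print(f"avg writes/sec: {avg_writes_per_sec:.0f}")   # ~347
print(f"avg write RU/s: {avg_write_ru:.0f}")         # ~1736
```

Peak traffic is usually several times the average, so provisioned (or autoscale maximum) throughput would need to sit well above this baseline, and the read/recompute/write-back step adds its own RU load on top.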

Azure Cosmos DB
An Azure NoSQL database service for app development.

Accepted answer
  1. GeethaThatipatri-MSFT 27,002 Reputation points Microsoft Employee

    Hi @Shreekumar Suggamad, thanks for the question and for using the MS Q&A platform.
    It looks like you have asked the same question on Stack Overflow here, where it was addressed by Mark Brown.

    • First, you need to work out approximately how many reads and writes per second will be processed and stored in Cosmos DB at given times of the day (request units are the “base currency” of Cosmos DB – you can’t even begin sizing without some idea of this).
    • You also need to know what your data retention will be once any historic data has been migrated (for storage costs).
    • Once you have these figures, you can plug the numbers into our capacity calculator to get a reasonable estimate.
    • You can also consult this article for deciding between the standard and autoscale “throughput modes”.
    • Regarding Kafka – exactly how is it being used?
    o If it is being used for event sourcing between Azure SQL DB-backed microservices (or similar), I would recommend using the change feed in Cosmos DB directly (see patterns here).
    o If messages are coming from an external source through Kafka, you will want to check out the Kafka connector documentation.
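    To make the standard-vs-autoscale trade-off concrete: standard (manual) throughput bills for the provisioned RU/s around the clock, while autoscale bills at a higher per-RU rate but only for the highest RU/s each hour actually scaled to (with a floor of 10% of the configured maximum). The rates and workload shape in this sketch are assumptions for illustration, not current Azure pricing:

    ```python
    # Illustrative daily-cost comparison of standard (manual) vs
    # autoscale throughput. Rates and workload shape are assumed.

    MANUAL_RATE = 0.008      # $ per 100 RU/s per hour (assumed)
    AUTOSCALE_RATE = 0.012   # $ per 100 RU/s per hour (assumed, ~1.5x)

    peak_ru = 10_000         # RU/s needed during the daily ingest burst
    idle_ru = 1_000          # autoscale floor: 10% of the maximum
    peak_hours = 4           # hours per day at peak (assumed)

    # Manual throughput must stay provisioned for the peak all day.
    manual_daily = (peak_ru / 100) * MANUAL_RATE * 24

    # Autoscale bills the highest RU/s reached within each hour.
    autoscale_daily = ((peak_ru / 100) * AUTOSCALE_RATE * peak_hours
                       + (idle_ru / 100) * AUTOSCALE_RATE * (24 - peak_hours))

    print(f"manual:    ${manual_daily:.2f}/day")
    print(f"autoscale: ${autoscale_daily:.2f}/day")
    ```

    With a short daily burst like the one described in the question, the spiky shape tends to favour autoscale; a flat, sustained load tends to favour standard provisioned throughput.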

    Please let me know if you need any additional information.

    1 person found this answer helpful.

0 additional answers
