Need help in throughput Configuration in Cosmos DB

Shreekumar Suggamad 136 Reputation points
2022-08-19T17:35:53.017+00:00

We are planning to switch from Azure SQL Server to Cosmos DB. We read around 27-30 million records every day for processing. Here is how we want to execute things:

  • Read data from Kafka and store it in Cosmos DB throughout the day.
  • Read data from Cosmos DB, perform some arithmetic calculations, and save the calculated data back into different containers.

Basically, we have 2 types of JSON messages coming from Kafka:

  1. Json1 - size is 70 B (Kafka sends this throughout the day, 1-6 times per day)
  2. Json2 - size is 1 KB (Kafka sends this once per day)

We need help in understanding the required throughput and choosing the right throughput mode for this scenario.

Accepted answer
  GeethaThatipatri-MSFT 29,017 Reputation points, Microsoft Employee
    2022-08-19T20:57:51.933+00:00

    Hi @Shreekumar Suggamad, thanks for the question and for using the MS Q&A platform.
    It looks like you asked the same question on Stack Overflow here, where it was addressed by Mark Brown:

    • First, you need to work out approximately how many reads and writes per second will be processed and stored in Cosmos DB at given times of the day (request units are the “base currency” of Cosmos DB; you can't even begin sizing without some idea of this). There is a back-of-envelope sketch after this list.
    • You also need to know what your data retention will be once any historic data has been migrated (for storage costs).
    • Once you have these figures, you can plug them into the capacity calculator to get a reasonable estimate.
    • You can also consult this article for deciding between the standard and autoscale throughput modes: https://learn.microsoft.com/azure/cosmos-db/how-to-choose-offer (a short provisioning sketch for both modes follows this list).
    • Regarding Kafka: exactly how is it being used?
      ◦ If it is used for event sourcing between Azure SQL DB-backed microservices (or similar), I would recommend using the change feed in Cosmos DB directly (see patterns here, and the polling sketch after this list).
      ◦ If messages are coming from an external source through Kafka, you will want to check out the Kafka connector documentation.
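
    To make the first bullet concrete, here is a rough back-of-envelope calculation in Python based on the volumes in your question (27-30 million items per day, items of roughly 70 B to 1 KB). The per-operation RU costs and the peak factor below are assumptions chosen to illustrate the arithmetic, not measurements of your workload; real costs depend on indexing policy, item shape and query patterns.

    ```python
    # Rough RU/s estimate from daily volumes. All cost figures are assumptions
    # used to illustrate the arithmetic, not measurements of this workload.

    writes_per_day = 30_000_000                # upper end of 27-30 million/day
    seconds_per_day = 24 * 60 * 60

    avg_writes_per_sec = writes_per_day / seconds_per_day       # ~347 writes/s

    # Assumed costs: ~5.5 RU per ~1 KB write, ~1 RU per 1 KB point read.
    ru_per_write = 5.5
    ru_per_read = 1.0

    # Assume the processing step reads each item once and writes one result
    # back to another container, roughly doubling the write cost.
    avg_ru_per_sec = (2 * avg_writes_per_sec * ru_per_write
                      + avg_writes_per_sec * ru_per_read)        # ~4,200 RU/s

    peak_factor = 3     # headroom for bursts; measure your real peak/average
    suggested_max_ru_per_sec = avg_ru_per_sec * peak_factor      # ~12,500 RU/s

    print(f"average: ~{avg_ru_per_sec:,.0f} RU/s, "
          f"suggested provisioned / autoscale max: "
          f"~{suggested_max_ru_per_sec:,.0f} RU/s")
    ```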
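
    For the standard-versus-autoscale decision linked above, this is a minimal provisioning sketch assuming the azure-cosmos Python SDK (4.3 or later); the account endpoint, key, database, container names and partition key are placeholders, and the RU/s figures come from the estimate above.

    ```python
    from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

    # Placeholder endpoint, key and names - replace with your own values.
    client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                          credential="<your-key>")
    db = client.create_database_if_not_exists("telemetry")

    # Standard (manual) throughput: a fixed number of RU/s, billed whether or
    # not it is used. Suits a steady, predictable ingestion rate.
    raw_events = db.create_container_if_not_exists(
        id="raw-events",
        partition_key=PartitionKey(path="/deviceId"),
        offer_throughput=4200,
    )

    # Autoscale: scales between 10% and the configured maximum, which tends to
    # suit spiky loads such as batched Kafka deliveries a few times per day.
    calculated = db.create_container_if_not_exists(
        id="calculated-results",
        partition_key=PartitionKey(path="/deviceId"),
        offer_throughput=ThroughputProperties(auto_scale_max_throughput=12000),
    )
    ```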
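
    And for the change feed option in the Kafka bullet, a minimal polling sketch with the same Python SDK; the container name and processing step are hypothetical, and for production you would more likely use the Kafka connector or a change feed processor, as noted above.

    ```python
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                          credential="<your-key>")
    container = (client.get_database_client("telemetry")
                       .get_container_client("raw-events"))

    # Read every change written to the container so far; subsequent polls can
    # pass a continuation token so that only new items are returned.
    for item in container.query_items_change_feed(is_start_from_beginning=True):
        # Hypothetical processing step - replace with your own calculation and
        # write the result to the target container.
        print(item["id"])
    ```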

    Please let me know if you need any additional information.
    Regards
    Geetha

    1 person found this answer helpful.
