Classify application workload in Azure Cosmos DB for PostgreSQL
Article 08/14/2024
APPLIES TO: Azure Cosmos DB for PostgreSQL (powered by the Citus database extension to PostgreSQL)
Here are common characteristics of the workloads that are the best fit for
Azure Cosmos DB for PostgreSQL.
Prerequisites
This article assumes you know the fundamental concepts for scaling. If you haven't read about them, take a moment to do so.
Characteristics of multi-tenant SaaS
Tenants see their own data; they can't see other tenants' data.
Most B2B SaaS apps are multi-tenant. Examples include Salesforce and Shopify.
In most B2B SaaS apps, there are hundreds to tens of thousands of tenants, and more tenants keep joining.
Multi-tenant SaaS apps are primarily operational/transactional, with single-digit millisecond latency requirements for their database queries.
These apps have a classic relational data model and are built using ORMs or ORM-based frameworks such as Ruby on Rails, Hibernate, and Django.
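One way to picture the tenant isolation described above is to give every tenant-owned table a tenant identifier and scope every query to it. The following Python sketch only builds SQL text; it doesn't connect to a database. The table names, the `tenant_id` column, and the helper function names are hypothetical. `create_distributed_table` is the Citus function that shards a table by a column; choosing `tenant_id` as that column is an assumption made for illustration, not something this article prescribes.

```python
# Hypothetical schema: every tenant-owned table carries a tenant_id column.
TENANT_TABLES = ["stores", "orders"]


def distribution_statements(tables, column="tenant_id"):
    """Build Citus statements that shard each table by the tenant column."""
    return [
        f"SELECT create_distributed_table('{table}', '{column}');"
        for table in tables
    ]


def tenant_scoped_select(table, tenant_id):
    """Return a parameterized query that reads only one tenant's rows."""
    if table not in TENANT_TABLES:
        raise ValueError(f"unknown table: {table}")
    # The tenant filter is always present, so a query built here
    # can't return rows that belong to another tenant.
    sql = f"SELECT * FROM {table} WHERE tenant_id = %s;"
    return sql, (tenant_id,)


if __name__ == "__main__":
    for statement in distribution_statements(TENANT_TABLES):
        print(statement)
    print(tenant_scoped_select("orders", 42))
```

The `%s` placeholder follows the parameter style used by common Python PostgreSQL drivers; an ORM would normally generate the equivalent filter for you.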
Characteristics of real-time operational analytics
These apps have a customer-facing or user-facing interactive analytics dashboard, with a subsecond query latency requirement.
High concurrency is required: at least 20 users.
They analyze fresh data, from within the last second to the last few minutes.
Most have time series data such as events and logs.
Common data models in these apps include:
Star schema: a few large fact tables, with the rest being small dimension tables
Mostly fewer than 20 major tables
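As a rough illustration of the star schema described above, the sketch below maps large fact tables to distributed tables and small dimension tables to reference tables. This mapping is an assumption for illustration, and the table and column names are hypothetical. `create_distributed_table` and `create_reference_table` are Citus functions; the Python code only assembles the statements and doesn't run them.

```python
# Hypothetical star schema: one large fact table and two small dimension tables.
FACT_TABLES = {"events": "device_id"}  # table name -> assumed distribution column
DIMENSION_TABLES = ["device_types", "regions"]


def star_schema_statements(fact_tables, dimension_tables):
    """Build Citus statements for a star schema.

    Fact tables are sharded across nodes by their distribution column.
    Dimension tables become reference tables, which are copied to every node.
    """
    statements = []
    for table, column in fact_tables.items():
        statements.append(
            f"SELECT create_distributed_table('{table}', '{column}');"
        )
    for table in dimension_tables:
        statements.append(f"SELECT create_reference_table('{table}');")
    return statements


if __name__ == "__main__":
    for statement in star_schema_statements(FACT_TABLES, DIMENSION_TABLES):
        print(statement)
```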
Characteristics of high-throughput transactional
These apps run NoSQL/document-style workloads but require PostgreSQL features such as transactions, foreign and primary keys, triggers, and extensions like PostGIS.
The workload is based on a single key. It has CRUD operations and lookups based on that key.
These apps have high throughput requirements: thousands to hundreds of thousands of transactions per second (TPS).
Query latency in single-digit milliseconds, with a high concurrency requirement.
Time series data, such as data from Internet of Things (IoT) devices.
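The characteristics of the three workload types in this article can be condensed into a rough checklist. The Python sketch below is one hypothetical way to encode that checklist. The `Workload` fields, the `classify` function, and the numeric thresholds are assumptions: the thresholds are a loose reading of "hundreds of tenants," "single-digit milliseconds," "subsecond," "at least 20 users," and "thousands of TPS." Treat it as a starting point for discussion, not a sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of an application's database workload."""
    tenant_count: int = 0                # 0 means the app isn't multi-tenant
    latency_budget_ms: float = 1000.0    # target query latency
    concurrent_users: int = 1
    transactions_per_second: int = 0
    single_key_access: bool = False      # CRUD and lookups on one key
    interactive_dashboard: bool = False  # user-facing analytics dashboard


def classify(w: Workload) -> str:
    """Return the workload type whose characteristics match, if any."""
    single_digit_ms = w.latency_budget_ms < 10
    if w.tenant_count >= 100 and single_digit_ms:
        return "multi-tenant SaaS"
    if (w.single_key_access
            and w.transactions_per_second >= 1000
            and single_digit_ms):
        return "high-throughput transactional"
    if (w.interactive_dashboard
            and w.concurrent_users >= 20
            and w.latency_budget_ms < 1000):
        return "real-time operational analytics"
    return "unclassified"


if __name__ == "__main__":
    dashboard_app = Workload(
        interactive_dashboard=True, concurrent_users=50, latency_budget_ms=500
    )
    print(classify(dashboard_app))
```

A real application can show traits of more than one workload type, so the order of the checks here is arbitrary.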
Next steps
Choose whichever fits your application best: