Performance tuning in Azure Cosmos DB for PostgreSQL

APPLIES TO: Azure Cosmos DB for PostgreSQL (powered by the Citus database extension to PostgreSQL)

Running a distributed database at its full potential offers high performance. However, reaching that performance can take some adjustments in application code and data modeling. This article covers some of the most common and effective techniques for improving performance.

Client-side connection pooling

A connection pool holds open database connections for reuse. An application requests a connection from the pool when needed, and the pool returns one that is already established if possible, or establishes a new one. When done, the application releases the connection back to the pool rather than closing it.

Adding a client-side connection pool is an easy way to boost application performance with minimal code changes. In our measurements, running single-row insert statements goes about 24x faster on a cluster with pooling enabled.

For language-specific examples of adding pooling in application code, see the app stacks guide.


Azure Cosmos DB for PostgreSQL also provides server-side connection pooling using pgbouncer, but it mainly serves to increase the client connection limit. An individual application's performance benefits more from client-side than server-side pooling. (Both forms of pooling can be used at once without harm.)

Scoping distributed queries


When updating a distributed table, try to filter queries on the distribution column, at least when doing so doesn't change the meaning of the query.

In some workloads, it's easy. Transactional/operational workloads like multi-tenant SaaS apps or the Internet of Things distribute tables by tenant or device. Queries are scoped to a tenant or device ID.

For instance, in our multi-tenant tutorial we have an ads table distributed by company_id. The naive way to update an ad is to single it out like this:

-- slow

UPDATE ads
   SET impressions_count = impressions_count+1
 WHERE id = 42; -- missing filter on distribution column

Although the query uniquely identifies a row and updates it, Azure Cosmos DB for PostgreSQL doesn't know, at planning time, which shard the query will update. The Citus extension takes a ShareUpdateExclusiveLock on all shards to be safe, which blocks other queries trying to update the table.

Even though the id was sufficient to identify a row, we can include an extra filter to make the query faster:

-- fast

UPDATE ads
   SET impressions_count = impressions_count+1
 WHERE id = 42
   AND company_id = 1; -- the distribution column

The Azure Cosmos DB for PostgreSQL query planner sees a direct filter on the distribution column and knows exactly which single shard to lock. In our tests, adding filters for the distribution column increased parallel update performance by 100x.
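You can confirm that the planner routes such a statement to a single shard with EXPLAIN. In a sketch like the following (the exact plan output varies by Citus version), a single-shard router plan shows one task:

EXPLAIN (COSTS OFF)
UPDATE ads
   SET impressions_count = impressions_count+1
 WHERE id = 42
   AND company_id = 1;
-- Look for "Task Count: 1" in the distributed plan output,
-- indicating the update was routed to exactly one shard.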

Joins and CTEs

We've seen how UPDATE statements should scope by the distribution column to avoid unnecessary shard locks. Other queries benefit from scoping too, usually to avoid the network overhead of unnecessarily shuffling data between worker nodes.

-- logically correct, but slow

WITH single_ad AS (
  SELECT *
    FROM ads
   WHERE id=1
)
SELECT *
  FROM single_ad s
  JOIN campaigns c ON (s.campaign_id = c.id);

We can speed the query up by filtering on the distribution column, company_id, in both the CTE and the main SELECT statement.

-- faster, joining on distribution column

WITH single_ad AS (
  SELECT *
    FROM ads
   WHERE id=1 and company_id=1
)
SELECT *
  FROM single_ad s
  JOIN campaigns c ON (s.campaign_id = c.id)
 WHERE s.company_id=1 AND c.company_id = 1;

In general, when joining distributed tables, try to include the distribution column in the join conditions. However, when joining a distributed table with a reference table, the filter isn't required, because a reference table's contents are replicated across all worker nodes.
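For example, a small lookup table declared as a reference table can be joined without any distribution-column condition on its side. The table and column names below are illustrative, not from the tutorial schema:

-- Replicate a small lookup table to every worker node.
SELECT create_reference_table('ad_categories');

-- Joining a distributed table to a reference table needs no
-- distribution-column filter on the reference-table side.
SELECT a.id, c.name
  FROM ads a
  JOIN ad_categories c ON (a.category_id = c.id)
 WHERE a.company_id = 1;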

If it seems inconvenient to add the extra filters to all your queries, keep in mind there are helper libraries for several popular application frameworks that make it easier.

Efficient database logging

Logging all SQL statements all the time adds overhead. In our measurements, using a more judicious log level improved transactions per second by 10x versus full logging.

For efficient everyday operation, you can disable logging except for errors and abnormally long-running queries:

  • log_statement_stats = OFF: avoid profiling overhead.

  • log_duration = OFF: the duration of normal queries isn't needed.

  • log_statement = NONE: don't log queries without a more specific reason.

  • log_min_duration_statement = a value longer than what you think normal queries should take: shows the abnormally long queries.


The log-related settings in our managed service take the above recommendations into account. You can leave them as they are. However, we've sometimes seen customers change the settings to make logging aggressive, which has led to performance issues.
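To verify what your cluster is currently using, you can inspect these settings from any SQL session. This is just a read-only check; the values themselves are changed through the server parameters in the Azure portal:

SELECT name, setting
  FROM pg_settings
 WHERE name IN ('log_statement_stats', 'log_duration',
                'log_statement', 'log_min_duration_statement');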

Lock contention

The database uses locks to keep data consistent under concurrent access. However, some query patterns require an excessive amount of locking, and faster alternatives exist.

System health and locks

Before diving into common locking inefficiencies, let's see how to view locks and activity throughout the database cluster. The citus_stat_activity view provides the details.

The view shows, among other things, how queries are blocked by "wait events," including locks. Grouping by wait_event_type paints a picture of system health:

-- general system health

SELECT wait_event_type, count(*)
  FROM citus_stat_activity
 WHERE state != 'idle'
 GROUP BY 1;

A NULL wait_event_type means the query isn't waiting on anything.

If you do see locks in the stat activity output, you can view the specific blocked queries using citus_lock_waits:

SELECT * FROM citus_lock_waits;

For example, if one query is blocked on another trying to update the same row, you'll see the blocked and blocking statements appear:

-[ RECORD 1 ]-------------------------+--------------------------------------
waiting_gpid                          | 10000011981
blocking_gpid                         | 10000011979
blocked_statement                     | UPDATE numbers SET j = 3 WHERE i = 1;
current_statement_in_blocking_process | UPDATE numbers SET j = 2 WHERE i = 1;
waiting_nodeid                        | 1
blocking_nodeid                       | 1

To see not only the locks happening at the moment, but historical patterns, you can capture locks in the PostgreSQL logs. To learn more, see the log_lock_waits server setting in the PostgreSQL documentation. Another great resource is seven tips for dealing with locks on the Citus Data Blog.

Common problems and solutions

DDL commands

DDL commands like truncate, drop, and create index take write locks and block writes on the entire table. Minimizing such operations reduces locking issues.


  • Try to consolidate DDL into maintenance windows, or use them less often.

  • PostgreSQL supports building indices concurrently, to avoid taking a write lock on the table.

  • Consider setting lock_timeout in a SQL session prior to running a heavy DDL command. With lock_timeout, PostgreSQL will abort the DDL command if the command waits too long for a write lock. A DDL command waiting for a lock can cause later queries to queue behind itself.
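For example, a session can cap how long a DDL command waits for its lock before giving up. The two-second value and index name below are illustrative; choose values that suit your workload:

SET lock_timeout TO '2s';

-- If the lock can't be acquired within 2 seconds, the command
-- aborts with an error instead of queuing behind other traffic.
-- CONCURRENTLY avoids holding a write lock for the whole build.
CREATE INDEX CONCURRENTLY ads_company_id_idx ON ads (company_id);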

Idle in transaction connections

Idle (uncommitted) transactions sometimes block other queries unnecessarily. For example:


BEGIN;

UPDATE ... ;

-- Suppose the client waits now and doesn't COMMIT right away.
-- Other queries that want to update the same rows will be blocked.

COMMIT; -- finally!

To manually clean up any long-idle transactions on the coordinator node, you can run a command like this:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'citus'
 AND pid <> pg_backend_pid()
 AND state in ('idle in transaction')
 AND state_change < current_timestamp - INTERVAL '15' MINUTE;

PostgreSQL also offers an idle_in_transaction_session_timeout setting to automate idle session termination.
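For instance, to have the server terminate any session that sits idle inside a transaction for more than 15 minutes (the interval and role name here are illustrative):

-- Applies to the current session only.
SET idle_in_transaction_session_timeout TO '15min';

-- Or set it per role, so it applies to that role's future sessions.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '15min';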


Deadlocks

Azure Cosmos DB for PostgreSQL detects distributed deadlocks and cancels one of the involved transactions, but it's more performant to avoid deadlocks in the first place. A common source of deadlocks is updating the same set of rows in different orders from multiple transactions at once.

For instance, running these transactions in parallel:

Session A:

UPDATE ads SET updated_at = now() WHERE id = 1 AND company_id = 1;
UPDATE ads SET updated_at = now() WHERE id = 2 AND company_id = 1;

Session B:

UPDATE ads SET updated_at = now() WHERE id = 2 AND company_id = 1;
UPDATE ads SET updated_at = now() WHERE id = 1 AND company_id = 1;

-- ERROR:  canceling the transaction since it was involved in a distributed deadlock

Session A updated ID 1 then 2, whereas session B updated 2 then 1. Write SQL code for transactions carefully to update rows in the same order. (The update order is sometimes called a "locking hierarchy.")

In our measurements, bulk updating a set of rows with many transactions went 3x faster when avoiding deadlock.
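One defensive pattern for enforcing a locking hierarchy, sketched below, is to lock the target rows in a consistent order up front with SELECT ... FOR UPDATE, so that concurrent transactions always acquire row locks in the same sequence regardless of the order in which the application applies its updates:

BEGIN;

-- Acquire row locks in ascending id order first.
SELECT id FROM ads
 WHERE company_id = 1 AND id IN (1, 2)
 ORDER BY id
   FOR UPDATE;

-- The updates can now run in any order without deadlocking
-- against another transaction using the same pattern.
UPDATE ads SET updated_at = now() WHERE id = 2 AND company_id = 1;
UPDATE ads SET updated_at = now() WHERE id = 1 AND company_id = 1;

COMMIT;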

I/O during ingestion

I/O bottlenecking is typically less of a problem for Azure Cosmos DB for PostgreSQL than for single-node PostgreSQL because of sharding. The shards are individually smaller tables, with better index and cache hit rates, yielding better performance.

However, even with Azure Cosmos DB for PostgreSQL, as tables and indices grow larger, disk I/O can become a problem for data ingestion. One thing to look out for is an increasing number of 'IO' wait_event_type entries appearing in citus_stat_activity:

SELECT wait_event_type, wait_event, count(*)
  FROM citus_stat_activity
 WHERE state='active'
 GROUP BY 1,2;

Run the above query repeatedly to capture wait event related information. Note how the counts of different wait event types change.

Also look at metrics in the Azure portal, particularly whether the IOPS metric is maxing out.


  • If your data is naturally ordered, such as in a time series, use PostgreSQL table partitioning. See this guide to learn how to partition distributed tables.

  • Remove unused indices. Index maintenance causes I/O amplification during ingestion. To find which indices are unused, use this query.

  • If possible, avoid indexing randomized data. For instance, some UUID generation algorithms follow no order. Indexing such a value causes a lot of overhead. Try a bigint sequence instead, or monotonically increasing UUIDs.
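One common way to spot unused indices is PostgreSQL's pg_stat_user_indexes statistics view: an index whose idx_scan count stays at zero over a representative period is a candidate for removal. A sketch (for shard-level indices, also run it on the worker nodes, where the shards live):

SELECT schemaname, relname AS table_name,
       indexrelname AS index_name, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY pg_relation_size(indexrelid) DESC;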

Summary of results

In benchmarks of simple ingestion with INSERTs, UPDATEs, and transaction blocks, we observed the following query speedups for the techniques in this article.

Technique            Query speedup
Scoping queries      100x
Connection pooling   24x
Efficient logging    10x
Avoiding deadlock    3x

Next steps