Useful diagnostic queries in Azure Cosmos DB for PostgreSQL
APPLIES TO: Azure Cosmos DB for PostgreSQL (powered by the Citus database extension to PostgreSQL)
Finding which node contains data for a specific tenant
In the multitenant use case, we can determine which worker node contains the rows for a specific tenant. Azure Cosmos DB for PostgreSQL groups the rows of distributed tables into shards, and places each shard on a worker node in the cluster.
Suppose our application's tenants are stores, and we want to find which worker node holds the data for store ID=4. In other words, we want to find the placement for the shard containing rows whose distribution column has value 4:
SELECT shardid, shardstate, shardlength, nodename, nodeport, placementid
FROM pg_dist_placement AS placement,
pg_dist_node AS node
WHERE placement.groupid = node.groupid
AND node.noderole = 'primary'
AND shardid = (
SELECT get_shard_id_for_distribution_column('stores', 4)
);
The output contains the host and port of the worker database.
┌─────────┬────────────┬─────────────┬───────────┬──────────┬─────────────┐
│ shardid │ shardstate │ shardlength │ nodename │ nodeport │ placementid │
├─────────┼────────────┼─────────────┼───────────┼──────────┼─────────────┤
│ 102009 │ 1 │ 0 │ 10.0.0.16 │ 5432 │ 2 │
└─────────┴────────────┴─────────────┴───────────┴──────────┴─────────────┘
Finding which node hosts a distributed schema
Distributed schemas are automatically associated with individual colocation groups such that the tables created in those schemas are converted to colocated distributed tables without a shard key. You can find where a distributed schema resides by joining citus_shards
with citus_schemas
:
select schema_name, nodename, nodeport
from citus_shards
join citus_schemas cs
on cs.colocation_id = citus_shards.colocation_id
group by 1,2,3;
schema_name | nodename | nodeport
-------------+-----------+----------
a | localhost | 9701
b | localhost | 9702
with_data | localhost | 9702
You can also query citus_shards
directly filtering down to schema table type to have a detailed listing for all tables.
select * from citus_shards where citus_table_type = 'schema';
table_name | shardid | shard_name | citus_table_type | colocation_id | nodename | nodeport | shard_size | schema_name | colocation_id | schema_size | schema_owner
----------------+---------+-----------------------+------------------+---------------+-----------+----------+------------+-------------+---------------+-------------+--------------
a.cities | 102080 | a.cities_102080 | schema | 4 | localhost | 9701 | 8192 | a | 4 | 128 kB | citus
a.map_tags | 102145 | a.map_tags_102145 | schema | 4 | localhost | 9701 | 32768 | a | 4 | 128 kB | citus
a.measurement | 102047 | a.measurement_102047 | schema | 4 | localhost | 9701 | 0 | a | 4 | 128 kB | citus
a.my_table | 102179 | a.my_table_102179 | schema | 4 | localhost | 9701 | 16384 | a | 4 | 128 kB | citus
a.people | 102013 | a.people_102013 | schema | 4 | localhost | 9701 | 32768 | a | 4 | 128 kB | citus
a.test | 102008 | a.test_102008 | schema | 4 | localhost | 9701 | 8192 | a | 4 | 128 kB | citus
a.widgets | 102146 | a.widgets_102146 | schema | 4 | localhost | 9701 | 32768 | a | 4 | 128 kB | citus
b.test | 102009 | b.test_102009 | schema | 5 | localhost | 9702 | 8192 | b | 5 | 32 kB | citus
b.test_col | 102012 | b.test_col_102012 | schema | 5 | localhost | 9702 | 24576 | b | 5 | 32 kB | citus
with_data.test | 102180 | with_data.test_102180 | schema | 11 | localhost | 9702 | 647168 | with_data | 11 | 632 kB | citus
Finding the distribution column for a table
Each distributed table has a "distribution column." (For more information, see Distributed Data Modeling.) It can be important to know which column it is. For instance, when joining or filtering tables, you might see error messages with hints like, "add a filter to the distribution column."
The pg_dist_*
tables on the coordinator node contain diverse metadata about
the distributed database. In particular pg_dist_partition
holds information
about the distribution column for each table. You can use a convenient utility
function to look up the distribution column name from the low-level details in
the metadata. Here's an example and its output:
-- create example table
CREATE TABLE products (
store_id bigint,
product_id bigint,
name text,
price money,
CONSTRAINT products_pkey PRIMARY KEY (store_id, product_id)
);
-- pick store_id as distribution column
SELECT create_distributed_table('products', 'store_id');
-- get distribution column name for products table
SELECT column_to_column_name(logicalrelid, partkey) AS dist_col_name
FROM pg_dist_partition
WHERE logicalrelid='products'::regclass;
Example output:
┌───────────────┐
│ dist_col_name │
├───────────────┤
│ store_id │
└───────────────┘
Detecting locks
This query will run across all worker nodes and identify locks, how long they've been open, and the offending queries:
SELECT run_command_on_workers($cmd$
SELECT array_agg(
blocked_statement || ' $ ' || cur_stmt_blocking_proc
|| ' $ ' || cnt::text || ' $ ' || age
)
FROM (
SELECT blocked_activity.query AS blocked_statement,
blocking_activity.query AS cur_stmt_blocking_proc,
count(*) AS cnt,
age(now(), min(blocked_activity.query_start)) AS "age"
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED
AND blocking_locks.GRANTED
GROUP BY blocked_activity.query,
blocking_activity.query
ORDER BY 4
) a
$cmd$);
Example output:
┌───────────────────────────────────────────────────────────────────────────────────┐
│ run_command_on_workers │
├───────────────────────────────────────────────────────────────────────────────────┤
│ (10.0.0.16,5432,t,"") │
│ (10.0.0.20,5432,t,"{""update ads_102277 set name = 'new name' where id = 1; $ sel…│
│…ect * from ads_102277 where id = 1 for update; $ 1 $ 00:00:03.729519""}") │
└───────────────────────────────────────────────────────────────────────────────────┘
Querying the size of your shards
This query will provide you with the size of every shard of a given
distributed table, called my_distributed_table
:
SELECT *
FROM run_command_on_shards('my_distributed_table', $cmd$
SELECT json_build_object(
'shard_name', '%1$s',
'size', pg_size_pretty(pg_table_size('%1$s'))
);
$cmd$);
Example output:
┌─────────┬─────────┬───────────────────────────────────────────────────────────────────────┐
│ shardid │ success │ result │
├─────────┼─────────┼───────────────────────────────────────────────────────────────────────┤
│ 102008 │ t │ {"shard_name" : "my_distributed_table_102008", "size" : "2416 kB"} │
│ 102009 │ t │ {"shard_name" : "my_distributed_table_102009", "size" : "3960 kB"} │
│ 102010 │ t │ {"shard_name" : "my_distributed_table_102010", "size" : "1624 kB"} │
│ 102011 │ t │ {"shard_name" : "my_distributed_table_102011", "size" : "4792 kB"} │
└─────────┴─────────┴───────────────────────────────────────────────────────────────────────┘
Querying the size of all distributed tables
This query gets a list of the sizes for each distributed table plus the size of their indices.
SELECT
tablename,
pg_size_pretty(
citus_total_relation_size(tablename::text)
) AS total_size
FROM pg_tables pt
JOIN pg_dist_partition pp
ON pt.tablename = pp.logicalrelid::text
WHERE schemaname = 'public';
Example output:
┌───────────────┬────────────┐
│ tablename │ total_size │
├───────────────┼────────────┤
│ github_users │ 39 MB │
│ github_events │ 98 MB │
└───────────────┴────────────┘
Note there are other Azure Cosmos DB for PostgreSQL functions for querying distributed table size, see determining table size.
Identifying unused indices
The following query will identify unused indexes on worker nodes for a given
distributed table (my_distributed_table
)
SELECT *
FROM run_command_on_shards('my_distributed_table', $cmd$
SELECT array_agg(a) as infos
FROM (
SELECT (
schemaname || '.' || relname || '##' || indexrelname || '##'
|| pg_size_pretty(pg_relation_size(i.indexrelid))::text
|| '##' || idx_scan::text
) AS a
FROM pg_stat_user_indexes ui
JOIN pg_index i
ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
AND idx_scan < 50
AND pg_relation_size(relid) > 5 * 8192
AND (schemaname || '.' || relname)::regclass = '%s'::regclass
ORDER BY
pg_relation_size(i.indexrelid) / NULLIF(idx_scan, 0) DESC nulls first,
pg_relation_size(i.indexrelid) DESC
) sub
$cmd$);
Example output:
┌─────────┬─────────┬───────────────────────────────────────────────────────────────────────┐
│ shardid │ success │ result │
├─────────┼─────────┼───────────────────────────────────────────────────────────────────────┤
│ 102008 │ t │ │
│ 102009 │ t │ {"public.my_distributed_table_102009##some_index_102009##28 MB##0"} │
│ 102010 │ t │ │
│ 102011 │ t │ │
└─────────┴─────────┴───────────────────────────────────────────────────────────────────────┘
Monitoring client connection count
The following query counts the connections open on the coordinator, and groups them by type.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
Example output:
┌────────┬───────┐
│ state │ count │
├────────┼───────┤
│ active │ 3 │
│ idle │ 3 │
│ ∅ │ 6 │
└────────┴───────┘
Viewing system queries
Active queries
The pg_stat_activity
view shows which queries are currently executing. You
can filter to find the actively executing ones, along with the process ID of
their backend:
SELECT pid, query, state
FROM pg_stat_activity
WHERE state != 'idle';
Why are queries waiting
We can also query to see the most common reasons that non-idle queries that are waiting. For an explanation of the reasons, check the PostgreSQL documentation.
SELECT wait_event || ':' || wait_event_type AS type, count(*) AS number_of_occurences
FROM pg_stat_activity
WHERE state != 'idle'
GROUP BY wait_event, wait_event_type
ORDER BY number_of_occurences DESC;
Example output when running pg_sleep
in a separate query concurrently:
┌─────────────────┬──────────────────────┐
│ type │ number_of_occurences │
├─────────────────┼──────────────────────┤
│ ∅ │ 1 │
│ PgSleep:Timeout │ 1 │
└─────────────────┴──────────────────────┘
Index hit rate
This query will provide you with your index hit rate across all nodes. Index hit rate is useful in determining how often indices are used when querying. A value of 95% or higher is ideal.
-- on coordinator
SELECT 100 * (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) AS index_hit_rate
FROM pg_statio_user_indexes;
-- on workers
SELECT nodename, result as index_hit_rate
FROM run_command_on_workers($cmd$
SELECT 100 * (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) AS index_hit_rate
FROM pg_statio_user_indexes;
$cmd$);
Example output:
┌───────────┬────────────────┐
│ nodename │ index_hit_rate │
├───────────┼────────────────┤
│ 10.0.0.16 │ 96.0 │
│ 10.0.0.20 │ 98.0 │
└───────────┴────────────────┘
Cache hit rate
Most applications typically access a small fraction of their total data at once. PostgreSQL keeps frequently accessed data in memory to avoid slow reads from disk. You can see statistics about it in the pg_statio_user_tables view.
An important measurement is what percentage of data comes from the memory cache vs the disk in your workload:
-- on coordinator
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
100 * sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_rate
FROM
pg_statio_user_tables;
-- on workers
SELECT nodename, result as cache_hit_rate
FROM run_command_on_workers($cmd$
SELECT
100 * sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_rate
FROM
pg_statio_user_tables;
$cmd$);
Example output:
┌───────────┬──────────┬─────────────────────┐
│ heap_read │ heap_hit │ cache_hit_rate │
├───────────┼──────────┼─────────────────────┤
│ 1 │ 132 │ 99.2481203007518796 │
└───────────┴──────────┴─────────────────────┘
If you find yourself with a ratio significantly lower than 99%, then you likely want to consider increasing the cache available to your database.
Next steps
- Learn about other system tables that are useful for diagnostics