Azure Cosmos DB for PostgreSQL distributed SQL API
APPLIES TO: Azure Cosmos DB for PostgreSQL (powered by the Citus database extension to PostgreSQL)
Azure Cosmos DB for PostgreSQL includes features beyond standard PostgreSQL. Below is a categorized reference of functions and configuration options for:
- Parallelizing query execution across shards
- Managing sharded data between multiple servers
- Compressing data with columnar storage
- Automating timeseries partitioning
SQL functions
Sharding
Name | Description |
---|---|
alter_distributed_table | Change the distribution column, shard count or colocation properties of a distributed table |
citus_copy_shard_placement | Repair an inactive shard placement using data from a healthy placement |
citus_schema_distribute | Turn a PostgreSQL schema into a distributed schema |
citus_schema_undistribute | Undo the action of citus_schema_distribute |
create_distributed_table | Turn a PostgreSQL table into a distributed (sharded) table |
create_reference_table | Maintain full copies of a table in sync across all nodes |
citus_add_local_table_to_metadata | Add a local table to metadata to enable querying it from any node |
isolate_tenant_to_new_shard | Create a new shard to hold rows with a specific single value in the distribution column |
truncate_local_data_after_distributing_table | Truncate all local rows after distributing a table |
undistribute_table | Undo the action of create_distributed_table or create_reference_table |
Shard rebalancing
Name | Description |
---|---|
citus_add_rebalance_strategy | Append a row to pg_dist_rebalance_strategy |
citus_move_shard_placement | Typically used indirectly during shard rebalancing rather than being called directly by a database administrator |
citus_set_default_rebalance_strategy | Change the strategy named by its argument to be the default chosen when rebalancing shards |
get_rebalance_progress | Monitor the moves planned and executed by rebalance_table_shards |
get_rebalance_table_shards_plan | Output the planned shard movements of rebalance_table_shards without performing them |
rebalance_table_shards | Move shards of the given table to distribute them evenly among the workers |
Colocation
Name | Description |
---|---|
create_distributed_function | Make function run on workers near colocated shards |
update_distributed_table_colocation | Update or break colocation of a distributed table |
Columnar storage
Name | Description |
---|---|
alter_columnar_table_set | Change settings on a columnar table |
alter_table_set_access_method | Convert a table between heap or columnar storage |
Timeseries partitioning
Name | Description |
---|---|
alter_old_partitions_set_access_method | Change storage method of partitions |
create_time_partitions | Create partitions of a given interval to cover a given range of time |
drop_old_time_partitions | Remove all partitions whose intervals fall before a given timestamp |
Informational
Name | Description |
---|---|
citus_get_active_worker_nodes | Get active worker host names and port numbers |
citus_relation_size | Get disk space used by all the shards of the specified distributed table |
citus_remote_connection_stats | Show the number of active connections to each remote node |
citus_stat_statements_reset | Remove all rows from citus_stat_statements |
citus_table_size | Get disk space used by all the shards of the specified distributed table, excluding indexes |
citus_total_relation_size | Get total disk space used by the all the shards of the specified distributed table, including all indexes and TOAST data |
column_to_column_name | Translate the partkey column of pg_dist_partition into a textual column name |
get_shard_id_for_distribution_column | Find the shard ID associated with a value of the distribution column |
Server parameters
Query execution
Name | Description |
---|---|
citus.all_modifications_commutative | Allow all commands to claim a shared lock |
citus.count_distinct_error_rate | Tune error rate of postgresql-hll approximate counting |
citus.enable_repartition_joins | Allow JOINs made on non-distribution columns |
citus.enable_repartitioned_insert_select | Allow repartitioning rows from the SELECT statement and transferring them between workers for insertion |
citus.limit_clause_row_fetch_count | The number of rows to fetch per task for limit clause optimization |
citus.local_table_join_policy | Where data moves when doing a join between local and distributed tables |
citus.multi_shard_commit_protocol | The commit protocol to use when performing COPY on a hash distributed table |
citus.propagate_set_commands | Which SET commands are propagated from the coordinator to workers |
citus.create_object_propagation | Behavior of CREATE statements in transactions for supported objects |
citus.use_citus_managed_tables | Allow local tables to be accessed in worker node queries |
Informational
Name | Description |
---|---|
citus.explain_all_tasks | Make EXPLAIN output show all tasks |
citus.explain_analyze_sort_method | Sort method of the tasks in the output of EXPLAIN ANALYZE |
citus.log_remote_commands | Log queries the coordinator sends to worker nodes |
citus.multi_task_query_log_level | Log-level for any query that generates more than one task |
citus.stat_statements_max | Max number of rows to store in citus_stat_statements |
citus.stat_statements_purge_interval | Frequency at which the maintenance daemon removes records from citus_stat_statements that are unmatched in pg_stat_statements |
citus.stat_statements_track | Enable/disable statement tracking |
citus.show_shards_for_app_name_prefixes | Allows shards to be displayed for selected clients that want to see them |
citus.override_table_visibility | Enable/disable shard hiding |
Inter-node connection management
Name | Description |
---|---|
citus.executor_slow_start_interval | Time to wait in milliseconds between opening connections to the same worker node |
citus.force_max_query_parallelization | Open as many connections as possible |
citus.max_adaptive_executor_pool_size | Max worker connections per session |
citus.max_cached_conns_per_worker | Number of connections kept open to speed up subsequent commands |
citus.node_connection_timeout | Max duration (in milliseconds) to wait for connection establishment |
Data transfer
Name | Description |
---|---|
citus.enable_binary_protocol | Use PostgreSQL’s binary serialization format (when applicable) to transfer data with workers |
citus.max_intermediate_result_size | Size in KB of intermediate results for CTEs and subqueries that are unable to be pushed down |
Deadlock
Name | Description |
---|---|
citus.distributed_deadlock_detection_factor | Time to wait before checking for distributed deadlocks |
citus.log_distributed_deadlock_detection | Whether to log distributed deadlock detection-related processing in the server log |
System tables
The coordinator node contains metadata tables and views to help you see data properties and query activity across the cluster.
Name | Description |
---|---|
citus_dist_stat_activity | Distributed queries that are executing on all nodes |
citus_lock_waits | Queries blocked throughout the cluster |
citus_shards | The location of each shard, the type of table it belongs to, and its size |
citus_stat_statements | Stats about how queries are being executed, and for whom |
citus_tables | A summary of all distributed and reference tables |
citus_worker_stat_activity | Queries on workers, including tasks on individual shards |
pg_dist_colocation | Which tables' shards should be placed together |
pg_dist_node | Information about worker nodes in the cluster |
pg_dist_object | Objects such as types and functions that have been created on the coordinator node and propagated to worker nodes |
pg_dist_placement | The location of shard replicas on worker nodes |
pg_dist_rebalance_strategy | Strategies that rebalance_table_shards can use to determine where to move shards |
pg_dist_shard | The table, distribution column, and value ranges for every shard |
time_partitions | Information about each partition managed by such functions as create_time_partitions and drop_old_time_partitions |
Next steps
- Learn some useful diagnostic queries
- Review the list of configuration parameters in the underlying PostgreSQL database.