Important
Azure Cosmos DB for PostgreSQL is no longer supported for new projects. Instead, use one of these two services:
- Azure Cosmos DB for NoSQL, a distributed database designed for high-scale scenarios with a 99.999% availability service level agreement (SLA), instant autoscale, and automatic failover across multiple regions.
- The Elastic Clusters feature of Azure Database for PostgreSQL, which provides sharded PostgreSQL using the open-source Citus extension.
Azure Cosmos DB for PostgreSQL includes features beyond standard PostgreSQL. Below is a categorized reference of its functions and server parameters for:
- Parallelizing query execution across shards
- Managing sharded data between multiple servers
- Compressing data with columnar storage
- Automating time series partitioning
SQL functions
Sharding
| Name | Description |
|---|---|
| alter_distributed_table | Change the distribution column, shard count, or colocation properties of a distributed table |
| citus_copy_shard_placement | Repair an inactive shard placement using data from a healthy placement |
| citus_schema_distribute | Turn a PostgreSQL schema into a distributed schema |
| citus_schema_undistribute | Undo the action of citus_schema_distribute |
| create_distributed_table | Turn a PostgreSQL table into a distributed (sharded) table |
| create_reference_table | Maintain full copies of a table in sync across all nodes |
| citus_add_local_table_to_metadata | Add a local table to metadata to enable querying it from any node |
| isolate_tenant_to_new_shard | Create a new shard to hold rows with a specific single value in the distribution column |
| truncate_local_data_after_distributing_table | Truncate all local rows after distributing a table |
| undistribute_table | Undo the action of create_distributed_table or create_reference_table |
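As a sketch of how these functions fit together, assuming a hypothetical `github_events` table with a `repo_id` column and a small `nation_codes` lookup table:

```sql
-- Shard an existing table across the cluster by hash of its distribution column
SELECT create_distributed_table('github_events', 'repo_id');

-- Keep a full, synchronized copy of a small lookup table on every node
SELECT create_reference_table('nation_codes');

-- Reverse the distribution if needed
SELECT undistribute_table('github_events');
```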
Shard rebalancing
| Name | Description |
|---|---|
| citus_add_rebalance_strategy | Append a row to pg_dist_rebalance_strategy |
| citus_move_shard_placement | Move a shard placement to a different node; typically invoked indirectly during shard rebalancing rather than called directly by a database administrator |
| citus_set_default_rebalance_strategy | Change the strategy named by its argument to be the default chosen when rebalancing shards |
| get_rebalance_progress | Monitor the moves planned and executed by rebalance_table_shards |
| get_rebalance_table_shards_plan | Output the planned shard movements of rebalance_table_shards without performing them |
| rebalance_table_shards | Move shards of the given table to distribute them evenly among the workers |
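For example, to preview a rebalance before performing it (the table name is illustrative):

```sql
-- Preview which shards would move, without moving anything
SELECT * FROM get_rebalance_table_shards_plan('github_events');

-- Perform the rebalance using the default strategy
SELECT rebalance_table_shards('github_events');

-- Monitor progress from another session
SELECT * FROM get_rebalance_progress();
```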
Colocation
| Name | Description |
|---|---|
| create_distributed_function | Make function run on workers near colocated shards |
| update_distributed_table_colocation | Update or break colocation of a distributed table |
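A sketch of distributing a function so it runs on the worker holding the matching shard; the `increment` function, its `event_count` column, and the `github_events` table are assumptions for illustration:

```sql
-- An ordinary function created on the coordinator
CREATE FUNCTION increment(repo int) RETURNS void AS $$
  UPDATE github_events SET event_count = event_count + 1
  WHERE repo_id = repo;
$$ LANGUAGE sql;

-- Delegate execution to the worker holding the shard for the given repo_id
SELECT create_distributed_function(
  'increment(int)', '$1', colocate_with := 'github_events');
```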
Columnar storage
| Name | Description |
|---|---|
| alter_columnar_table_set | Change settings on a columnar table |
| alter_table_set_access_method | Convert a table between heap or columnar storage |
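Converting a table to columnar storage and tuning its compression might look like this (the table name is illustrative):

```sql
-- Convert an existing heap table to columnar storage
SELECT alter_table_set_access_method('github_events_2022', 'columnar');

-- Adjust compression settings on the columnar table
SELECT alter_columnar_table_set(
  'github_events_2022', compression => 'zstd', compression_level => 5);
```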
Time series partitioning
| Name | Description |
|---|---|
| alter_old_partitions_set_access_method | Change storage method of partitions |
| create_time_partitions | Create partitions of a given interval to cover a given range of time |
| drop_old_time_partitions | Remove all partitions whose intervals fall before a given timestamp |
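A sketch of maintaining daily partitions, assuming an `events` table range-partitioned on a timestamp column:

```sql
-- Create any missing daily partitions covering the next week
SELECT create_time_partitions(
  table_name         := 'events',
  partition_interval := '1 day',
  end_at             := now() + interval '7 days');

-- Drop partitions older than 90 days (a procedure, so use CALL)
CALL drop_old_time_partitions('events', now() - interval '90 days');
```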
Informational
| Name | Description |
|---|---|
| citus_get_active_worker_nodes | Get active worker host names and port numbers |
| citus_relation_size | Get disk space used by all the shards of the specified distributed table |
| citus_remote_connection_stats | Show the number of active connections to each remote node |
| citus_stat_statements_reset | Remove all rows from citus_stat_statements |
| citus_table_size | Get disk space used by all the shards of the specified distributed table, excluding indexes |
| citus_total_relation_size | Get total disk space used by all the shards of the specified distributed table, including all indexes and TOAST data |
| column_to_column_name | Translate the partkey column of pg_dist_partition into a textual column name |
| get_shard_id_for_distribution_column | Find the shard ID associated with a value of the distribution column |
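For example, assuming a distributed table named `github_events`:

```sql
-- List active worker host names and ports
SELECT * FROM citus_get_active_worker_nodes();

-- Human-readable total size of a distributed table, indexes included
SELECT pg_size_pretty(citus_total_relation_size('github_events'));

-- Which shard would hold rows whose distribution column equals 42?
SELECT get_shard_id_for_distribution_column('github_events', 42);
```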
Server parameters
Query execution
| Name | Description |
|---|---|
| citus.all_modifications_commutative | Allow all commands to claim a shared lock |
| citus.count_distinct_error_rate | Tune error rate of postgresql-hll approximate counting |
| citus.enable_repartition_joins | Allow JOINs made on non-distribution columns |
| citus.enable_repartitioned_insert_select | Allow repartitioning rows from the SELECT statement and transferring them between workers for insertion |
| citus.limit_clause_row_fetch_count | The number of rows to fetch per task for limit clause optimization |
| citus.local_table_join_policy | Where data moves when doing a join between local and distributed tables |
| citus.multi_shard_commit_protocol | The commit protocol to use when performing COPY on a hash distributed table |
| citus.propagate_set_commands | Which SET commands are propagated from the coordinator to workers |
| citus.create_object_propagation | Behavior of CREATE statements in transactions for supported objects |
| citus.use_citus_managed_tables | Allow local tables to be accessed in worker node queries |
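These parameters are ordinary PostgreSQL settings and can be changed per session with `SET`, for example:

```sql
-- Allow joins on columns other than the distribution column,
-- at the cost of repartitioning data between workers
SET citus.enable_repartition_joins TO on;

-- Fetch at most 10,000 rows per task when optimizing LIMIT clauses
SET citus.limit_clause_row_fetch_count TO 10000;
```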
Informational
| Name | Description |
|---|---|
| citus.explain_all_tasks | Make EXPLAIN output show all tasks |
| citus.explain_analyze_sort_method | Sort method of the tasks in the output of EXPLAIN ANALYZE |
| citus.log_remote_commands | Log queries the coordinator sends to worker nodes |
| citus.multi_task_query_log_level | Log-level for any query that generates more than one task |
| citus.stat_statements_max | Max number of rows to store in citus_stat_statements |
| citus.stat_statements_purge_interval | Frequency at which the maintenance daemon removes records from citus_stat_statements that are unmatched in pg_stat_statements |
| citus.stat_statements_track | Enable/disable statement tracking |
| citus.show_shards_for_app_name_prefixes | Allows shards to be displayed for selected clients that want to see them |
| citus.override_table_visibility | Enable/disable shard hiding |
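For example, to see the work done on every shard when inspecting a query plan (the table name is illustrative):

```sql
-- Show all tasks in EXPLAIN output, not just one representative task
SET citus.explain_all_tasks TO on;

-- Log the commands the coordinator sends to workers
SET citus.log_remote_commands TO on;

EXPLAIN SELECT count(*) FROM github_events;
```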
Inter-node connection management
| Name | Description |
|---|---|
| citus.executor_slow_start_interval | Time to wait in milliseconds between opening connections to the same worker node |
| citus.force_max_query_parallelization | Force the executor to open as many connections as possible while running a parallel distributed query |
| citus.max_adaptive_executor_pool_size | Max worker connections per session |
| citus.max_cached_conns_per_worker | Number of connections kept open to speed up subsequent commands |
| citus.node_connection_timeout | Max duration (in milliseconds) to wait for connection establishment |
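A sketch of tightening connection behavior for a latency-sensitive session:

```sql
-- Fail fast if a worker can't be reached within 5 seconds
SET citus.node_connection_timeout = 5000;

-- Keep one connection per worker open between commands
SET citus.max_cached_conns_per_worker = 1;
```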
Data transfer
| Name | Description |
|---|---|
| citus.enable_binary_protocol | Use PostgreSQL’s binary serialization format (when applicable) to transfer data with workers |
| citus.max_intermediate_result_size | Maximum size in KB of intermediate results for CTEs and subqueries that can't be pushed down |
Deadlock
| Name | Description |
|---|---|
| citus.distributed_deadlock_detection_factor | Multiplier on PostgreSQL's deadlock_timeout that sets the time to wait before checking for distributed deadlocks |
| citus.log_distributed_deadlock_detection | Whether to log distributed deadlock detection-related processing in the server log |
System tables
The coordinator node contains metadata tables and views to help you see data properties and query activity across the cluster.
| Name | Description |
|---|---|
| citus_dist_stat_activity | Distributed queries that are executing on all nodes |
| citus_lock_waits | Queries blocked throughout the cluster |
| citus_shards | The location of each shard, the type of table it belongs to, and its size |
| citus_stat_statements | Stats about how queries are being executed, and for whom |
| citus_tables | A summary of all distributed and reference tables |
| citus_worker_stat_activity | Queries on workers, including tasks on individual shards |
| pg_dist_colocation | Which tables' shards should be placed together |
| pg_dist_node | Information about worker nodes in the cluster |
| pg_dist_object | Objects such as types and functions that have been created on the coordinator node and propagated to worker nodes |
| pg_dist_placement | The location of shard replicas on worker nodes |
| pg_dist_rebalance_strategy | Strategies that rebalance_table_shards can use to determine where to move shards |
| pg_dist_shard | The table, distribution column, and value ranges for every shard |
| time_partitions | Information about each partition managed by such functions as create_time_partitions and drop_old_time_partitions |
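For example, two quick health checks against these views:

```sql
-- Size and placement of every shard in the cluster
SELECT table_name, shardid, nodename, shard_size
FROM citus_shards;

-- One-line summary per distributed or reference table
SELECT table_name, citus_table_type, shard_count
FROM citus_tables;
```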
Next steps
- Learn some useful diagnostic queries
- Review the list of configuration parameters in the underlying PostgreSQL database.