The MQTT broker emits observability metrics that you can use to monitor the health of your solution. This article lists the available metrics.
To configure options for these metrics, see Diagnostics.
Many metrics are tagged with a category dimension, which clients set by including a metriccategory user property in the MQTT CONNECT packet. Sessions without a metriccategory are tagged as category=uncategorized.
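Client libraries normally expose user properties through their own API, for example the Properties class in Eclipse Paho's Python client. To show what actually travels in the CONNECT packet, here is a minimal, dependency-free sketch that encodes a CONNECT properties section carrying only metriccategory. It assumes MQTT 5, because user properties don't exist in MQTT 3.1.1. The category value "sensors" is a hypothetical example.

```python
import struct

USER_PROPERTY_ID = 0x26  # MQTT 5 property identifier for User Property


def encode_utf8_string(text: str) -> bytes:
    """MQTT UTF-8 encoded string: 2-byte big-endian length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("MQTT strings are limited to 65,535 bytes")
    return struct.pack("!H", len(data)) + data


def encode_variable_byte_integer(value: int) -> bytes:
    """MQTT variable byte integer: 7 bits per byte, continuation bit on all but the last."""
    if not 0 <= value <= 268_435_455:
        raise ValueError("value out of range for a variable byte integer")
    out = bytearray()
    while True:
        byte = value % 128
        value //= 128
        if value > 0:
            byte |= 0x80  # more bytes follow
        out.append(byte)
        if value == 0:
            return bytes(out)


def connect_properties_with_category(category: str) -> bytes:
    """CONNECT properties section containing only the user property metriccategory=<category>."""
    user_property = (
        bytes([USER_PROPERTY_ID])
        + encode_utf8_string("metriccategory")
        + encode_utf8_string(category)
    )
    # The properties section is prefixed with its own length as a variable byte integer.
    return encode_variable_byte_integer(len(user_property)) + user_property


print(connect_properties_with_category("sensors").hex())
```

A client that sends several user properties would concatenate one such identifier-plus-string-pair entry per property before computing the section length.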
Messaging metrics
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_publishes_received | Counter | Number of incoming PUBLISH packets received from clients. | namespace, hostname, category |
| aio_broker_publishes_sent | Counter | Number of outgoing PUBLISH packets sent to clients. Counts each delivery separately, even if multiple clients receive the same payload. Doesn't count acknowledgment (PUBACK) packets. | namespace, hostname, category |
| aio_broker_payload_bytes_received | Counter | Number of payload bytes for all PUBLISH packets received. Doesn't include MQTT packet overhead or properties. | namespace, hostname, category |
| aio_broker_payload_bytes_sent | Counter | Number of payload bytes for all PUBLISH packets sent. Doesn't include MQTT packet overhead or properties. | namespace, hostname, category |
| aio_broker_authentication_successes | Counter | Number of successful client authentications. | namespace, hostname, category |
| aio_broker_authentication_failures | Counter | Number of failed client authentications. A failure is an error that prevents the authentication check from completing, as opposed to a denial. | namespace, hostname, category |
| aio_broker_authentication_deny | Counter | Number of denied client authentications. | namespace, hostname, category |
| aio_broker_authorization_allow | Counter | Number of successful client authorizations. | namespace, hostname, category |
| aio_broker_authorization_deny | Counter | Number of denied client authorizations. | namespace, hostname, category |
| aio_broker_authorization_failures | Counter | Number of failed client authorizations. A failure is an error that prevents the authorization check from completing, as opposed to a denial. | namespace, hostname, category |
| aio_broker_qos0_messages_dropped | Counter | Number of QoS 0 messages dropped because of high volume or memory limits. | namespace, hostname, category, direction |
| aio_broker_store_retained_messages | Gauge | Retained messages currently stored. | namespace, backend_chain |
| aio_broker_store_retained_bytes | Gauge | Bytes used by retained message payloads. Doesn't include metadata overhead. | namespace, backend_chain |
| aio_broker_store_will_messages | Gauge | Will messages currently stored. | namespace, backend_chain |
| aio_broker_store_will_bytes | Gauge | Bytes used by will message payloads. Doesn't include metadata overhead. | namespace, backend_chain |
| aio_broker_store_expired_messages | Counter | Messages that expired before delivery. | namespace, backend_chain |
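Counters such as aio_broker_publishes_received only grow, so dashboards usually chart their per-second rate rather than the raw value. The following sketch computes that rate from two scrapes. It assumes that a drop in value means the counter restarted from zero (for example, after a pod restart). This is similar in spirit to Prometheus's rate() function, but simpler: it does no extrapolation to the edges of a time window.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    timestamp_s: float  # scrape time, in seconds
    value: float        # cumulative counter value at that time


def counter_rate(previous: CounterSample, current: CounterSample) -> float:
    """Per-second increase of a cumulative counter between two samples."""
    elapsed = current.timestamp_s - previous.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = current.value - previous.value
    if increase < 0:
        # Assumption: the counter was reset, so everything since the reset counts.
        increase = current.value
    return increase / elapsed


# 3,000 PUBLISH packets received over 60 seconds -> 50.0 per second
print(counter_rate(CounterSample(0, 1000), CounterSample(60, 4000)))
```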
Memory and backpressure metrics
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_buffer_pool_used_percent | Gauge | Buffer pool utilization percentage. When a pool reaches about 75% usage, the broker triggers backpressure. | namespace, hostname, name |
| aio_broker_backpressure_packets_rejected_memory | Counter | Packets rejected because of memory backpressure. | namespace, hostname |
| aio_broker_backpressure_packets_rejected_disk | Counter | Packets rejected because of disk backpressure. | namespace, backend_chain |
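Because backpressure starts when a buffer pool reaches about 75% usage, watching aio_broker_buffer_pool_used_percent per pool gives early warning before packets are rejected. The following sketch assumes the metrics are scraped in Prometheus text exposition format under the names shown here. It uses a deliberately simple parser that ignores comments, unlabeled samples, and escaped quotes in label values. The hostnames in the sample are hypothetical.

```python
import re

# Matches labeled samples such as: metric_name{label="value",other="x"} 81.5
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

BACKPRESSURE_THRESHOLD = 75.0  # approximate threshold from the table above


def pools_near_backpressure(exposition: str, threshold: float = BACKPRESSURE_THRESHOLD):
    """Return (hostname, pool name, percent) for every pool at or above the threshold."""
    hits = []
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match["name"] != "aio_broker_buffer_pool_used_percent":
            continue  # comment, blank line, or a different metric
        labels = dict(LABEL_RE.findall(match["labels"]))
        percent = float(match["value"])
        if percent >= threshold:
            hits.append((labels.get("hostname"), labels.get("name"), percent))
    return hits


sample = """
# TYPE aio_broker_buffer_pool_used_percent gauge
aio_broker_buffer_pool_used_percent{namespace="azure-iot-operations",hostname="aio-broker-backend-1-0",name="reader"} 81.5
aio_broker_buffer_pool_used_percent{namespace="azure-iot-operations",hostname="aio-broker-backend-2-0",name="reader"} 12
"""
print(pools_near_backpressure(sample))
```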
Broker operator health metrics
Metrics for cluster health and replica management.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_backend_replicas | Gauge | Backend replica count. Must match the value specified in the Broker custom resource. | namespace, hostname |
| aio_broker_backend_replicas_restarts | Counter | Backend replica restarts. Incremented each time a backend replica restarts. The hostname dimension indicates which replica restarted. | namespace, hostname |
| aio_broker_backend_partitions | Gauge | Backend partition count. Must match the value specified in the Broker custom resource. | namespace, hostname |
| aio_broker_backend_workers | Gauge | Backend worker count. Must match the value specified in the Broker custom resource. | namespace, hostname |
| aio_broker_frontend_replicas | Gauge | Frontend replica count. Must match the value specified in the Broker custom resource. | namespace, hostname |
| aio_broker_frontend_replicas_restarts | Counter | Frontend replica restarts. Incremented each time a frontend replica restarts. The hostname dimension indicates which frontend restarted. | namespace, hostname |
| aio_broker_frontend_workers | Gauge | Frontend worker count. Must match the value specified in the Broker custom resource. | namespace, hostname |
| aio_broker_version | Counter | Broker version information. | version |
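Because several of these gauges must match the Broker custom resource, a health check can compare observed values against what you configured. The following is a minimal sketch that assumes you've already reduced each gauge to one observed number. The expected values are placeholders, not defaults; replace them with the numbers from your own Broker resource.

```python
# Placeholder values: replace with the numbers set in your Broker custom resource.
EXPECTED = {
    "aio_broker_backend_replicas": 2,
    "aio_broker_backend_partitions": 2,
    "aio_broker_backend_workers": 2,
    "aio_broker_frontend_replicas": 2,
    "aio_broker_frontend_workers": 2,
}


def config_mismatches(observed: dict[str, float], expected: dict[str, int] = EXPECTED) -> dict[str, tuple]:
    """Return {metric: (expected, observed)} for every gauge that doesn't match."""
    mismatches = {}
    for metric, want in expected.items():
        got = observed.get(metric)  # None if the gauge wasn't reported at all
        if got != want:
            mismatches[metric] = (want, got)
    return mismatches


observed = {
    "aio_broker_backend_replicas": 2,
    "aio_broker_backend_partitions": 2,
    "aio_broker_frontend_replicas": 2,
    "aio_broker_frontend_workers": 1,
}
print(config_mismatches(observed))
```

A missing gauge is reported with None as its observed value, so a component that stopped emitting metrics also shows up as a mismatch.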
Connection and subscription metrics
These metrics provide observability for the connections and subscriptions on the broker.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_store_total_sessions | Gauge | Total sessions managed by the backend store, including connected and offline sessions. | namespace, backend_chain, is_persistent |
| aio_broker_store_connected_sessions | Gauge | Currently connected sessions in the store. | namespace, backend_chain, is_persistent |
| aio_broker_store_subscriptions | Gauge | Total subscriptions in the store. Includes subscriptions from offline sessions. | namespace, backend_chain |
| aio_broker_store_shared_subscriptions | Gauge | Shared subscriptions in the store. Includes subscriptions from offline sessions. | namespace, backend_chain |
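Because aio_broker_store_total_sessions includes both connected and offline sessions, you can derive the offline count by subtracting aio_broker_store_connected_sessions for the same backend_chain and is_persistent combination. The following sketch assumes that both gauges have already been read into dictionaries keyed by (backend_chain, is_persistent). It clamps at zero because the two gauges are sampled independently and can briefly disagree.

```python
def offline_sessions(total: dict[tuple, float], connected: dict[tuple, float]) -> dict[tuple, float]:
    """Offline sessions per (backend_chain, is_persistent) series: total minus connected."""
    result = {}
    for key, total_value in total.items():
        # Independent samples can race, so never report a negative count.
        result[key] = max(total_value - connected.get(key, 0), 0)
    return result


total = {("0", "true"): 40, ("0", "false"): 10}
connected = {("0", "true"): 25, ("0", "false"): 10}
print(offline_sessions(total, connected))
```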
State store metrics
This set of metrics tracks the overall state of the state store.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_state_store_deletions | Counter | Key deletion requests received (counts both successes and errors). | namespace, backend_chain |
| aio_broker_state_store_insertions | Counter | Key insertion requests received (counts both successes and errors). | namespace, backend_chain |
| aio_broker_state_store_keynotify_requests | Counter | KEYNOTIFY requests received (allows clients to monitor key changes). | namespace, backend_chain |
| aio_broker_state_store_modifications | Counter | Key modification requests received (counts both successes and errors). | namespace, backend_chain |
| aio_broker_state_store_notifications_sent | Counter | Key change notifications sent to clients registered via KEYNOTIFY. | namespace, backend_chain |
| aio_broker_state_store_retrievals | Counter | Key retrieval requests received (counts both successes and errors). | namespace, backend_chain |
Disk-backed message buffer metrics
These metrics provide observability for the disk-backed message buffer.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_disk_transfers_completed | Counter | Completed disk transfers (publishes transferred from buffer pool to disk). | namespace, hostname |
| aio_broker_disk_transfers_failed | Counter | Failed disk transfers. | namespace, hostname |
| aio_broker_total_disk_backed_message_buffer_usage | Gauge | Total disk-backed message buffer usage. | namespace, hostname |
Note
Only certain backend buffer pools, specifically the dynamic ones named "reader", use the disk-backed message buffer feature. These pools store subscriber message queues and transfer elements to disk when usage exceeds 75%.
Persistence metrics
These metrics provide observability for persistence and recovery operations. They're reported only when the broker persistence feature is enabled.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_persistence_disk_usage | Gauge | Current disk usage in bytes. | namespace, backend_chain |
| aio_broker_persistence_disk_percent_available | Gauge | Percentage of disk space available, calculated from current usage and total disk size. | namespace, backend_chain |
| aio_broker_persistence_last_recovery_time | Gauge | Time (seconds) of the most recent recovery from disk after a crash. Reports 0 if no recovery occurred. | namespace, backend_chain |
| aio_broker_persistence_dynamic_requests | Counter | Dynamic persistence requests. | namespace, hostname, allowed |
Failure recovery metrics
Metrics for chain replication and failure recovery.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_store_transfer_batch_sender_message_count | Counter | Messages sent by store transfer sender during recovery. | namespace, hostname, worker_id |
| aio_broker_store_transfer_batch_receiver_message_count | Counter | Messages received by store transfer receiver (should equal sender count). | namespace, hostname, worker_id |
| aio_broker_store_transfer_batch_sender_transfer_bytes | Counter | Bytes sent by store transfer sender. | namespace, hostname, worker_id |
| aio_broker_store_transfer_patch_tracker_sender_message_count | Counter | Patch tracker messages sent. | namespace, hostname, worker_id |
| aio_broker_store_transfer_patch_tracker_receiver_message_count | Counter | Patch tracker messages received. | namespace, hostname, worker_id |
| aio_broker_store_transfer_ack_event_sender_message_count | Counter | Ack event messages sent. | namespace, hostname, worker_id |
| aio_broker_store_transfer_ack_event_receiver_message_count | Counter | Ack event messages received. | namespace, hostname, worker_id |
| aio_broker_store_transfer_ack_event_sender_transfer_bytes | Counter | Bytes sent for ack events. | namespace, hostname, worker_id |
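The receiver message count should equal the sender count, so a persistent difference after a transfer completes is worth investigating. The following sketch assumes you've already paired sender and receiver counters by worker_id and read them after the transfer finished. While a transfer is still in progress, the counts can legitimately differ.

```python
def transfer_discrepancies(sent: dict[str, float], received: dict[str, float]) -> dict[str, float]:
    """Return {worker_id: sent - received} for workers whose counts differ."""
    workers = set(sent) | set(received)
    return {
        worker: sent.get(worker, 0) - received.get(worker, 0)
        for worker in sorted(workers)
        if sent.get(worker, 0) != received.get(worker, 0)
    }


sent = {"0": 100, "1": 50}
received = {"0": 100, "1": 48}
print(transfer_discrepancies(sent, received))
```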
Self-test and diagnostics metrics
Metrics from the diagnostics service for monitoring broker SLO compliance.
For each operation in the following table, the broker also emits three aggregate latency metrics with no dimensions: *_latency_last_value_ms (most recent value), *_latency_mu_ms (mean), and *_latency_sigma_ms (standard deviation). Replace * with aio_broker_connect, aio_broker_publish, aio_broker_subscribe, aio_broker_unsubscribe, aio_broker_ping, or aio_broker_message_delivery_check.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_connect_route_replication_correctness | Gauge | Connect route replication correctness. 1 indicates success; 0 indicates failure. Failure means that the probe didn't receive the response in time. | frontend, backend_chain |
| aio_broker_connect_latency_route_ms | Gauge | Connect latency value for a specific route (ms). | frontend, backend_chain |
| aio_broker_publish_route_replication_correctness | Gauge | Publish route replication correctness. 1 indicates success; 0 indicates failure. Failure means that the probe didn't receive the response in time. | frontend, backend_chain |
| aio_broker_publish_latency_route_ms | Gauge | Publish latency value for a specific route (ms). | frontend, backend_chain |
| aio_broker_subscribe_route_replication_correctness | Gauge | Subscribe route replication correctness. 1 indicates success; 0 indicates failure. Failure means that the probe didn't receive the response in time. | frontend, backend_chain, is_wildcard |
| aio_broker_subscribe_latency_route_ms | Gauge | Subscribe latency value for a specific route (ms). | frontend, backend_chain, is_wildcard |
| aio_broker_unsubscribe_route_replication_correctness | Gauge | Unsubscribe route replication correctness. 1 indicates success; 0 indicates failure. Failure means that the probe didn't receive the response in time. | frontend, backend_chain, is_wildcard |
| aio_broker_unsubscribe_latency_route_ms | Gauge | Unsubscribe latency value for a specific route (ms). | frontend, backend_chain, is_wildcard |
| aio_broker_ping_correctness | Gauge | Ping correctness. 1 indicates success; 0 indicates failure. Failure means that the probe didn't receive the response in time. | frontend |
| aio_broker_ping_latency_route_ms | Gauge | Ping latency value for a specific route (ms). | frontend |
| aio_broker_message_delivery_check_total_timeouts | Gauge | Message delivery check correctness. Message delivery check validates the end-to-end delivery of a message from a publisher to the subscriber. 0 indicates success; a value greater than 0 indicates failure. Failure means that the subscriber probe didn't receive the response in time. | |
| aio_broker_message_delivery_check_latency_route_ms | Gauge | Message delivery check latency value for a specific route (ms). | |
| aio_broker_message_delivery_check_total_messages_sent | Counter | Total messages sent for delivery check. | |
| aio_broker_message_delivery_check_total_messages_received | Counter | Total messages received for delivery check. | |
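The correctness gauges use two conventions: the *_correctness gauges report 1 for success, while aio_broker_message_delivery_check_total_timeouts reports 0 for success. The following sketch encodes both rules for a simple alert. It assumes a flat snapshot keyed by series strings in the form name{labels}. The label values in the example are hypothetical.

```python
def probe_failures(gauges: dict[str, float]) -> list[str]:
    """List the self-test series that currently indicate a failure."""
    failures = []
    for series, value in gauges.items():
        name = series.split("{", 1)[0]  # strip any label set
        if name.endswith("_correctness") and value != 1:
            failures.append(series)  # 1 means success for correctness gauges
        elif name == "aio_broker_message_delivery_check_total_timeouts" and value > 0:
            failures.append(series)  # 0 means success for the timeout gauge
    return failures


snapshot = {
    'aio_broker_ping_correctness{frontend="0"}': 1,
    'aio_broker_publish_route_replication_correctness{frontend="1",backend_chain="0"}': 0,
    "aio_broker_message_delivery_check_total_timeouts": 0,
}
print(probe_failures(snapshot))
```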
Developer metrics
Metrics for debugging and diagnostics of internal traffic flow.
| Metric | Type | Description | Dimensions |
|---|---|---|---|
| aio_broker_patch_tracker_held_patches | Gauge | Pending message IDs (of any type, including internal) currently held in the message ID tracker. The message tracker is used to guarantee internal message delivery and ordering. IDs in the message tracker are cleaned up periodically. In a stable state, the chart should look like a sawtooth pattern. | namespace, hostname, worker_id |
| aio_broker_ack_handler_pending_transactions | Gauge | Pending messages in the ack handler. The ack handler tracks the acknowledgment of messages (of any type, including internal). In a stable state, the chart should look like a flat line close to zero. Spikes or high values might indicate issues with message processing or queue buildup. | namespace, hostname, worker_id |
| aio_broker_internal_client_connected | Counter | Internal client connections. | namespace, hostname, worker_id, endpoint |
| aio_broker_internal_client_disconnected | Counter | Internal client disconnections. | namespace, hostname, worker_id, endpoint |
| aio_broker_internal_client_removed | Counter | Internal clients removed. | namespace, hostname, worker_id, endpoint |
Dimension reference
namespace
Kubernetes namespace where the broker is deployed.
hostname
Pod hostname that emitted the metric.
category
Comes from the MQTT CONNECT packet's metriccategory user property. When a client connects to the broker, it can include this user property to categorize its traffic. Dashboards can use this dimension to differentiate traffic sources without the high-cardinality problems that come from tagging metrics with topic names.
Sessions without a metriccategory receive category=uncategorized.
Important
The number of unique categories is limited to 1,000. Avoid using high-cardinality values for metriccategory to prevent metric data loss.
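To see how close a deployment is to the 1,000-category limit, you can count the distinct category values across the series you scrape. The following is a minimal sketch that assumes you've already extracted each series' labels into a dictionary. It counts uncategorized like any other value, which may or may not match how the broker applies the limit. The category names are hypothetical.

```python
CATEGORY_LIMIT = 1_000  # limit on unique categories stated above


def category_headroom(series_labels: list[dict[str, str]], limit: int = CATEGORY_LIMIT) -> tuple[int, int]:
    """Return (distinct categories seen, remaining headroom before the limit)."""
    categories = {labels["category"] for labels in series_labels if "category" in labels}
    return len(categories), limit - len(categories)


labels_seen = [
    {"namespace": "azure-iot-operations", "category": "sensors"},
    {"namespace": "azure-iot-operations", "category": "sensors"},
    {"namespace": "azure-iot-operations", "category": "uncategorized"},
    {"namespace": "azure-iot-operations", "category": "plc"},
    {"namespace": "azure-iot-operations", "backend_chain": "0"},  # no category dimension
]
print(category_headroom(labels_seen))
```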
backend_chain
Identifies which partition or chain the metric belongs to. Backend nodes in the same chain report the same values for store-level metrics. The value is zero-indexed (for example, backend_chain=0 for the first chain, backend_chain=1 for the second). The total number of chains equals backend partitions × backend workers per partition.
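The formula in this section lets you list every backend_chain value you should expect to see, and then spot chains that aren't reporting. The following sketch applies that formula with zero-indexed labels. It assumes the partition and worker counts come from your own Broker configuration; the numbers in the example are placeholders.

```python
def backend_chain_labels(partitions: int, workers_per_partition: int) -> list[str]:
    """Expected backend_chain values: 0 .. partitions * workers_per_partition - 1."""
    total_chains = partitions * workers_per_partition
    return [str(index) for index in range(total_chains)]


def missing_chains(observed: set[str], partitions: int, workers_per_partition: int) -> list[str]:
    """Expected backend_chain values with no reported series, in numeric order."""
    expected = backend_chain_labels(partitions, workers_per_partition)
    return [chain for chain in expected if chain not in observed]


print(backend_chain_labels(2, 2))
print(missing_chains({"0", "1", "3"}, 2, 2))
```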
direction
Direction of message flow. Values are incoming (on receipt from a client) or outgoing (before delivery to a client).
worker_id
Identifies which worker within a frontend or backend partition emitted the metric. Worker identifiers are zero-indexed.
frontend
Identifies which frontend pod handled the probe operation. The value is the pod index extracted from the frontend pod name (for example, 0 for aio-broker-frontend-0).
is_wildcard
Whether the subscription topic contains a wildcard pattern. Values are true (wildcard topic like foo/#) or false (exact topic match).
is_persistent
Whether a session is persistent (true) or transient (false).
name
Identifies a specific buffer pool. Used with memory and backpressure metrics.
endpoint
Type of internal connection. Values are hm (health manager), head (chain head), tail (chain tail), or successor (successor node).
allowed
Whether the dynamic persistence request was allowed (true) or denied (false).
version
Broker version string.