Edit

Deployment planning for Azure IoT Operations

Many Azure IoT Operations settings are fixed at deployment time and can only be changed by redeploying. Before you deploy, plan your cluster topology, broker cardinality, memory profile, and the optional broker settings you need. This article summarizes the decisions you should make.

Understand the architecture

Azure IoT Operations is a set of modular, Kubernetes-native services deployed to an Azure Arc-enabled cluster. Key components include:

Component Purpose
MQTT broker High-performance MQTT 3.1.1 and 5 broker for edge messaging
Connector for OPC UA Collects data from OPC UA servers and publishes to MQTT
Data flows Routes, transforms, and pushes data to cloud endpoints
Azure Device Registry Cloud-based registry for devices, assets, and schemas
Akri services Device discovery and protocol adapters
State store Key-value persistence layer in the MQTT broker

Two terms are used throughout the documentation:

  • Deployment — The instance, Arc extensions, custom locations, and all configurable resources (assets, devices, data flows).
  • Instance — The parent resource that bundles the services.

Choose your cluster topology

Before you deploy, decide whether you need a single-node or multi-node cluster. This decision determines the hardware requirements and the broker cardinality settings.

Topology Use case Minimum hardware
Single-node Smaller deployments where high availability isn't required 4 vCPUs, 16 GB of RAM, 30 GB of storage
Multi-node (3-5 nodes) High availability and higher throughput requirements 8 vCPUs, 32 GB of RAM per node

Important

Cardinality is set at deployment time only. A new deployment is required if the cardinality settings need to be changed.

Understand broker cardinality

Cardinality is the number of frontend replicas, frontend workers, backend partitions, and backend workers in the broker deployment. Cardinality controls how the broker scales horizontally and how resilient it is to pod or node failures.

The MQTT broker has a two-tier architecture: frontend pods handle client connections and protocol processing, while backend pods handle message storage and delivery. Understanding how each tier scales is important for capacity planning.

Frontend

Frontend pods accept MQTT client connections and forward messages to the backend. Frontend pods don't store messages themselves. There are two main settings for the frontend tier:

  • Replicas: The number of frontend pods to deploy. Adding more frontend replicas increases the number of concurrent client connections the broker can handle and provides high availability if one of the frontend pods fails.
  • Workers: The number of logical workers per frontend pod. Adding more workers lets the frontend pod use more CPU cores. Each worker can consume up to one CPU core.

Backend chain

Backend pods handle message storage and delivery. There are three main settings for the backend tier:

  • Partitions: The number of partitions to deploy. Partitions are the unit of horizontal scaling for message throughput. Through a process called sharding, each partition handles a portion of the messages, sharded by topic and session. The frontend pods distribute message traffic across the partitions. Adding more partitions increases the total message throughput the broker can handle.
  • Redundancy factor: The number of backend pods to deploy per partition. Increasing the redundancy factor increases the number of data copies to provide resiliency against node failures in the cluster.
  • Workers: The number of workers per backend pod. Workers are the unit of vertical scaling within a partition — adding more workers lets the backend pod use more CPU cores on the same node. Each worker can consume up to two CPU cores, so be careful when you increase the number of workers per replica to not exceed the number of CPU cores in the cluster.

Note

The effectiveness of partition scaling depends on how evenly the topic space is spread across partitions. A highly skewed distribution can create hotspots on a single partition.

Important

The backend redundancy factor must be 2 or greater. The broker requires at least two backend replicas per partition for high availability and rolling update support. Setting the redundancy factor to 1 results in a deployment validation error.

Throughput estimate

The performance of an individual partition depends heavily on the CPU characteristics of the node it's running on. As a rule of thumb, expect roughly 5,000 to 6,000 QoS 1 messages per second per partition with 8-KB payloads on a 2-GHz CPU (~4-GHz turbo). Real-world performance depends on many factors, so use this number only as a starting point for capacity planning.

For detailed benchmark data, see MQTT Broker performance benchmarking.

Single-node recommendations

  • Frontend replicas: Set to 1.
  • Frontend workers: Set to half the number of CPU cores per node.
  • Backend replicas (redundancy factor): Set to at least 2 so the broker can perform rolling updates.

Example: single node, 4 CPU cores

Frontend setting Value Backend setting Value
Replicas 1 Redundancy factor 2
Workers 2 Workers 1
Partitions 1

Multi-node recommendations

The following values are recommended for optimal performance. For large clusters with low traffic, these values can be set lower than the recommendations without causing issues. More considerations such as memory (RAM) and performance characteristics are discussed in the following sections. Always test your configuration with the expected workload to confirm performance.

  • Frontend replicas: Set equal to the number of nodes in the cluster.
  • Frontend workers: Set to half the number of CPU cores per node.
  • Backend replicas (redundancy factor): Set to 2 for redundancy and rolling update support.
  • Backend partitions: Set equal to the number of nodes in the cluster.
  • Backend workers: Set to half the number of CPU cores per node.

Example: 3-node cluster, 8 CPU cores per node

Frontend setting Value Backend setting Value
Replicas 3 Redundancy factor 2
Workers 4 Workers 4
Partitions 3

Example: 5-node cluster, 16 CPU cores per node

Frontend setting Value Backend setting Value
Replicas 5 Redundancy factor 2
Workers 8 Workers 8
Partitions 5

Important

The total number of frontend and backend workers per node should not exceed the number of CPU cores available on that node. Over-provisioning workers beyond available cores can cause CPU contention and degrade performance.

CPU resource limits

To prevent resource starvation in the cluster, the broker can be configured to request Kubernetes CPU resource limits based on the cardinality settings. When enabled, scaling the number of replicas or workers proportionally increases the CPU resources required.

Important

The default value for generateResourceLimits.cpu depends on the deployment method:

  • Azure CLI (az iot ops create): Disabled by default, to avoid deployment failures on resource-constrained clusters such as single-node clusters where CPU requests can exceed available resources.
  • REST API, Bicep, and ARM templates: Enabled by default. If you deploy with these methods without explicitly setting generateResourceLimits.cpu, CPU resource limits are applied automatically.

If you enable CPU resource limits, make sure your cluster has enough CPU resources to satisfy the broker's requests based on your cardinality configuration.

The default for REST API, Bicep, and ARM templates is defined in the Broker API specification.

The MQTT broker requests CPU resources per pod based on the number of workers configured:

  • Frontend pods: 1.0 CPU per worker
  • Backend pods: 2.0 CPU per worker

Use the following formulas to calculate total CPU requirements:

Component Formula
Frontend CPU replicas × frontend.workers × 1.0 CPU
Backend CPU partitions × redundancyFactor × backend.workers × 2.0 CPU
Total broker CPU Frontend CPU + Backend CPU

Caution

The broker isn't the only component that consumes CPU on the cluster. Other Azure IoT Operations components (such as the dataflow engine, OPC UA connector, and system pods) also reserve CPU resources, typically 200-300m in aggregate. When planning cluster capacity, make sure to account for this overhead on top of the broker's CPU requirements. If the total CPU requested by all pods exceeds the available CPU on your cluster, broker pods get stuck in a Pending state.

Example: small cluster

Consider a 2-node cluster with 4 CPU cores per node (8 cores total) with the following cardinality:

{
  "cardinality": {
    "frontend": {
      "replicas": 2,
      "workers": 2
    },
    "backendChain": {
      "partitions": 1,
      "redundancyFactor": 2,
      "workers": 1
    }
  }
}

The broker requests:

  • Frontend CPU: 2 replicas × 2 workers × 1.0 = 4.0 CPU
  • Backend CPU: 1 partition × 2 RF × 1 worker × 2.0 = 4.0 CPU
  • Total broker CPU: 8.0 CPU

This configuration requests 8.0 CPU on a cluster with only 8 cores, leaving nothing for other Azure IoT Operations components (200-300m) or for Kubernetes system pods. The broker pods stay in Pending state with Insufficient cpu errors.

To resolve this, either add more nodes, increase cores per node, or reduce the broker cardinality.

Example: larger deployment

The following cardinality requests significantly more CPU resources:

{
  "cardinality": {
    "frontend": {
      "replicas": 3,
      "workers": 2
    },
    "backendChain": {
      "partitions": 3,
      "redundancyFactor": 2,
      "workers": 2
    }
  }
}
  • Frontend CPU: 3 replicas × 2 workers × 1.0 = 6.0 CPU
  • Backend CPU: 3 partitions × 2 RF × 2 workers × 2.0 = 24.0 CPU
  • Total broker CPU: 30.0 CPU

A cluster needs at least 30 CPU cores available for broker pods alone, plus headroom for other Azure IoT Operations components and Kubernetes system pods.

CPU resource limit configuration

CPU resource limits are controlled by the generateResourceLimits.cpu field in the Broker resource. This configuration is supported only by using the --broker-config-file flag when you deploy Azure IoT Operations by using the az iot ops create command. For more information, see Azure CLI support for advanced MQTT broker configuration.

Prepare a Broker configuration file by following the GenerateResourceLimits API reference. The following examples show the two possible values:

{
  "generateResourceLimits": {
    "cpu": "Enabled"
  }
}

Or

{
  "generateResourceLimits": {
    "cpu": "Disabled"
  }
}

Choose your memory profile

The memory profile controls the maximum MQTT message size the broker accepts, idle memory usage, and maximum memory usage of each pod. Decide on the right memory profile before deployment based on your expected message sizes and throughput.

Memory profile Maximum message size Idle frontend memory (per pod) Maximum frontend memory (per pod) Idle backend memory (per pod) Maximum backend memory (per pod) Use case
Tiny 4 MB ~29 MiB ~99 MiB ~41 MiB ~102 MiB Low traffic, small packets only
Low 16 MB ~33 MiB ~387 MiB ~66 MiB ~390 MiB Limited memory, small packets
Medium (default) 64 MB ~169 MiB ~1.9 GiB ~211 MiB ~1.5 GiB Moderate traffic and message sizes
High 256 MB ~4.9 GiB ~4.9 GiB ~5.8 GiB ~5.8 GiB High throughput, large messages

Note

The memory values in the table are per pod. All workers within a pod share the same memory allocation — adding more workers doesn't increase the pod's memory limit.

Warning

The broker rejects messages when memory usage reaches 75% capacity. Choose a profile with sufficient headroom for your expected message sizes and throughput.

Incoming buffer and backpressure

Each memory profile defines a maximum incoming buffer size for PUBLISH data per backend worker. When the buffer reaches 75% capacity, the broker activates backpressure mechanisms and begins rejecting incoming messages. Rejected packets receive a PUBACK response with a Quota exceeded error code.

The following table shows the per-worker incoming buffer sizes for each profile:

Memory profile Max incoming buffer (per worker) Effective buffer (at 75% backpressure)
Tiny ~16 MiB ~12 MiB
Low ~64 MiB ~48 MiB
Medium ~576 MiB ~432 MiB
High ~2 GiB ~1.5 GiB

When choosing a memory profile, consider:

  • Tiny: Only one frontend should be used. Send only packets smaller than 4 MiB.
  • Low: Only one or two frontends should be used. Send only packets smaller than 16 MiB.
  • Medium: Suitable for most production workloads with moderate message sizes.
  • High: Use when you need to handle large messages or high throughput with large buffers.

Total broker memory depends on both the memory profile and the cardinality (number of frontend replicas, backend partitions, and redundancy factor). More pods mean more total memory. For measured baseline resource consumption across different configurations, see Baseline resource profiles.

Calculate total memory usage

You can calculate total memory usage with this formula:

M_total = (R_fe × M_fe) + (P_be × RF_be × M_be × W_be)

Where:

Variable Description
M_total Total memory usage
R_fe The number of frontend replicas
M_fe The memory usage of each frontend replica
P_be The number of backend partitions
RF_be Backend redundancy factor
M_be The memory usage of each backend replica
W_be The number of workers per backend replica

For example, if you choose the Medium memory profile, the profile has a frontend memory usage of 1.9 GiB and a backend memory usage of 1.5 GiB. Assume that the broker configuration is 2 frontend replicas, 2 backend partitions, and a backend redundancy factor of 2. The total memory usage is:

M_total = (2 × 1.9 GiB) + (2 × 2 × 1.5 GiB × 2)
        = 15.8 GiB

In comparison, the Tiny memory profile has a frontend memory usage of 99 MiB and a backend memory usage of 102 MiB. With the same broker configuration, the total memory usage is:

M_total = (2 × 99 MiB) + (2 × 2 × 102 MiB × 2)
        = 198 MiB + 816 MiB
        = 1014 MiB (≈ 1.0 GiB)

Memory profile configuration

When you deploy IoT Operations by using the az iot ops create command, the --broker-mem-profile parameter specifies the memory profile settings.

For example, the following command sets the memory profile to Tiny (other parameters are omitted for brevity):

az iot ops create ... --broker-mem-profile Tiny

To learn more, see az iot ops create optional parameters.

Optional broker settings

The following broker settings are also configured at deployment time and can't be changed afterward. Review these if they apply to your scenario:

  • Disk-backed message buffer — Buffer messages to disk when subscriber queues exceed available memory. Useful for persistent sessions and connectivity challenges.
  • Persistence — Write critical broker data to disk to preserve it across restarts.
  • Diagnostics — Configure metrics, logs, and self-check probes for the MQTT broker.
  • Advanced MQTT options — Customize session expiry, message expiry, subscriber queue limits, and keep-alive settings.
  • Internal traffic encryption — Configure encryption of internal traffic between broker frontend and backend pods (on by default).

Next steps