Cluster autoscaler
To respond to changing pod demand, the Kubernetes cluster autoscaler adjusts the number of nodes in an autoscale-enabled node pool. The cluster autoscaler doesn't scale because of CPU or memory pressure on running nodes. Instead, it periodically reevaluates the cluster and looks for pods that can't be scheduled because their resource requests don't fit on the available nodes. By default, the scan interval is 10 seconds.
If the cluster autoscaler determines that a change is required, the number of nodes in the affected node pool is increased or decreased within the minimum and maximum node counts that you configure. The maximum count limits scale-out, and the minimum count prevents scale-in below the capacity you require. Don't manually enable or change Virtual Machine Scale Set autoscaling for AKS nodes. Let the Kubernetes cluster autoscaler manage the node pool scale settings.
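The bounding behavior described above can be sketched as a simple clamp. This is an illustration only, not the autoscaler's actual code: the function name `clamp_node_count` and the example pool limits (minimum 2, maximum 5) are hypothetical.

```python
def clamp_node_count(desired: int, min_count: int, max_count: int) -> int:
    """Bound a desired node count to the configured [min_count, max_count] range."""
    if min_count > max_count:
        raise ValueError("min_count must not exceed max_count")
    return max(min_count, min(desired, max_count))


# Hypothetical node pool configured with a minimum of 2 and a maximum of 5 nodes.
print(clamp_node_count(8, min_count=2, max_count=5))  # scale-out is capped at the maximum
print(clamp_node_count(1, min_count=2, max_count=5))  # scale-in stops at the minimum
print(clamp_node_count(3, min_count=2, max_count=5))  # already in range, unchanged
```

Whatever node count the pending or movable pods would suggest, the result stays inside the range you configure.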
For more information about how the cluster autoscaler works in AKS and how to enable it by using Azure CLI parameters such as --enable-cluster-autoscaler, --min-count, and --max-count, see Cluster autoscaling in AKS overview and Use the cluster autoscaler in AKS.
The cluster autoscaler is typically used alongside the horizontal pod autoscaler. When combined, the horizontal pod autoscaler increases or decreases the number of pods based on application demand, and the cluster autoscaler adjusts the number of nodes so that those pods can be scheduled.
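To make the interaction concrete, the following sketch chains the two steps. It assumes the ratio formula from the Kubernetes horizontal pod autoscaler documentation (with its tolerance band ignored) and a deliberately naive packing model where identical pods are placed by CPU request alone. The function names and all numbers are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float) -> int:
    """Replica count from the HPA ratio formula (tolerance ignored in this sketch)."""
    return math.ceil(current_replicas * current_metric / target_metric)


def nodes_needed(replicas: int, pod_cpu_millicores: int,
                 node_allocatable_millicores: int) -> int:
    """Nodes required if identical pods are packed by CPU request alone (assumption)."""
    pods_per_node = node_allocatable_millicores // pod_cpu_millicores
    if pods_per_node == 0:
        raise ValueError("a single pod does not fit on a node")
    return math.ceil(replicas / pods_per_node)


# Hypothetical workload: 4 replicas at 90% average CPU against a 60% target.
replicas = hpa_desired_replicas(current_replicas=4, current_metric=90, target_metric=60)
print(replicas)                           # 6 replicas requested by the HPA step
print(nodes_needed(4, 900, 1900))         # 2 nodes were enough for 4 replicas
print(nodes_needed(replicas, 900, 1900))  # 3 nodes needed for 6 replicas
```

In this example the extra replicas can't all fit on the existing two nodes, so some would stay pending until the cluster autoscaler adds a third node.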
Scale out events
If no node in the node pool has sufficient compute resources to run a requested pod, that pod can't progress through the scheduling process. The pod can't start until more compute resources are available within the node pool.
When the cluster autoscaler notices pods in a Pending state because of node pool resource constraints, it increases the number of nodes within the appropriate node pool to provide extra compute resources. A pending pod doesn't always trigger scale-out. The cluster autoscaler simulates whether a new node could schedule the pod, so scale-out might not be triggered if the pod is blocked by constraints such as taints that the pod doesn't tolerate, node affinity, restrictive topology spread rules, PersistentVolume node affinity or volume topology conflicts, a pod PriorityClass value below -10, or the node pool maximum. Even when scale-out is triggered, core vCPU quota exhaustion, subnet IP exhaustion, or API request rate limits can cause node provisioning to fail or back off. When the new nodes are successfully deployed and marked Ready, the Kubernetes scheduler can place the pending pods on them.
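The simulation step can be sketched as a series of checks against a hypothetical empty node from the pool. This is a simplified model, not the autoscaler's implementation: it covers only priority, taints, resource fit, and the pool maximum, and the `Pod`, `NodePool`, and `scale_out_decision` names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Pods with a priority below this value don't trigger scale-out (per the text above).
EXPENDABLE_PRIORITY_CUTOFF = -10


@dataclass
class Pod:
    cpu_millicores: int
    memory_mib: int
    priority: int = 0
    tolerations: frozenset = frozenset()


@dataclass
class NodePool:
    node_cpu_millicores: int   # allocatable CPU on one new, empty node
    node_memory_mib: int       # allocatable memory on one new, empty node
    node_count: int
    max_count: int
    taints: frozenset = frozenset()


def scale_out_decision(pod: Pod, pool: NodePool) -> tuple[bool, str]:
    """Return whether a pending pod would trigger scale-out, with a reason."""
    if pod.priority < EXPENDABLE_PRIORITY_CUTOFF:
        return False, "pod priority is below the cutoff"
    if not pool.taints <= pod.tolerations:
        return False, "pod doesn't tolerate the node pool taints"
    if (pod.cpu_millicores > pool.node_cpu_millicores
            or pod.memory_mib > pool.node_memory_mib):
        return False, "pod wouldn't fit even on a new empty node"
    if pool.node_count >= pool.max_count:
        return False, "node pool is at its maximum count"
    return True, "a new node could schedule the pod"


pool = NodePool(node_cpu_millicores=1900, node_memory_mib=4600,
                node_count=3, max_count=5, taints=frozenset({"gpu"}))
print(scale_out_decision(Pod(500, 512), pool))
print(scale_out_decision(Pod(500, 512, tolerations=frozenset({"gpu"})), pool))
```

The first pod stays pending without triggering scale-out because it lacks the toleration. The second passes every check, so a node would be added, subject to quota, IP address, and rate-limit conditions that this sketch doesn't model.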
If your application needs to scale rapidly, some pods might remain pending while the cluster autoscaler provisions more VM-backed nodes. For applications that have high burst demands and compatible workload requirements, you can scale with virtual nodes and Azure Container Instances.
Scale in events
The cluster autoscaler also evaluates whether nodes are underutilized and whether their pods can safely move to other nodes. When both conditions hold, the node pool has more compute resources than required, and the number of nodes can be decreased. By default, nodes that have been unneeded for 10 minutes are eligible for deletion. The cluster autoscaler then drains the selected nodes, the pods are rescheduled on other available nodes, and the node count decreases.
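The timing rule can be sketched as follows. The 10-minute window comes from the text above. The 0.5 utilization threshold, the use of CPU requests as the only utilization signal, and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

SCALE_DOWN_UNNEEDED_TIME = timedelta(minutes=10)  # default stated in the text
UTILIZATION_THRESHOLD = 0.5                       # assumed value for this sketch


def is_underutilized(requested_millicores: int, allocatable_millicores: int,
                     threshold: float = UTILIZATION_THRESHOLD) -> bool:
    """Treat a node as underutilized when requested CPU is below the threshold."""
    return requested_millicores / allocatable_millicores < threshold


def eligible_for_deletion(unneeded_since: Optional[datetime], now: datetime,
                          unneeded_time: timedelta = SCALE_DOWN_UNNEEDED_TIME) -> bool:
    """A node qualifies only after it has been continuously unneeded long enough."""
    if unneeded_since is None:  # the node is currently needed
        return False
    return now - unneeded_since >= unneeded_time


now = datetime(2024, 1, 1, 12, 0)
print(is_underutilized(600, 1900))                              # True
print(eligible_for_deletion(now - timedelta(minutes=4), now))   # False
print(eligible_for_deletion(now - timedelta(minutes=12), now))  # True
```

Requiring the node to stay unneeded for the whole window keeps a brief dip in demand from removing capacity that is needed again moments later.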
Your applications might experience some disruption as pods are rescheduled on different nodes when the cluster autoscaler decreases the number of nodes. To minimize disruption, avoid relying on a single pod instance, configure appropriate replica counts, and use pod disruption budgets that allow safe movement while preserving availability. Several conditions can prevent scale-in: overly restrictive disruption budgets, pods that aren't managed by a controller, scheduling constraints that can't be satisfied elsewhere, pods with local storage when the autoscaler profile skips them, pods marked with cluster-autoscaler.kubernetes.io/safe-to-evict: "false", and non-DaemonSet, non-mirror pods in the kube-system namespace when the profile skips system pods.
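The conditions that can prevent scale-in can be expressed as a checklist over each pod on a candidate node. The sketch below is a simplified model: `PodInfo`, its fields, and the two keyword flags are hypothetical stand-ins for the pod state and autoscaler profile settings, and only the annotation key is taken from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


@dataclass
class PodInfo:
    namespace: str = "default"
    has_controller: bool = True
    uses_local_storage: bool = False
    is_daemonset: bool = False
    is_mirror: bool = False
    pdb_disruptions_allowed: int = 1       # evictions the disruption budget permits now
    can_reschedule_elsewhere: bool = True  # scheduling constraints satisfiable on other nodes
    annotations: dict = field(default_factory=dict)


def scale_in_blockers(pod: PodInfo, *, skip_local_storage: bool,
                      skip_system_pods: bool) -> list[str]:
    """List the reasons this pod would keep its node from being removed."""
    reasons = []
    if pod.pdb_disruptions_allowed < 1:
        reasons.append("disruption budget allows no evictions")
    if not pod.has_controller:
        reasons.append("not managed by a controller")
    if not pod.can_reschedule_elsewhere:
        reasons.append("scheduling constraints can't be satisfied elsewhere")
    if skip_local_storage and pod.uses_local_storage:
        reasons.append("uses local storage")
    if pod.annotations.get(SAFE_TO_EVICT) == "false":
        reasons.append("marked safe-to-evict: false")
    if (skip_system_pods and pod.namespace == "kube-system"
            and not (pod.is_daemonset or pod.is_mirror)):
        reasons.append("system pod in kube-system")
    return reasons


web = PodInfo()
bare = PodInfo(has_controller=False, annotations={SAFE_TO_EVICT: "false"})
print(scale_in_blockers(web, skip_local_storage=True, skip_system_pods=True))   # no blockers
print(scale_in_blockers(bare, skip_local_storage=True, skip_system_pods=True))  # two blockers
```

A node is a scale-in candidate in this model only when every pod on it returns an empty list.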