Resize node pools in Azure Kubernetes Service (AKS)

Due to an increasing number of deployments or to run a larger workload, you may want to change the virtual machine scale set plan or resize AKS instances. However, as per support policies for AKS:

AKS agent nodes appear in the Azure portal as regular Azure IaaS resources. But these virtual machines are deployed into a custom Azure resource group (usually prefixed with MC_*). You cannot do any direct customizations to these nodes using the IaaS APIs or resources. Any custom changes that are not done via the AKS API will not persist through an upgrade, scale, update or reboot.

This lack of persistence also applies to the resize operation, thus, resizing AKS instances in this manner isn't supported. In this how-to guide, you'll learn the recommended method to address this scenario.

Important

This method is specific to virtual machine scale set-based AKS clusters. When using virtual machine availability sets, you are limited to only one node pool per cluster.

Example resources

Assume you want to resize an existing node pool, called nodepool1, from SKU size Standard_DS2_v2 to Standard_DS3_v2. To accomplish this task, you'll need to create a new node pool using Standard_DS3_v2, move workloads from nodepool1 to the new node pool, and remove nodepool1. In this example, we'll call this new node pool mynodepool.

Screenshot of the Azure portal page for the cluster, navigated to Settings > Node pools. One node pool, named node pool 1, is shown.

kubectl get nodes

NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-31721111-vmss000000   Ready    agent   10d   v1.21.9
aks-nodepool1-31721111-vmss000001   Ready    agent   10d   v1.21.9
aks-nodepool1-31721111-vmss000002   Ready    agent   10d   v1.21.9
kubectl get pods -o wide -A

NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE     IP           NODE                                NOMINATED NODE   READINESS GATES
default       sampleapp2-74b4b974ff-676sz           1/1     Running   0          93m     10.244.1.6   aks-nodepool1-31721111-vmss000002   <none>           <none>
default       sampleapp2-76b6c4c59b-pfgbh           1/1     Running   0          94m     10.244.1.5   aks-nodepool1-31721111-vmss000002   <none>           <none>
kube-system   azure-ip-masq-agent-4n66k             1/1     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002   <none>           <none>
kube-system   azure-ip-masq-agent-9p4c8             1/1     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000   <none>           <none>
kube-system   azure-ip-masq-agent-nb7mx             1/1     Running   0          10d     10.240.0.5   aks-nodepool1-31721111-vmss000001   <none>           <none>
kube-system   coredns-845757d86-dtvvs               1/1     Running   0          10d     10.244.0.2   aks-nodepool1-31721111-vmss000000   <none>           <none>
kube-system   coredns-845757d86-x27pp               1/1     Running   0          10d     10.244.2.3   aks-nodepool1-31721111-vmss000001   <none>           <none>
kube-system   coredns-autoscaler-5f85dc856b-nfrmh   1/1     Running   0          10d     10.244.2.4   aks-nodepool1-31721111-vmss000001   <none>           <none>
kube-system   csi-azuredisk-node-9nfzt              3/3     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000   <none>           <none>
kube-system   csi-azuredisk-node-bblsb              3/3     Running   0          10d     10.240.0.5   aks-nodepool1-31721111-vmss000001   <none>           <none>
kube-system   csi-azuredisk-node-tjhj4              3/3     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002   <none>           <none>
kube-system   csi-azurefile-node-9pcr8              3/3     Running   0          3d10h   10.240.0.6   aks-nodepool1-31721111-vmss000002   <none>           <none>
kube-system   csi-azurefile-node-bh2pc              3/3     Running   0          3d10h   10.240.0.5   aks-nodepool1-31721111-vmss000001   <none>           <none>
kube-system   csi-azurefile-node-h75gq              3/3     Running   0          3d10h   10.240.0.4   aks-nodepool1-31721111-vmss000000   <none>           <none>
kube-system   konnectivity-agent-6cd55c69cf-ngdlb   1/1     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002   <none>           <none>
kube-system   konnectivity-agent-6cd55c69cf-rvvqt   1/1     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000   <none>           <none>
kube-system   kube-proxy-4wzx7                      1/1     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000   <none>           <none>
kube-system   kube-proxy-g5tvr                      1/1     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002   <none>           <none>
kube-system   kube-proxy-mrv54                      1/1     Running   0          10d     10.240.0.5   aks-nodepool1-31721111-vmss000001   <none>           <none>
kube-system   metrics-server-774f99dbf4-h52hn       1/1     Running   1          3d10h   10.244.1.3   aks-nodepool1-31721111-vmss000002   <none>           <none>

Create a new node pool with the desired SKU

Use the az aks nodepool add command to create a new node pool called mynodepool with three nodes using the Standard_DS3_v2 VM SKU:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 3 \
    --node-vm-size Standard_DS3_v2 \
    --mode System \
    --no-wait

Note

Every AKS cluster must contain at least one system node pool with at least one node. In the example above, we are using a --mode of System, as the cluster is assumed to have only one node pool, necessitating a System node pool to replace it. A node pool's mode can be updated at any time.

When resizing, be sure to consider other requirements and configure your node pool accordingly. You may need to modify the above command. For a full list of the configuration options, see the az aks nodepool add reference page.

After a few minutes, the new node pool has been created:

Screenshot of the Azure portal page for the cluster, navigated to Settings > Node pools. Two node pools, named node pool 1 and my node pool are shown.

kubectl get nodes

NAME                                 STATUS   ROLES   AGE   VERSION
aks-mynodepool-20823458-vmss000000   Ready    agent   23m   v1.21.9
aks-mynodepool-20823458-vmss000001   Ready    agent   23m   v1.21.9
aks-mynodepool-20823458-vmss000002   Ready    agent   23m   v1.21.9
aks-nodepool1-31721111-vmss000000    Ready    agent   10d   v1.21.9
aks-nodepool1-31721111-vmss000001    Ready    agent   10d   v1.21.9
aks-nodepool1-31721111-vmss000002    Ready    agent   10d   v1.21.9

Cordon the existing nodes

Cordoning marks specified nodes as unschedulable and prevents any more pods from being added to the nodes.

First, obtain the names of the nodes you'd like to cordon with kubectl get nodes. Your output should look similar to the following:

NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-31721111-vmss000000   Ready    agent   7d21h   v1.21.9
aks-nodepool1-31721111-vmss000001   Ready    agent   7d21h   v1.21.9
aks-nodepool1-31721111-vmss000002   Ready    agent   7d21h   v1.21.9

Next, using kubectl cordon <node-names>, specify the desired nodes in a space-separated list:

kubectl cordon aks-nodepool1-31721111-vmss000000 aks-nodepool1-31721111-vmss000001 aks-nodepool1-31721111-vmss000002
node/aks-nodepool1-31721111-vmss000000 cordoned
node/aks-nodepool1-31721111-vmss000001 cordoned
node/aks-nodepool1-31721111-vmss000002 cordoned

Drain the existing nodes

Important

To successfully drain nodes and evict running pods, ensure that any PodDisruptionBudgets (PDBs) allow for at least one pod replica to be moved at a time. Otherwise, the drain/evict operation will fail. To check this, you can run kubectl get pdb -A and verify ALLOWED DISRUPTIONS is at least one or higher.

Draining nodes will cause pods running on them to be evicted and recreated on the other, schedulable nodes.

To drain nodes, use kubectl drain <node-names> --ignore-daemonsets --delete-emptydir-data, again using a space-separated list of node names:

Important

Using --delete-emptydir-data is required to evict the AKS-created coredns and metrics-server pods. If this flag isn't used, an error is expected. For more information, see the documentation on emptydir.

kubectl drain aks-nodepool1-31721111-vmss000000 aks-nodepool1-31721111-vmss000001 aks-nodepool1-31721111-vmss000002 --ignore-daemonsets --delete-emptydir-data

After the drain operation finishes, all pods other than those controlled by daemon sets are running on the new node pool:

kubectl get pods -o wide -A

NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE     IP           NODE                                 NOMINATED NODE   READINESS GATES
default       sampleapp2-74b4b974ff-676sz           1/1     Running   0          15m     10.244.4.5   aks-mynodepool-20823458-vmss000002   <none>           <none>
default       sampleapp2-76b6c4c59b-rhmzq           1/1     Running   0          16m     10.244.4.3   aks-mynodepool-20823458-vmss000002   <none>           <none>
kube-system   azure-ip-masq-agent-4n66k             1/1     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002    <none>           <none>
kube-system   azure-ip-masq-agent-9p4c8             1/1     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000    <none>           <none>
kube-system   azure-ip-masq-agent-nb7mx             1/1     Running   0          10d     10.240.0.5   aks-nodepool1-31721111-vmss000001    <none>           <none>
kube-system   azure-ip-masq-agent-sxn96             1/1     Running   0          49m     10.240.0.9   aks-mynodepool-20823458-vmss000002   <none>           <none>
kube-system   azure-ip-masq-agent-tsq98             1/1     Running   0          49m     10.240.0.8   aks-mynodepool-20823458-vmss000001   <none>           <none>
kube-system   azure-ip-masq-agent-xzrdl             1/1     Running   0          49m     10.240.0.7   aks-mynodepool-20823458-vmss000000   <none>           <none>
kube-system   coredns-845757d86-d2pkc               1/1     Running   0          17m     10.244.3.2   aks-mynodepool-20823458-vmss000000   <none>           <none>
kube-system   coredns-845757d86-f8g9s               1/1     Running   0          17m     10.244.5.2   aks-mynodepool-20823458-vmss000001   <none>           <none>
kube-system   coredns-autoscaler-5f85dc856b-f8xh2   1/1     Running   0          17m     10.244.4.2   aks-mynodepool-20823458-vmss000002   <none>           <none>
kube-system   csi-azuredisk-node-7md2w              3/3     Running   0          49m     10.240.0.7   aks-mynodepool-20823458-vmss000000   <none>           <none>
kube-system   csi-azuredisk-node-9nfzt              3/3     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000    <none>           <none>
kube-system   csi-azuredisk-node-bblsb              3/3     Running   0          10d     10.240.0.5   aks-nodepool1-31721111-vmss000001    <none>           <none>
kube-system   csi-azuredisk-node-lcmtz              3/3     Running   0          49m     10.240.0.9   aks-mynodepool-20823458-vmss000002   <none>           <none>
kube-system   csi-azuredisk-node-mmncr              3/3     Running   0          49m     10.240.0.8   aks-mynodepool-20823458-vmss000001   <none>           <none>
kube-system   csi-azuredisk-node-tjhj4              3/3     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002    <none>           <none>
kube-system   csi-azurefile-node-29w6z              3/3     Running   0          49m     10.240.0.9   aks-mynodepool-20823458-vmss000002   <none>           <none>
kube-system   csi-azurefile-node-4nrx7              3/3     Running   0          49m     10.240.0.7   aks-mynodepool-20823458-vmss000000   <none>           <none>
kube-system   csi-azurefile-node-9pcr8              3/3     Running   0          3d11h   10.240.0.6   aks-nodepool1-31721111-vmss000002    <none>           <none>
kube-system   csi-azurefile-node-bh2pc              3/3     Running   0          3d11h   10.240.0.5   aks-nodepool1-31721111-vmss000001    <none>           <none>
kube-system   csi-azurefile-node-gqqnv              3/3     Running   0          49m     10.240.0.8   aks-mynodepool-20823458-vmss000001   <none>           <none>
kube-system   csi-azurefile-node-h75gq              3/3     Running   0          3d11h   10.240.0.4   aks-nodepool1-31721111-vmss000000    <none>           <none>
kube-system   konnectivity-agent-6cd55c69cf-2bbp5   1/1     Running   0          17m     10.240.0.7   aks-mynodepool-20823458-vmss000000   <none>           <none>
kube-system   konnectivity-agent-6cd55c69cf-7xzxj   1/1     Running   0          16m     10.240.0.8   aks-mynodepool-20823458-vmss000001   <none>           <none>
kube-system   kube-proxy-4wzx7                      1/1     Running   0          10d     10.240.0.4   aks-nodepool1-31721111-vmss000000    <none>           <none>
kube-system   kube-proxy-7h8r5                      1/1     Running   0          49m     10.240.0.7   aks-mynodepool-20823458-vmss000000   <none>           <none>
kube-system   kube-proxy-g5tvr                      1/1     Running   0          10d     10.240.0.6   aks-nodepool1-31721111-vmss000002    <none>           <none>
kube-system   kube-proxy-mrv54                      1/1     Running   0          10d     10.240.0.5   aks-nodepool1-31721111-vmss000001    <none>           <none>
kube-system   kube-proxy-nqmnj                      1/1     Running   0          49m     10.240.0.9   aks-mynodepool-20823458-vmss000002   <none>           <none>
kube-system   kube-proxy-zn77s                      1/1     Running   0          49m     10.240.0.8   aks-mynodepool-20823458-vmss000001   <none>           <none>
kube-system   metrics-server-774f99dbf4-2x6x8       1/1     Running   0          16m     10.244.4.4   aks-mynodepool-20823458-vmss000002   <none>           <none>

Troubleshooting

You may see an error like the following:

Error when evicting pods/[podname] -n [namespace] (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

By default, your cluster has AKS_managed pod disruption budgets (such as coredns-pdb or konnectivity-agent) with a MinAvailable of 1. If, for example, there are two coredns pods running, while one of them is getting recreated and is unavailable, the other is unable to be affected due to the pod disruption budget. This resolves itself after the initial coredns pod is scheduled and running, allowing the second pod to be properly evicted and recreated.

Tip

Consider draining nodes one-by-one for a smoother eviction experience and to avoid throttling. For more information, see:

Remove the existing node pool

To delete the existing node pool, use the Azure portal or the az aks nodepool delete command:

az aks nodepool delete \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name nodepool1

After completion, the final result is the AKS cluster having a single, new node pool with the new, desired SKU size and all the applications and pods properly running:

Screenshot of the Azure portal page for the cluster, navigated to Settings > Node pools. One node pool, named my node pool, is shown.

kubectl get nodes

NAME                                 STATUS   ROLES   AGE   VERSION
aks-mynodepool-20823458-vmss000000   Ready    agent   63m   v1.21.9
aks-mynodepool-20823458-vmss000001   Ready    agent   63m   v1.21.9
aks-mynodepool-20823458-vmss000002   Ready    agent   63m   v1.21.9

Next steps

After resizing a node pool by cordoning and draining, learn more about using multiple node pools.