Cluster autoscaler fails to scale with "failed to fix node group sizes" error

This article discusses how to resolve the "failed to fix node group sizes" error that appears in the cluster autoscaler logs when you create or manage Azure Kubernetes Service (AKS) clusters.

Symptoms

Your cluster autoscaler isn't scaling up or down, and you see an error similar to the following in the cluster autoscaler logs:

E1114 09:58:55.367731 1 static_autoscaler.go:239] Failed to fix node group sizes: failed to decrease aks-default-35246781-vmss: attempt to delete existing nodes
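
To see which node group is affected, you can pull its name out of the log line. The following is a minimal sketch that assumes the message has the same shape as the sample above ("failed to decrease <node group>:"). The helper name affected_group is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sample log line copied from the Symptoms section.
log_line='E1114 09:58:55.367731 1 static_autoscaler.go:239] Failed to fix node group sizes: failed to decrease aks-default-35246781-vmss: attempt to delete existing nodes'

# Hypothetical helper: reads log lines on stdin and prints the node group
# name from each "Failed to fix node group sizes" message.
# Lines that don't match the expected shape are ignored.
affected_group() {
  sed -n 's/.*Failed to fix node group sizes: failed to decrease \([^:]*\):.*/\1/p'
}

printf '%s\n' "$log_line" | affected_group
# prints: aks-default-35246781-vmss
```

The name is the virtual machine scale set that backs the node pool, which helps you confirm the node pool whose autoscaler state is out of sync.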

Cause

This error is caused by a race condition in the upstream cluster autoscaler. When the race occurs, the cluster autoscaler ends up with a node group size that differs from the value that's actually in the cluster.

Solution

To get out of this state, disable and re-enable the cluster autoscaler.
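
The disable and re-enable steps can be sketched with the Azure CLI as follows. This is an illustration under stated assumptions, not the only way to do it: the resource group name, cluster name, and minimum and maximum node counts are placeholders, so substitute the values that your cluster was using before. The run and reset_cluster_autoscaler helpers and the DRY_RUN switch are hypothetical conveniences that let you review the commands before anything changes.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own settings.
RESOURCE_GROUP="${RESOURCE_GROUP:-myResourceGroup}"
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"
MIN_COUNT="${MIN_COUNT:-1}"
MAX_COUNT="${MAX_COUNT:-3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_cluster_autoscaler() {
  # Step 1: disable the cluster autoscaler. Stop if this step fails.
  run az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --disable-cluster-autoscaler || return 1

  # Step 2: re-enable it. The minimum and maximum counts must be supplied again.
  run az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --enable-cluster-autoscaler \
    --min-count "$MIN_COUNT" \
    --max-count "$MAX_COUNT"
}

reset_cluster_autoscaler
```

Run the script with DRY_RUN=0 to apply the change. If the cluster has more than one node pool, the autoscaler is configured per node pool, and the equivalent az aks nodepool update command accepts the same autoscaler flags.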

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to the Azure feedback community.