The error message you're encountering, "Failed to scale up: Could not compute total resources: No node info for: agentpool," indicates that the Cluster Autoscaler could not retrieve node information for the node group named "agentpool" in your AKS cluster, so it cannot work out how much capacity a scale-up would add. This can happen if the autoscaler is misconfigured or not properly deployed.
To troubleshoot this issue, you can follow these steps:
Verify that the Cluster Autoscaler deployment is correctly configured with the appropriate parameters. Ensure that you have set the correct values for the minimum and maximum number of nodes, as well as any other required configuration options.
Check if the Cluster Autoscaler is running as a pod in your cluster. You can use the following command to list all the pods in the cluster:
kubectl get pods --all-namespaces
Look for a pod with a name like "cluster-autoscaler-xxxxx" and check its status and logs for any error messages:
kubectl logs <pod-name> -n <namespace>
Replace <pod-name> with the name of the Cluster Autoscaler pod and <namespace> with the namespace where it is deployed.
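As a rough sketch of the pod lookup described above, the helper below filters the output of kubectl get pods --all-namespaces down to pods whose name contains "cluster-autoscaler". The function name find_autoscaler_pods is made up for illustration, and the sketch assumes the default column layout (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints "namespace pod-name status" for each Cluster Autoscaler pod.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
find_autoscaler_pods() {
  # NR > 1 skips the header row; $2 is the pod name, $4 its status.
  awk 'NR > 1 && $2 ~ /cluster-autoscaler/ { print $1, $2, $4 }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods --all-namespaces | find_autoscaler_pods
# Then inspect the logs of a pod it reports:
#   kubectl logs <pod-name> -n <namespace> | grep -i "no node info"
```

If the helper prints nothing, no pod with that name is visible in the cluster, which is itself worth investigating.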
Ensure that the Cluster Autoscaler has the necessary permissions to interact with the AKS cluster and scale nodes. The service principal used by the Cluster Autoscaler should have the appropriate RBAC roles assigned. Specifically, it should have the Reader role for the AKS cluster and the Virtual Machine Contributor role for the resource group containing the cluster.
You can verify the roles assigned to the service principal using the Azure CLI:
az role assignment list --assignee <service-principal-id>
Replace <service-principal-id> with the ID of the service principal used by the Cluster Autoscaler.
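To make the role check less manual, a small sketch like the following could compare the assigned role names against the two roles mentioned above. The function name missing_roles and the SP_ID variable are hypothetical; the sketch only compares role names and does not verify the scope each role is assigned at.

```shell
# Hypothetical helper: reads one role name per line on stdin and prints
# each expected role that is NOT present. It checks names only, not scopes.
missing_roles() {
  assigned=$(cat)
  for role in "Reader" "Virtual Machine Contributor"; do
    # -x matches whole lines, -F treats the role name as a fixed string.
    printf '%s\n' "$assigned" | grep -qxF "$role" || printf '%s\n' "$role"
  done
}

# Example usage (SP_ID is a placeholder for your service principal ID):
#   az role assignment list --assignee "$SP_ID" --all \
#     --query "[].roleDefinitionName" -o tsv | missing_roles
```

Empty output means both role names were found. Any line printed names a role that still needs to be assigned.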
If you're using a virtual machine scale set (VMSS) for your AKS nodes, ensure that the VMSS is configured correctly and that it's associated with the AKS cluster. The Cluster Autoscaler relies on the VMSS for scaling nodes up and down.
Double-check the configuration of the VMSS, such as the minimum and maximum number of instances, and ensure that it's in a healthy state.
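One possible way to check the scale set state from the command line is sketched below. The function name unhealthy_vmss and the NODE_RG variable are hypothetical, and treating a provisioningState of "Succeeded" as healthy is an assumption for this sketch, not a complete health check.

```shell
# Hypothetical helper: reads tab-separated "name<TAB>capacity<TAB>provisioningState"
# rows on stdin and prints every scale set whose state is not "Succeeded".
# Assumption: "Succeeded" is treated as the healthy state.
unhealthy_vmss() {
  awk -F '\t' '$3 != "Succeeded" { print $1 ": " $3 }'
}

# Example usage (NODE_RG is a placeholder for the node resource group,
# which can be looked up with:
#   az aks show -g <resource-group> -n <cluster-name> --query nodeResourceGroup -o tsv):
#   az vmss list --resource-group "$NODE_RG" \
#     --query "[].[name, sku.capacity, provisioningState]" -o tsv | unhealthy_vmss
```

The second column of the az vmss list output carries the current instance count, which can be compared by eye against the minimum and maximum configured for the autoscaler.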
By going through these steps, you should be able to identify and resolve any issues preventing the Cluster Autoscaler from working correctly in your AKS cluster.