AKS No Nodes in Nodepool

James 1 Reputation point
2022-11-23T15:40:09.753+00:00

Hi Team,

Is there any way to recreate or recover a System nodepool that has no nodes available?


This happened after I tried updating the service principal for the AKS cluster.

  1. I ran: az aks update-credentials --resource-group myResourceGroup --name myAKSCluster --reset-service-principal --service-principal <app_id> --client-secret <password>
  2. After 30 minutes or so, I noticed that all the pods were in the Pending state with the warning "0/1 nodes are available".
  3. I noticed that the node was cordoned (maybe it ran out of resources?), so I uncordoned it with kubectl uncordon myNode
  4. In the Azure portal, I noticed the system nodepool was stuck in the provisioning state "RefreshingServicePrincipalProfile" for a couple of hours before finally becoming "Failed".
  5. I tried stopping and starting the AKS cluster in the Azure portal, but the nodepool still remains "Failed" and there are still no nodes in the system nodepool.

What steps can I take to fix the system nodepool? I've tried adding another system nodepool but it failed because the cluster is using an old Kubernetes version (1.16) which is no longer supported. It also won't let me upgrade the Kubernetes version of the cluster because it's currently in a "Failed" state.

Azure Kubernetes Service (AKS)

2 answers

  1. shiva patpi 13,171 Reputation points Microsoft Employee
    2022-11-24T05:54:27.053+00:00

    @James ,
    Can you try running the command below? (It will try to reconcile the cluster to its last known good state.)
    az resource update -n aksclustername -g aksresourcegroup --namespace Microsoft.ContainerService --resource-type ManagedClusters

    Once you run that command, the cluster will go into the Updating state. Wait a couple of minutes to see whether it returns to a healthy state.

    FYI - To see why the cluster was stuck in RefreshingServicePrincipalProfile, try running kubectl get events --all-namespaces.
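
The reconcile-and-wait steps above could be scripted roughly as follows. This is only a sketch: the function names, the attempt count, and the polling delay are illustrative assumptions and not part of the answer. It relies on az resource update with the same arguments as above and on az aks show to read the cluster's provisioningState.

```shell
#!/usr/bin/env bash
# Sketch only: function names, defaults, and the polling interval are
# illustrative assumptions, not part of the original answer.

# Re-submit the cluster resource unchanged so that AKS reconciles it.
reconcile_cluster() {
  local rg="$1"
  local name="$2"
  az resource update \
    --resource-group "$rg" \
    --name "$name" \
    --namespace Microsoft.ContainerService \
    --resource-type ManagedClusters
}

# Poll provisioningState until it reads Succeeded or Failed, or until the
# attempts run out. Returns 0 on Succeeded, 1 on Failed, 2 on timeout.
wait_for_cluster_state() {
  local rg="$1"
  local name="$2"
  local attempts="${3:-30}"   # assumed default: 30 polls
  local delay="${4:-20}"      # assumed default: 20 seconds between polls
  local i=1
  local state
  while [ "$i" -le "$attempts" ]; do
    state="$(az aks show --resource-group "$rg" --name "$name" \
      --query provisioningState --output tsv)"
    echo "attempt $i/$attempts: provisioningState=$state"
    case "$state" in
      Succeeded) return 0 ;;
      Failed) return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 2
}
```

With these defined, a call such as reconcile_cluster aksresourcegroup aksclustername followed by wait_for_cluster_state aksresourcegroup aksclustername would report each observed state. A return code of 1 means the cluster is still "Failed", at which point the events output mentioned above is the next place to look.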


  2. Tarik TOUROUGUI 1 Reputation point
    2022-12-08T13:34:27.697+00:00

    Hello,
    I am currently having the same issue. Did a cluster restart solve the problem?
    I am still unable to access the kube-apiserver via the ClusterIP service. Istio is trying to use the API to read and write the secret so that it can update the proxies with a new certificate.
