AKS cluster stuck in failed state

David Wilson 0 Reputation points
2024-07-06T10:08:03.5733333+00:00

The cluster is in a failed state and I am unable to update, upgrade or stop it.

The errorCode is ResourceOperationFailure

When running kubectl get nodes and kubectl get apiservices, I get:

E0706 10:06:16.489936   30640 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

E0706 10:06:16.828243   30640 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

E0706 10:06:17.058699   30640 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

E0706 10:06:17.142195   30640 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

No resources found

v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (MissingEndpoints)   4y5d

The above seems to be the issue, but I'm not able to do anything while the cluster is in a failed state. How do I resolve this?

Azure Kubernetes Service (AKS)

1 answer

  1. AlaaBarqawi_MSFT 942 Reputation points Microsoft Employee
    2024-07-08T07:01:27.26+00:00

    Hi @David Wilson ,

    1- Can you open "Diagnose and solve problems" for the cluster in the Azure portal and then click on "CRUD operations"?

    2- Are you using an NSG, a firewall, custom DNS or a route table?

    https://learn.microsoft.com/en-us/azure/aks/outbound-rules-control-egress

    3- Try to reconcile the cluster by running:

    az aks update -g MyResourceGroup -n MyManagedCluster
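
The reconcile step above can be wrapped in a small script that first checks whether the cluster really reports a failed state. This is only a sketch: the function names are made up for illustration, MyResourceGroup and MyManagedCluster are the placeholders from the command above, and it assumes the Azure CLI and kubectl are installed and logged in against the right subscription and cluster.

```shell
#!/bin/sh
# Placeholder names from the answer; replace with your own values.
RG="MyResourceGroup"
CLUSTER="MyManagedCluster"

# Print the cluster's provisioning state (for example Succeeded or Failed).
cluster_state() {
  az aks show -g "$RG" -n "$CLUSTER" --query provisioningState -o tsv
}

# Run the reconcile (az aks update with no other arguments) only when
# the cluster reports Failed; otherwise just print the state.
reconcile_if_failed() {
  state="$(cluster_state)"
  echo "provisioningState: $state"
  if [ "$state" = "Failed" ]; then
    az aks update -g "$RG" -n "$CLUSTER"
  fi
}

# Show the APIService from the question and the endpoints of the
# kube-system/metrics-server service it points to.
metrics_server_status() {
  kubectl get apiservice v1beta1.metrics.k8s.io
  kubectl get endpoints metrics-server -n kube-system
}
```

Calling reconcile_if_failed and then metrics_server_status should show whether the reconcile changed the provisioning state and whether the metrics-server service has endpoints again. If the state is still Failed, the findings from steps 1 and 2 are the next place to look.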