AKS cluster shows in failed state

Suvarnnan, N V 20 Reputation points
2023-07-01T10:12:38.7433333+00:00

Our cluster shows that it is in a failed state, but all pods are running and functioning properly.

On analysis, the cluster's "provisioningState" is "Failed".

The corrective steps suggested by Azure diagnostics are not helping.

I also tried the steps in the link below, but they did not help:

https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/cluster-node-virtual-machine-failed-state
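For readers who want to confirm the same symptom, the provisioning state can be read with the Azure CLI. This is a minimal sketch, not part of the original question: the function name `aks_provisioning_state` and the cluster and resource group names are hypothetical placeholders.

```shell
# Print the provisioning state that Azure reports for an AKS cluster.
# $1 = cluster name, $2 = resource group (both are placeholders you supply).
aks_provisioning_state() {
  az aks show --name "$1" --resource-group "$2" \
    --query provisioningState --output tsv
}

# Example (requires a logged-in Azure CLI):
#   aks_provisioning_state my-aks my-aks-rg
```

A value of "Failed" here reflects the result of the last management operation on the cluster resource, which is why the workloads themselves can keep running.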

Azure Kubernetes Service

Accepted answer
  1. Mutaz Nassar 2,361 Reputation points Microsoft Employee
    2023-07-03T06:55:16.8133333+00:00

    Hi Suvarnnan, N V,
    It seems the cluster has a Defender profile that is looking for the same workspace as the monitoring add-on. To resolve this, recreate that workspace, since both the monitoring add-on and the Defender profile use it, and then update the AKS cluster using these commands:

    az group create --name DefaultResourceGroup-WEU --location westeurope
     
    az monitor log-analytics workspace create --resource-group DefaultResourceGroup-WEU --workspace-name DefaultWorkspace-a82bc17b-cc8c-4d08-b6bb-1bf427e82ceb-WEU  
    
    az aks update --name <aks-name> --resource-group <aks-resourcegroup>
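Before recreating anything, it may help to check which workspace the cluster actually references. The sketch below is an assumption-laden illustration: the function name `aks_workspace_ids` is hypothetical, and the JMESPath keys (`addonProfiles.omsagent...` and `securityProfile.defender...`) are taken from typical `az aks show` output and may differ in casing or shape depending on the CLI and API version.

```shell
# Print the Log Analytics workspace resource IDs referenced by the
# monitoring add-on and by the Defender profile.
# $1 = cluster name, $2 = resource group (placeholders).
# NOTE: the query keys are assumptions; inspect the full `az aks show`
# output if either value comes back as null.
aks_workspace_ids() {
  az aks show --name "$1" --resource-group "$2" --output json \
    --query '{monitoring: addonProfiles.omsagent.config.logAnalyticsWorkspaceResourceID, defender: securityProfile.defender.logAnalyticsWorkspaceResourceId}'
}

# Example:
#   aks_workspace_ids my-aks my-aks-rg
```

If the returned ID points to a workspace that no longer exists, that would be consistent with the situation this answer describes.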
    

    Hope this helps. If it did, please "Accept as Answer" so that it can help others in the community who are looking for help on similar topics.

    2 people found this answer helpful.

8 additional answers

  1. Sedat SALMAN 14,180 Reputation points MVP
    2023-07-02T05:38:16.9733333+00:00

    There is a similar post here:

    https://learn.microsoft.com/en-us/answers/questions/893384/azure-aks-cluster-is-in-failed-state

    It gives a step-by-step approach to solving your problem.


  2. Suvarnnan, N V 20 Reputation points
    2023-07-02T05:59:12.02+00:00

    I am not able to start, stop, or upgrade the cluster in the Azure portal, even though AKS is running fine.

    All of these operations are disabled because the cluster is in a failed state.

    Internally the cluster is working fine, but this failed state blocks all update operations.


  3. Mutaz Nassar 2,361 Reputation points Microsoft Employee
    2023-07-02T08:08:20.63+00:00

    @Suvarnnan, N V

    If you followed all the steps in the mentioned link and nothing worked, can you run the update command:
    az resource update --ids <aks-resource-id>

    While it is running, you can check the cluster events, which could show what is blocking the nodes from being drained during the update (for example, a pod disruption budget):
    kubectl get events
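The event check above can be narrowed down a little. This is a minimal sketch with hypothetical function names (`recent_events`, `list_pdbs`); it only uses standard `kubectl get` options and assumes your kubeconfig already points at the cluster.

```shell
# List events from every namespace, sorted so the newest appear last.
recent_events() {
  kubectl get events --all-namespaces --sort-by=.lastTimestamp
}

# List pod disruption budgets in every namespace. A budget that currently
# allows 0 disruptions can prevent a node from being drained.
list_pdbs() {
  kubectl get poddisruptionbudgets --all-namespaces
}

# Example:
#   recent_events
#   list_pdbs
```

Sorting by timestamp makes it easier to line events up with the moment the update command was started.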

    If the events don't help, you can check the Activity log in the Azure portal at both the AKS cluster level and the VMSS level, which could show the error.

    Also, you can use the "Diagnose and solve problems" feature in the Azure portal to troubleshoot your AKS cluster; it will show more insights.
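The Activity log mentioned above can also be queried from the Azure CLI instead of the portal. The following is a sketch under stated assumptions: the function name `failed_activity` is hypothetical, the 7-day default window is an arbitrary choice, and the projected fields in `--query` are the ones commonly present in activity log entries.

```shell
# List failed Activity log entries for one resource.
# $1 = full resource ID (the AKS cluster, or the VMSS in the node resource group)
# $2 = optional look-back window, e.g. 2d or 12h (defaults to 7d; arbitrary choice)
failed_activity() {
  az monitor activity-log list --resource-id "$1" --status Failed \
    --offset "${2:-7d}" \
    --query '[].{time: eventTimestamp, operation: operationName.localizedValue, status: status.value}' \
    --output table
}

# Example:
#   failed_activity "<aks-resource-id>"
#   failed_activity "<vmss-resource-id>" 2d
```

Running it once for the cluster and once for the scale set mirrors the two levels suggested in this answer.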


  4. Suvarnnan, N V 20 Reputation points
    2023-07-02T08:27:12.55+00:00

    I ran the diagnostics to identify and resolve issues, and it provided specific commands tailored to my cluster details.

    However, when I executed the provided commands, the output indicated that the cluster was already up to date.

    So it is still in the same state.

