
AKS cluster stuck due to long-running operation not completing. az aks update hangs, nodepool stuck in Updating/Cancelled, VMSS looping.

2026-03-22T21:21:36.5633333+00:00

My AKS cluster is stuck because a long-running operation never completes: az aks update hangs, the node pool is stuck in Updating/Cancelled, and the VMSS keeps looping. How can I resolve this?

The suggestion I received was to "request backend LRO cleanup and control plane reconciliation." Any insights or steps to achieve this would be helpful.

Azure Kubernetes Service

An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.


2 answers

  1. SUNOJ KUMAR YELURU 18,171 Reputation points MVP Volunteer Moderator
    2026-03-23T09:27:19.2133333+00:00

    Hello @Technical Administration (Bee-Relevant),

    When a long-running operation on an Azure Kubernetes Service (AKS) cluster does not complete, the az aks update command can hang, node pools can remain in the Updating or Cancelled state, and Virtual Machine Scale Sets (VMSS) can loop. A stuck or failing long-running operation can be aborted, provided it is the last running operation on the managed cluster or agent pool.

    To address this, you can use the Azure CLI to abort the operation. For example, you can run the command:

    Azure CLI

    az aks operation-abort \
        --name myAKSCluster \
        --resource-group myResourceGroup
    

    This command will terminate the operation and return an HTTP status code of 204 if successful.

    If the node pool is in a failed state, it may be due to issues such as insufficient capacity, quota limits, or network issues. To troubleshoot, you can check the provisioning state of the node pool using the command az aks nodepool show and look for any error messages.

    Additionally, reviewing the activity log and diagnostic settings can help identify the cause of the failure.
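The two checks above can be sketched as small shell helpers. This is a minimal sketch, not a definitive procedure: the function names `nodepool_state` and `recent_failures` are hypothetical, and the `--query` projection for the activity log assumes the usual event fields (`eventTimestamp`, `operationName`, `status`).

```shell
#!/usr/bin/env bash
# Hypothetical helper names; they only wrap Azure CLI calls.

# Print the provisioning state of one node pool
# (for example Succeeded, Updating, Canceled, or Failed).
# Arguments: <resource-group> <cluster-name> <nodepool-name>
nodepool_state() {
  az aks nodepool show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --query provisioningState \
    --output tsv
}

# List failed activity-log events for a resource group over the last day.
# Argument: <resource-group>
recent_failures() {
  az monitor activity-log list \
    --resource-group "$1" \
    --status Failed \
    --offset 1d \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, status:status.value}" \
    --output table
}
```

For example, `nodepool_state myResourceGroup myAKSCluster <nodepool-name>` prints a single word. Because the scale sets live in the cluster's node resource group, it may be worth running `recent_failures` against that resource group as well as the cluster's own.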


    If this answers your question, please click Accept Answer and upvote it. If you have any further questions, let us know.


  2. Manish Deshpande 5,420 Reputation points Microsoft External Staff Moderator
    2026-03-23T07:33:05.6166667+00:00

    Hello

    We understand how disruptive it is when an AKS cluster is blocked by a long-running operation that prevents further updates or node pool actions. AKS provides a supported mechanism to unblock it: aborting the last active long-running operation at the cluster or node pool level.

    Aborting the stuck operation releases control of the cluster or node pool, lets the control plane reconcile its state, and restores your ability to run new operations without redeploying.

    Check the latest operation status

    az aks operation show-latest \
      --resource-group <resource-group> \
      --name <aks-cluster-name>
    

    This command returns the operation ID, status, and percentage complete, which helps confirm whether the operation is still in progress or stuck.
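To use that check in a script, the status can be extracted with `--query`. The sketch below is only illustrative: the helper names are hypothetical, and it assumes the operation reports one of the standard terminal states (Succeeded, Failed, Canceled) once it has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; arguments are <resource-group> <aks-cluster-name>.

# Print only the status field of the latest cluster-level operation.
latest_operation_status() {
  az aks operation show-latest \
    --resource-group "$1" \
    --name "$2" \
    --query status \
    --output tsv
}

# Succeed (exit 0) while the latest operation has not reached a terminal state.
# An empty result (for example, the CLI call failed) is treated as "not in progress".
operation_in_progress() {
  case "$(latest_operation_status "$1" "$2")" in
    Succeeded|Failed|Canceled|"") return 1 ;;
    *) return 0 ;;
  esac
}
```

A typical use would be `if operation_in_progress <resource-group> <aks-cluster-name>; then echo "still running"; fi` before deciding whether an abort is needed.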

    Abort the stuck operation (Node Pool level)

    If the issue is isolated to a node pool (common when VMSS is looping), abort the node pool operation:

    az aks nodepool operation-abort \
      --resource-group <resource-group> \
      --cluster-name <aks-cluster-name> \
      --name <nodepool-name>
    
    
    

    Abort the stuck operation (Cluster level)

    If the entire cluster update is blocked (for example, az aks update is hanging), abort the cluster‑level operation:

    az aks operation-abort \
      --name <aks-cluster-name> \
      --resource-group <resource-group>
    
    

    A successful request returns HTTP 204, indicating the abort has been accepted.
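Since HTTP 204 only means the abort was accepted, it can help to poll the provisioning state afterwards until it settles. The following is a minimal sketch for the node pool case; `wait_for_nodepool`, `abort_nodepool_operation`, and the `POLL_INTERVAL` variable are hypothetical names, and the list of settled states is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; arguments are <resource-group> <cluster-name> <nodepool-name> [attempts].

# Poll the node pool until its provisioning state is Succeeded, Failed, or Canceled.
# Prints the final state on success; returns 1 if the attempts run out.
wait_for_nodepool() {
  local rg="$1" cluster="$2" pool="$3" attempts="${4:-30}"
  local i=0 state=""
  while [ "$i" -lt "$attempts" ]; do
    state="$(az aks nodepool show \
      --resource-group "$rg" \
      --cluster-name "$cluster" \
      --name "$pool" \
      --query provisioningState \
      --output tsv)"
    case "$state" in
      Succeeded|Failed|Canceled) echo "$state"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-20}"   # seconds between polls (assumed default)
  done
  echo "timed out; last state: $state" >&2
  return 1
}

# Abort the node pool operation, then wait for the state to settle.
abort_nodepool_operation() {
  az aks nodepool operation-abort \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    && wait_for_nodepool "$1" "$2" "$3"
}
```

The same pattern should apply at cluster level by swapping in `az aks operation-abort` and `az aks show --query provisioningState`.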

    This approach is fully supported and documented by Microsoft and is the recommended resolution for scenarios where operations remain in Updating or Canceled states, or when VMSS provisioning loops persist.

    Link
    https://docs.azure.cn/en-us/aks/manage-abort-operations?tabs=azure-cli
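For the "control plane reconciliation" part of the question, one option is to trigger a no-change update on the cluster resource after the abort, which prompts the resource provider to reconcile it. This is an assumption added here, not something the answer above states, and the helper name `reconcile_cluster` is hypothetical; treat it as a sketch to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical helper; arguments are <resource-group> <aks-cluster-name>.

# Look up the cluster's resource ID, then issue an update with no property changes.
# Assumption: a no-op update is enough to trigger reconciliation of the cluster.
reconcile_cluster() {
  local id
  id="$(az aks show \
    --resource-group "$1" \
    --name "$2" \
    --query id \
    --output tsv)" || return 1
  [ -n "$id" ] || return 1
  az resource update --ids "$id"
}
```

After it returns, `az aks show --resource-group <resource-group> --name <aks-cluster-name> --query provisioningState --output tsv` can be used to check whether the cluster has left the Canceled state. Rerunning the operation that was aborted is the other obvious way to bring the cluster back to a settled state.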

    If the issue persists after following the above guidance, please do not hesitate to let us know. We are more than happy to assist further and review the cluster state in detail. Kindly add your observations or questions in the Comment section, and we will respond promptly to support you.

    Thanks,
    Manish.

