Production AKS cluster is stuck in Failed provisioningState. Cluster is running unsupported Kubernetes version 1.22.x. All nodepool operations (add/delete/scale) fail with InvalidGalleryImageRef and InternalServerError. Control plane upgrade is blocked du

ITOPS LLM 0 Reputation points
2026-01-27T06:24:57.1666667+00:00

Production AKS cluster is stuck in Failed provisioningState.

Cluster is running unsupported Kubernetes version 1.22.x.

All nodepool operations (add/delete/scale) fail with InvalidGalleryImageRef and InternalServerError.

Control plane upgrade is blocked due to failed state.

Request backend repair and forced upgrade to supported version (1.32.x).

Azure Kubernetes Service
Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.
{count} votes

1 answer

Sort by: Most helpful
  1. Q&A Assist
    2026-01-27T06:25:38.3+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist. Learn more

    To address the issue of your AKS cluster being stuck in a Failed provisioning state, you can follow these troubleshooting steps:

    1. Check the Activity Log: Use the Azure portal or Azure CLI to view the activity log for your cluster. This log can provide insights into the errors causing the failure. Look specifically for error codes and messages related to the InvalidGalleryImageRef and InternalServerError.
    2. Diagnose Node Pool Issues: Since all node pool operations are failing, check the status of the node pool using the az aks nodepool show command. If the provisioning state is Failed, the output will include error messages that can help identify the problem.
    3. Upgrade Control Plane: Since your cluster is running an unsupported Kubernetes version (1.22.x), you may need to force an upgrade to a supported version (1.32.x). This can sometimes resolve provisioning issues. However, ensure that you first resolve any existing errors that are blocking the upgrade.
    4. Repair Backend: If the cluster remains in a failed state, consider requesting a backend repair through Azure support. This can help address underlying issues that are not resolvable through standard troubleshooting steps.
    5. Review Quotas and Policies: Ensure that your subscription has sufficient quota and that there are no policies or resource locks preventing the operations you are trying to perform.

    By following these steps, you should be able to identify and resolve the issues causing your AKS cluster to be in a failed state.


    References:

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.