AKS Node Image Upgrade Failures

Vinicius Barbosa 0 Reputation points
2026-05-11T16:52:14.6366667+00:00

On May 10th, the scheduled node image upgrades failed across both lower environment and production Kubernetes clusters. The activity logs show only the following message: “Upgrade Failed with status Unspecified, error: Unknown error”

Could you help identify the root cause of these upgrade failures? I'm concerned about whether the weekly security and bug fixes are actually being applied.

Thank you,

Vinícius Barbosa

Azure Kubernetes Service

An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.


2 answers

  1. Mutaz Nassar 2,361 Reputation points Microsoft Employee
    2026-05-19T15:01:05.9366667+00:00

    Hi @Vinicius Barbosa

    This is a known issue that produces false-positive “upgrade failed” Activity Log entries for AKS node pool upgrades, whether they are triggered manually or by the auto-upgrader. The underlying upgrade ultimately succeeds, but the Activity Log shows a failure.

    The Azure team is aware of this issue, and a fix has been deployed to improve status handling and reduce “Unspecified” results.
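
Because the Activity Log entry can be a false positive, it may help to confirm directly whether a node pool already runs the latest node image. The following is a minimal sketch, not an official procedure. It assumes the Azure CLI (`az`) is installed, logged in, and on the PATH (Linux/macOS style invocation). The helper names `is_on_latest_image`, `az_json`, and `report` are hypothetical, and the resource group, cluster, and pool names in the usage note are placeholders. It compares the `nodeImageVersion` returned by `az aks nodepool show` with the `latestNodeImageVersion` returned by `az aks nodepool get-upgrades`.

```python
import json
import subprocess


def is_on_latest_image(pool: dict, upgrades: dict) -> bool:
    """True when the pool's current node image equals the latest available one."""
    current = pool.get("nodeImageVersion")
    latest = upgrades.get("latestNodeImageVersion")
    return current is not None and current == latest


def az_json(*args: str) -> dict:
    """Run an Azure CLI command and parse its JSON output."""
    result = subprocess.run(
        ["az", *args, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def report(resource_group: str, cluster: str, pool_name: str) -> None:
    """Print current vs. latest node image for one node pool."""
    pool = az_json(
        "aks", "nodepool", "show",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--name", pool_name,
    )
    upgrades = az_json(
        "aks", "nodepool", "get-upgrades",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--nodepool-name", pool_name,
    )
    print("current:", pool.get("nodeImageVersion"))
    print("latest: ", upgrades.get("latestNodeImageVersion"))
    print("up to date:", is_on_latest_image(pool, upgrades))
```

For example, `report("my-rg", "my-aks", "nodepool1")` (placeholder names) prints both versions. If they match, the scheduled upgrade most likely did land despite the failed log entry.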


  2. Ajas Saif 85 Reputation points
    2026-05-11T16:55:30.04+00:00

    At this time, the Activity Logs only show a generic backend error:

    “Upgrade Failed with status Unspecified, error: Unknown error”

    The failures are tied to the automated Create or Update Agent Pool operations initiated by Microsoft.ContainerService during the scheduled weekly node image upgrade process.

    Based on the current findings:

    • The Kubernetes control plane itself was not impacted and remains healthy on version 1.35.0
    • These failures indicate that the node image upgrade process did not complete successfully for the affected node pools
    • Since the node image upgrade failed, the latest weekly node OS/security image updates were most likely not applied to those nodes during this maintenance cycle
    • Existing nodes continue running on their previous validated node image, so workloads should remain operational, but the latest security and bug-fix patches may not yet be present

    Possible causes for this type of AKS node image upgrade failure commonly include:

    • Temporary Azure backend/platform issues
    • Node drain failures caused by workloads or Pod Disruption Budgets (PDBs)
    • Insufficient surge capacity during upgrade
    • VM allocation/SKU availability constraints in the region
    • Transient agent pool reconciliation failures
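
Of the causes listed above, drain failures from Pod Disruption Budgets are straightforward to check from inside the cluster. The sketch below assumes `kubectl` is configured for the affected cluster. The helper names `blocking_pdbs` and `fetch_pdbs` are hypothetical. It reads the `status.disruptionsAllowed` field of each PDB and lists those that currently allow zero voluntary evictions, which can stall a node drain.

```python
import json
import subprocess


def blocking_pdbs(pdb_list: dict) -> list[str]:
    """Return namespace/name of PDBs that currently allow zero disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        # A missing status is treated as zero allowed disruptions (conservative).
        allowed = item.get("status", {}).get("disruptionsAllowed", 0)
        if allowed == 0:
            meta = item["metadata"]
            blocked.append(f'{meta["namespace"]}/{meta["name"]}')
    return blocked


def fetch_pdbs() -> dict:
    """Read all PodDisruptionBudgets via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pdb", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `blocking_pdbs(fetch_pdbs())` returns the PDBs worth reviewing before retrying an upgrade. An empty list suggests PDBs are not the blocker at the moment of the check.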

    Recommended next steps:

    1. Retry the node image upgrade manually from the AKS Upgrade/Node Pool section or via CLI
    2. Review node pool events and upgrade operations in Azure Monitor / Activity Logs
    3. Verify if any strict PDBs or workload constraints prevented node draining

    Confirm current node image versions using:

    
    

    or

    
    
    4. If the issue persists, raise a Microsoft support case with correlation IDs from the failed operations for backend investigation
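
For the node image version check mentioned in the steps above, one possible approach is to read the image version from the node labels. This is a sketch under assumptions: `kubectl` is configured for the cluster, and the nodes carry the AKS labels `kubernetes.azure.com/node-image-version` and `kubernetes.azure.com/agentpool`. The helper names `node_image_versions` and `fetch_nodes` are hypothetical.

```python
import json
import subprocess

IMAGE_LABEL = "kubernetes.azure.com/node-image-version"
POOL_LABEL = "kubernetes.azure.com/agentpool"


def node_image_versions(node_list: dict) -> dict[str, tuple[str, str]]:
    """Map each node name to (agent pool, node image version) from its labels."""
    versions = {}
    for node in node_list.get("items", []):
        meta = node["metadata"]
        labels = meta.get("labels", {})
        versions[meta["name"]] = (
            labels.get(POOL_LABEL, "unknown"),
            labels.get(IMAGE_LABEL, "unknown"),
        )
    return versions


def fetch_nodes() -> dict:
    """Read all nodes via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Printing `node_image_versions(fetch_nodes())` shows one entry per node. Nodes in the same pool reporting different image versions would suggest an upgrade that stopped partway, while a uniform, recent version suggests the upgrade completed.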
