AKS Node Image Upgrade Failures

Question

Vinicius Barbosa 0

On May 10th, the scheduled node image upgrades failed across both lower-environment and production Kubernetes clusters. The activity logs show only the following message: “Upgrade Failed with status Unspecified, error: Unknown error”.

Could you help identify the root cause of these upgrade failures? I'm worried about whether the weekly security and bug fixes are actually being applied.

Thank you,

Vinícius Barbosa

Manish Deshpande 6,340 Reputation points Microsoft External Staff Moderator

2026-05-11T17:15:38.75+00:00

Thanks for reaching out to Microsoft Q&A.

We are checking this issue with the product group (PG) team and will update you as soon as we hear back from them.

Thanks.

Answer 1

Mutaz Nassar 2,361 Microsoft Employee

Hi @Vinicius Barbosa

This is a known issue that produces false-positive “upgrade failed” activity log entries for AKS node pool upgrades, whether triggered manually or by the auto-upgrader: the underlying upgrade ultimately succeeds, but the activity log shows a failure.

The Azure team is aware of this issue, and a fix has been deployed to improve status handling and reduce “Unspecified” results.

Vinicius Barbosa 0 Reputation points

2026-05-19T15:47:48.19+00:00

Hi @Mutaz Nassar. Thank you for the information. Do you know if this is documented somewhere? Also, is there a way to verify whether the update really worked in those cases? You mentioned that a fix has been deployed; does that mean we can expect fewer of these false positives?

Mutaz Nassar 2,361 Reputation points Microsoft Employee

2026-05-20T10:00:49.8933333+00:00

Hi @Vinicius Barbosa, this is not publicly documented. I am part of the Microsoft AKS team, and this information was shared by the relevant team. To confirm the upgrade outcome, you can check the node image version and the provisioning state of each node pool.
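
The check described above could be scripted along these lines. This is a minimal sketch rather than an official procedure: it assumes you have the parsed JSON output of `az aks nodepool list --resource-group <rg> --cluster-name <cluster> -o json`, and that each entry carries `name`, `nodeImageVersion`, and `provisioningState` fields. The function name `summarize_pools`, the `expected_image` parameter, and the sample image names are hypothetical.

```python
import json


def summarize_pools(pools, expected_image=None):
    """Return one (name, image, state, ok) tuple per node pool.

    `pools` is the parsed JSON list from `az aks nodepool list -o json`.
    A pool counts as ok when its provisioning state is "Succeeded" and,
    if `expected_image` is given, its node image version matches it.
    """
    rows = []
    for pool in pools:
        image = pool.get("nodeImageVersion")
        state = pool.get("provisioningState")
        ok = state == "Succeeded" and (
            expected_image is None or image == expected_image
        )
        rows.append((pool.get("name"), image, state, ok))
    return rows


if __name__ == "__main__":
    # Hypothetical sample data; real values come from the az CLI output.
    sample = json.loads("""[
      {"name": "systempool", "nodeImageVersion": "example-image-new",
       "provisioningState": "Succeeded"},
      {"name": "userpool", "nodeImageVersion": "example-image-old",
       "provisioningState": "Failed"}
    ]""")
    for name, image, state, ok in summarize_pools(sample, "example-image-new"):
        print(f"{name}: image={image} state={state} ok={ok}")
```

If a pool reports `Succeeded` and the image version you expected despite an “Unspecified” entry in the activity log, that would be consistent with the false-positive behavior described in this answer.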

Answer 2

At this time, the Activity Logs only show a generic backend error:

“Upgrade Failed with status Unspecified, error: Unknown error”

The failures are tied to the automated Create or Update Agent Pool operations initiated by Microsoft.ContainerService during the scheduled weekly node image upgrade process.

Based on the current findings:

The Kubernetes control plane itself was not impacted and remains healthy on version 1.35.0
These failures indicate that the node image upgrade process did not complete successfully for the affected node pools
Since the node image upgrade failed, the latest weekly node OS/security image updates were most likely not applied to those nodes during this maintenance cycle
Existing nodes continue running on their previous validated node image, so workloads should remain operational, but the latest security and bug-fix patches may not yet be present

Common causes of this type of AKS node image upgrade failure include:

Temporary Azure backend/platform issues
Node drain failures caused by workloads or Pod Disruption Budgets (PDBs)
Insufficient surge capacity during upgrade
VM allocation/SKU availability constraints in the region
Transient agent pool reconciliation failures

Recommended next steps:

Retry the node image upgrade manually from the AKS Upgrade/Node Pool section or via CLI
Review node pool events and upgrade operations in Azure Monitor / Activity Logs
Verify if any strict PDBs or workload constraints prevented node draining

Confirm current node image versions using:

or

If the issue persists, raise a Microsoft support case with correlation IDs from the failed operations for backend investigation
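
One possible way to confirm node image versions per node is sketched below in Python. It rests on two assumptions: the input is the parsed output of `kubectl get nodes -o json`, and each AKS node carries the `kubernetes.azure.com/node-image-version` label. The helper name `node_image_versions` and the sample node and image names are hypothetical, and this is not necessarily the command the answer originally referred to.

```python
import json
from collections import defaultdict

# Label AKS is assumed to set on each node with its node image version.
IMAGE_LABEL = "kubernetes.azure.com/node-image-version"


def node_image_versions(node_list):
    """Group node names by the node image version found in their labels.

    `node_list` is the parsed JSON from `kubectl get nodes -o json`.
    Nodes without the label are grouped under "unknown".
    """
    by_version = defaultdict(list)
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        version = meta.get("labels", {}).get(IMAGE_LABEL, "unknown")
        by_version[version].append(meta.get("name"))
    return dict(by_version)


if __name__ == "__main__":
    # Hypothetical sample; in practice, load the kubectl JSON output instead.
    sample = json.loads("""{"items": [
      {"metadata": {"name": "node-0",
        "labels": {"kubernetes.azure.com/node-image-version": "example-image-new"}}},
      {"metadata": {"name": "node-1",
        "labels": {"kubernetes.azure.com/node-image-version": "example-image-old"}}}
    ]}""")
    for version, names in sorted(node_image_versions(sample).items()):
        print(f"{version}: {', '.join(names)}")
```

Nodes still grouped under an older image version after the maintenance window would suggest the upgrade did not reach them, while a single up-to-date version across the pool would suggest it did.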
