I understand your question regarding upgrades, as it is an important one for maintaining consistency and avoiding unnecessary reboots or double-draining events in production AKS clusters.
Are new nodes created during a Kubernetes version upgrade running the latest available node OS image, or are they matched to the older image version?
During a Kubernetes version upgrade, AKS creates new buffer nodes (based on the max surge setting) to facilitate a zero-downtime upgrade. These new nodes are typically provisioned with the latest available node OS image for the target Kubernetes version at the time of the upgrade.
https://learn.microsoft.com/en-us/azure/aks/upgrade-aks-cluster?tabs=azure-cli
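To make the max surge setting more concrete, here is a rough Python sketch of how a surge value could translate into a number of buffer nodes. The function name `surge_node_count` is hypothetical, and the rounding rule (percentages round up to a whole node, with at least one node) is an assumption for illustration, so please verify it against the documentation above for your cluster.

```python
import math


def surge_node_count(node_count: int, max_surge: str) -> int:
    """Estimate how many buffer nodes a max surge value implies.

    max_surge is either an absolute count ("2") or a percentage ("33%").
    """
    if max_surge.endswith("%"):
        percent = int(max_surge[:-1])
        # Assumption: a percentage is rounded up to a whole node,
        # and at least one surge node is always used.
        return max(1, math.ceil(node_count * percent / 100))
    # An absolute value is used as given.
    return int(max_surge)


if __name__ == "__main__":
    for pool_size, surge in [(10, "33%"), (3, "10%"), (5, "2")]:
        print(pool_size, surge, surge_node_count(pool_size, surge))
```

A higher surge value finishes the upgrade faster but drains more nodes at the same time, so it is usually worth checking it against your PDBs.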
If the Kubernetes upgrade has already caused a node recreation (or image update), does Azure still perform another OS patch-based node restart in the following OS upgrade window?
If a Kubernetes version upgrade has already recreated nodes with the latest node OS image, Azure’s node OS auto-upgrade process is designed to avoid redundant node reimaging or restarts during the subsequent node OS upgrade window, provided the nodes are already on the latest image version. However, whether a restart occurs depends on the node OS auto-upgrade channel and the specific updates applied.
In short, AKS has mechanisms to minimize redundant restarts and disruptions, but careful configuration is required to get the most out of them.
AKS checks the current node image version (nodeImageVersion) against the latest available version during a node OS upgrade. If the nodes are already on the latest image (as is likely after a Kubernetes version upgrade), AKS skips reimaging for the NodeImage channel, avoiding redundant operations. https://learn.microsoft.com/en-us/azure/aks/upgrade-aks-cluster?tabs=azure-cli
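The version comparison described above can be pictured with a small Python sketch. This is only an illustration of the decision, not how AKS implements it internally; the function names and the image version strings are made up for the example.

```python
def needs_reimage(current_image: str, latest_image: str) -> bool:
    """A node needs reimaging only if it is not already on the latest image."""
    return current_image != latest_image


def nodes_to_reimage(nodes: dict[str, str], latest_image: str) -> list[str]:
    """Return the names of nodes whose nodeImageVersion differs from the latest."""
    return sorted(
        name for name, image in nodes.items() if needs_reimage(image, latest_image)
    )


if __name__ == "__main__":
    # Hypothetical values: node name -> nodeImageVersion.
    latest = "AKSUbuntu-2204gen2containerd-202401.17.1"
    nodes = {
        "node-0": "AKSUbuntu-2204gen2containerd-202401.17.1",  # recreated by the Kubernetes upgrade
        "node-1": "AKSUbuntu-2204gen2containerd-202312.06.0",  # still on an older image
    }
    print(nodes_to_reimage(nodes, latest))
```

In this example only the node on the older image would be picked up in the next node OS upgrade window, while the node already recreated by the Kubernetes upgrade is skipped.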
AKS allows you to define separate maintenance windows for cluster-level Kubernetes upgrades (aksManagedAutoUpgradeSchedule) and node OS upgrades (aksManagedNodeOSUpgradeSchedule). By scheduling these windows to avoid overlap, you can reduce the risk of back-to-back disruptions. For example, schedule Kubernetes upgrades on a monthly cadence and node OS upgrades on a weekly cadence at different times or days.
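If you want to sanity-check that two planned windows do not collide, a small Python sketch like the one below can help. It is a simplified model and makes several assumptions: the `WeeklyWindow` class and `windows_overlap` function are hypothetical helpers, windows are compared only by day of week, start hour, and duration in whole hours, and the monthly versus weekly cadence is ignored (so it checks the worst case where both windows fall in the same week).

```python
from dataclasses import dataclass

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
HOURS_PER_WEEK = 7 * 24


@dataclass(frozen=True)
class WeeklyWindow:
    name: str            # e.g. aksManagedAutoUpgradeSchedule
    day: str             # day of week the window starts on
    start_hour: int      # 0-23, in whatever time zone the schedule uses
    duration_hours: int  # window length in whole hours

    def hours(self) -> set[int]:
        """Hour slots of the week covered by this window, wrapping past Sunday."""
        start = DAYS.index(self.day) * 24 + self.start_hour
        return {(start + h) % HOURS_PER_WEEK for h in range(self.duration_hours)}


def windows_overlap(a: WeeklyWindow, b: WeeklyWindow) -> bool:
    """True if the two windows share at least one hour slot."""
    return bool(a.hours() & b.hours())


if __name__ == "__main__":
    k8s = WeeklyWindow("aksManagedAutoUpgradeSchedule", "Saturday", 1, 4)
    node_os = WeeklyWindow("aksManagedNodeOSUpgradeSchedule", "Tuesday", 1, 4)
    print(windows_overlap(k8s, node_os))
```

Keeping the two windows on different days, as in this example, leaves room for the Kubernetes upgrade to finish before the node OS window opens.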
Perform upgrades in non-production environments first to validate the impact of Kubernetes and OS upgrades. This helps identify potential issues with PDBs, drain timeouts, or application compatibility.
To minimize disruption and avoid overlapping maintenance activities, it's good practice to manage your upgrade windows carefully.
Hope it helps!
Let me know if you have any further queries!
If the information is helpful, please click "upvote" to let us know!