A nodepool upgrade will cause downtime for your AKS cluster

DisplayName42 56 Reputation points
2023-02-20T12:54:23.73+00:00

The documentation of "Update an AKS cluster to use a managed identity" has the following warning:

A nodepool upgrade will cause downtime for your AKS cluster as the nodes in the nodepools will be cordoned/drained and then reimaged.

However, the documentation in "Upgrade an AKS cluster" states that it will:

Cordon and drain one of the old nodes to minimize disruption to running applications.

Does updating an AKS cluster to use managed identity cause downtime or will it carefully cordon and drain the nodes one by one? What does "downtime" mean in the first quote? Does it mean that the cluster will be completely offline? If yes, how can I estimate the duration of the downtime?

Azure Kubernetes Service

Accepted answer
  1. Adrian Dobrescu 266 Reputation points Microsoft Employee
    2023-02-20T13:25:49.7466667+00:00

    Good day,

    Thank you for reaching out!

    As stated in the documentation, the upgrade process consists of the following:

    • Add a new buffer node (or as many nodes as configured in max surge) to the cluster that runs the specified Kubernetes version.
    • Cordon and drain one of the old nodes to minimize disruption to running applications. If you're using max surge, it will cordon and drain as many nodes at the same time as the number of buffer nodes specified.
    • When the old node is fully drained, it will be reimaged to receive the new version, and it will become the buffer node for the following node to be upgraded.
    • This process repeats until all nodes in the cluster have been upgraded.
    • At the end of the process, the last buffer node will be deleted, maintaining the existing agent node count and zone balance.
    • You can also customize the node surge to suit your requirements: a faster upgrade that accepts some downtime (perhaps for testing environments), or the 33% max surge recommended for production environments.
    • If you stick with the recommended option for production, there should be no noticeable downtime for your applications.
      You can refer to this document as well for more information and examples:

    https://learn.microsoft.com/en-us/azure/aks/upgrade-cluster?tabs=azure-cli#customize-node-surge-upgrade
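The steps above can be sketched as a small simulation. This is only an illustrative model, not how AKS is implemented: the function names `surge_node_count` and `simulate_upgrade` are hypothetical, and the rule that a percentage surge rounds up to a whole node is an assumption you should check against the linked document. It tracks how many schedulable nodes remain while batches of old nodes are drained and reimaged.

```python
def surge_node_count(node_count: int, max_surge: str) -> int:
    """Convert a max-surge setting such as "1" or "33%" into a buffer-node count."""
    if max_surge.endswith("%"):
        pct = int(max_surge[:-1])
        # Assumption: a percentage rounds up to a whole node, with at least one.
        return max(1, (node_count * pct + 99) // 100)
    return int(max_surge)


def simulate_upgrade(node_count: int, max_surge: str = "1") -> dict:
    """Walk through the surge upgrade and record the lowest schedulable node count."""
    surge = surge_node_count(node_count, max_surge)
    old = node_count              # schedulable nodes still on the old version
    new = surge                   # buffer nodes join on the new version
    min_schedulable = old + new
    rounds = 0
    while old > 0:
        batch = min(surge, old)
        old -= batch              # cordon and drain a batch of old nodes
        min_schedulable = min(min_schedulable, old + new)
        new += batch              # reimaged nodes rejoin on the new version
        rounds += 1
    new -= surge                  # surplus buffer nodes are deleted at the end
    return {
        "surge": surge,
        "rounds": rounds,
        "min_schedulable": min_schedulable,
        "final_nodes": new,
    }


if __name__ == "__main__":
    print(simulate_upgrade(6, "33%"))
    print(simulate_upgrade(3, "1"))
```

In this model the number of schedulable nodes never falls below the original node count, which is why the cluster as a whole stays available. Individual pods are still evicted and restarted when their node is drained.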

    Please let us know if you have any further questions and we will be glad to assist you further. Thank you!

    Please "Accept as Answer" and Upvote if it helped, so that it can help others in the community looking for help on similar topics.


1 additional answer

  1. Ammar-Abdlqader 1,176 Reputation points Microsoft Employee
    2023-02-20T13:21:17.29+00:00

    Hello @DisplayName42,

    Thank you for your question. Once you update your AKS cluster to use MSI, the nodes are updated with the new client ID, which means they are re-imaged one by one until every node uses the new MSI.

    If your application uses a Deployment with 3 replicas, this will not cause downtime for your application: the first node is re-imaged, and only once that is done is the second node re-imaged with the new changes.
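To make the replica argument concrete, here is a rough sketch of a one-node-at-a-time drain. It is a simplified model under stated assumptions, not scheduler behavior: the function `min_ready_replicas` is hypothetical, replicas are assumed to be spread evenly across the original nodes, and evicted pods are assumed to restart on the spare (buffer) node. It reports the lowest number of replicas still running at any point during the drain.

```python
def min_ready_replicas(nodes: int = 3, replicas: int = 3) -> int:
    """Return the lowest count of running replicas while nodes are drained one by one."""
    # Assumption: replicas are spread round-robin over the original nodes.
    placement = {f"node-{i}": 0 for i in range(nodes)}
    for r in range(replicas):
        placement[f"node-{r % nodes}"] += 1

    placement["buffer"] = 0       # the surge node starts empty
    spare = "buffer"
    lowest = replicas
    for i in range(nodes):
        name = f"node-{i}"
        evicted = placement[name]
        placement[name] = 0       # drain: every pod on this node is evicted
        lowest = min(lowest, replicas - evicted)
        placement[spare] += evicted   # evicted pods restart on the spare node
        spare = name              # the reimaged node becomes the next spare
    return lowest


if __name__ == "__main__":
    print(min_ready_replicas(3, 3))   # three replicas on three nodes
    print(min_ready_replicas(3, 1))   # a single replica
```

Under these assumptions, three replicas on three nodes always leave two running, whereas a single replica drops to zero while its node is drained. That gap is the kind of application-level downtime the warning in the question is most likely referring to.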

    Upgrading a node pool in Azure Kubernetes Service (AKS) can cause downtime for your cluster. According to the documentation, during the upgrade process, AKS will cordon and drain one of the old nodes to minimize disruption to running applications. When the old node is fully drained, it will be reimaged to receive the new version, and it will become the buffer node for the following node to be upgraded. This process repeats until all nodes in the cluster have been upgraded. At the end of the process, the last buffer node will be deleted, maintaining the existing agent node count and zone balance.

    Please "Accept as Answer" and Upvote if it helped, so that it can help others in the community looking for help on similar topics.
