AKS node upgrade

Mario 21 Reputation points


When upgrading AKS I noticed that the same node seems to get upgraded twice: the node with the old k8s version goes into Ready,SchedulingDisabled status and is removed once a new node with the new k8s version is ready, but then that new node goes into Ready,SchedulingDisabled status again and another node is scheduled. Once this node is ready, the previous one gets deleted and the job is finally done. Is there any particular reason for this?

Thank you

Azure Kubernetes Service (AKS)

Accepted answer
  1. shiva patpi 13,146 Reputation points Microsoft Employee

    Hello @Mario ,
    Trying to clarify your follow-up question(s):

    After NodeD is created, the Pods from NodeA can be re-created on any other node (the new NodeD or any of the not-yet-upgraded nodes) unless you have scheduling constraints defined for the Pod (taints/tolerations, node affinity, etc.).
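    For example, a taint on a node (with matching tolerations on selected Pods) is one such constraint that restricts where evicted Pods can land during a drain. A minimal sketch - the node name and taint key below are illustrative, not from this thread:

    ```shell
    # Taint a node so only Pods tolerating "upgrade=buffer" can schedule there.
    # (Node name and taint key are illustrative examples.)
    kubectl taint nodes aks-nodepool1-12345678-vmss000003 upgrade=buffer:NoSchedule

    # Verify the taint is in place.
    kubectl describe node aks-nodepool1-12345678-vmss000003 | grep -A2 Taints
    ```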

    I tested this locally by deploying a Pod on each of 3 nodes and running an upgrade; here is how it went:
    NodeA - PodA
    NodeB - PodB
    NodeC - PodC

    Upgrade Initiated:
    NodeD - Created

    Observed that PodA was re-created on NodeB.
    NodeB now has: PodA, PodB

    NodeA upgrade completed.
    NodeB drained - Pods from NodeB re-created on NodeA, NodeD

    NodeB upgrade completed.
    NodeC upgrade started - Pods from NodeC moved to NodeB

    Once NodeC upgrade completed, NodeD was drained and the Pods from NodeD moved to NodeC.


    Check out the document:

    1) Add a new buffer node (or as many nodes as configured in max surge) to the cluster that runs the specified Kubernetes version.
    2) Cordon and drain one of the old nodes to minimize disruption to running applications (if you're using max surge, it will cordon and drain as many nodes at the same time as the number of buffer nodes specified).
    3) When the old node is fully drained, it will be reimaged to receive the new version and become the buffer node for the following node to be upgraded.
    4) This process repeats until all nodes in the cluster have been upgraded.
    5) At the end of the process, the last buffer node will be deleted, maintaining the existing agent node count and zone balance.
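    The sequence above is what runs when you trigger a node pool upgrade from the Azure CLI. A minimal sketch, with placeholder resource names and an example target version:

    ```shell
    # List the Kubernetes versions this cluster can upgrade to.
    az aks get-upgrades \
      --resource-group MyResourceGroup \
      --name MyManagedCluster \
      --output table

    # Upgrade a single node pool to the new version
    # (resource names and version are placeholders).
    az aks nodepool upgrade \
      --resource-group MyResourceGroup \
      --cluster-name MyManagedCluster \
      --name mynodepool \
      --kubernetes-version 1.28.5

    # In another terminal, watch the buffer node appear and nodes
    # go Ready,SchedulingDisabled one by one.
    kubectl get nodes --watch
    ```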

    Regarding your second follow-up question (why not just create a new node for every node to upgrade):

    There is a setting at the node pool level, max-surge, which controls how many nodes are upgraded at a time.

    Update max surge for an existing node pool with 3 nodes
    az aks nodepool update -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 3

    After updating the node pool's max-surge setting to 3, if your node pool has 3 nodes, the upgrade will create 3 additional buffer nodes and all the Pods will be transferred to those buffer nodes. After the upgrade has completed, the Pods will be re-created back on the original nodes.
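    You can read back the current max-surge value on a node pool, and max surge can also be expressed as a percentage of the pool size. A sketch with placeholder names:

    ```shell
    # Read back the node pool's current max surge setting.
    az aks nodepool show \
      --resource-group MyResourceGroup \
      --cluster-name MyManagedCluster \
      --name mynodepool \
      --query upgradeSettings.maxSurge \
      --output tsv

    # Max surge can also be a percentage of the pool size, e.g. 33%.
    az aks nodepool update \
      --resource-group MyResourceGroup \
      --cluster-name MyManagedCluster \
      --name mynodepool \
      --max-surge 33%
    ```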

    Kindly let us know if you have additional questions.
    Make sure to Upvote & Accept the answers wherever applicable


2 additional answers

  1. shiva patpi 13,146 Reputation points Microsoft Employee

    Hello @Mario ,
    Thanks for your query! Nodes will get upgraded only once.
    Here is how the upgrade sequence goes for AKS nodes.

    Take an example: your AKS cluster has 3 nodes and you have issued an upgrade from version X to version Y:
    NodeA - X version
    NodeB - X version
    NodeC - X version

    First it will create a new node with Y version:
    NodeA - X version
    NodeB - X version
    NodeC - X version
    NodeD - Y version

    It will pick NodeA for upgrade. Before it starts upgrading, it cordons the node (i.e. Ready,SchedulingDisabled) so that no new Pods get scheduled on NodeA. It then moves the Pods from NodeA to NodeD and removes NodeA from the kubectl get nodes output.
    Once NodeA has been upgraded to version Y, you can see NodeA with version Y in the kubectl get nodes output again.
    Next it will pick NodeB and do the same process.
    Next it will pick NodeC and do the same process.
    Now, once NodeC (the last node) is ready with the latest version Y, the Pods from NodeD are moved back to NodeC and it will try to delete NodeD (as part of the delete request, the node is set to Ready,SchedulingDisabled).

    So, to summarize: the last thing we observe, NodeD being set to Ready,SchedulingDisabled, is not actually an upgrade. The node is being removed, and to be safe it is marked unschedulable before removal.
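    The cordon-and-drain step AKS performs is the same operation you can run manually with kubectl; a sketch with an illustrative node name:

    ```shell
    # Mark the node unschedulable; it shows as Ready,SchedulingDisabled.
    kubectl cordon aks-nodepool1-12345678-vmss000000

    # Evict the Pods so they get re-created on other nodes.
    kubectl drain aks-nodepool1-12345678-vmss000000 \
      --ignore-daemonsets \
      --delete-emptydir-data

    # If the node is staying (not being deleted), make it schedulable again.
    kubectl uncordon aks-nodepool1-12345678-vmss000000
    ```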

    Let us know if that clarifies your query.

    If that answer helps you out, kindly make sure to "Upvote & Accept the answer" so that it helps the whole community looking for similar queries.



  2. Mario 21 Reputation points

    Hello @shiva patpi ,

    Thanks a lot for your response; I think it's clear to me now. I was misled by the fact that NodeA in your example was removed from the "get nodes" output and then brought back to transfer the Pods from NodeD back to NodeA - I thought it was a new node. Just one last question though: in the flow you described, the Pods running on NodeA, for instance, get deleted and created twice: NodeA -> NodeD and then NodeD -> NodeA. Why not just create a new node for every node to upgrade, transfer the Pods only once to the new node, and delete the old one?

    Thank you