AKS node upgrade

Question

Mario 21

Hello,

When upgrading AKS, I noticed that the same node appears to be updated twice. The previous node with the old k8s version goes into Ready,SchedulingDisabled status and is removed once the new node with the new k8s version is ready. Then the new k8s node goes into Ready,SchedulingDisabled status again and another new node is scheduled. Once this node is ready, the previous one gets deleted and the job is finally done. Is there any particular reason for this?

Thank you
Best,
Mario

Accepted answer

Answer 1

Hello @Mario ,
To clarify your follow-up questions:

After NodeD is created, the Pods from NodeA can be re-created on any other node (the new NodeD or any of the nodes not yet upgraded), unless scheduling constraints apply (for example node taints, or node selectors and affinity rules on the Pod).

I tested this locally by deploying a Pod on each of the 3 nodes and running an upgrade. This is how it went:
Initially:
NodeA - PodA
NodeB - PodB
NodeC - PodC

Upgrade initiated:
NodeD - created

PodA was re-created on NodeB, so NodeB now had PodA and PodB.

NodeA upgrade completed.
NodeB drained - Pods from NodeB were re-created on NodeA and NodeD.

NodeB upgrade completed.
NodeC upgrade started - Pods from NodeC moved to NodeB.

Once the NodeC upgrade completed, NodeD was drained and the Pods from NodeD moved to NodeC.

////////////////

Check out the document:
https://learn.microsoft.com/en-us/azure/aks/upgrade-cluster#upgrade-an-aks-cluster

1) Add a new buffer node (or as many nodes as configured in max surge) to the cluster that runs the specified Kubernetes version.
2) Cordon and drain one of the old nodes to minimize disruption to running applications (if you're using max surge, it will cordon and drain as many nodes at the same time as the number of buffer nodes specified).
3) When the old node is fully drained, it will be reimaged to receive the new version and it will become the buffer node for the following node to be upgraded.
4) This process repeats until all nodes in the cluster have been upgraded.
5) At the end of the process, the last buffer node will be deleted, maintaining the existing agent node count and zone balance.
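
The steps above can be sketched as a small simulation. This is only an illustration and not how AKS is implemented: the function name simulate_upgrade, the buffer-N node names and the event labels are all hypothetical, and it assumes (following the behaviour observed in this thread) that the surge nodes are the ones deleted at the end.

```python
from collections import deque


def simulate_upgrade(nodes, max_surge=1):
    """Return an ordered event log for a buffer-node rolling upgrade.

    nodes:     names of the existing nodes on the old version.
    max_surge: number of buffer nodes added up front (hypothetical
               parameter, loosely mirroring the max surge setting).
    """
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1")

    events = []

    # Step 1: add the buffer node(s) running the new version.
    buffers = [f"buffer-{i}" for i in range(1, max_surge + 1)]
    for name in buffers:
        events.append(("add", name))

    # Steps 2-4: cordon and drain up to max_surge old nodes at a time,
    # reimage them, and repeat until every old node has been upgraded.
    pending = deque(nodes)
    while pending:
        batch = [pending.popleft() for _ in range(min(max_surge, len(pending)))]
        for name in batch:
            events.append(("cordon_and_drain", name))
        for name in batch:
            events.append(("reimage", name))

    # Step 5 (assumption): the buffer nodes are cordoned, drained and
    # deleted, restoring the original node count.
    for name in buffers:
        events.append(("cordon_and_drain", name))
        events.append(("delete", name))

    return events


if __name__ == "__main__":
    for action, node in simulate_upgrade(["NodeA", "NodeB", "NodeC"]):
        print(f"{action:17} {node}")
```

In this sketch each original node is reimaged exactly once, and the final cordon_and_drain event belongs to the buffer node being removed rather than to a second upgrade.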

Regarding your second follow-up question (why not just create a new node for every node to upgrade):

There is a setting at the node pool level, max-surge, which controls how many nodes are upgraded at a time.

Update max surge for an existing node pool with 3 nodes:
az aks nodepool update -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 3

With max surge set to 3 on a node pool of 3 nodes, the upgrade creates 3 additional buffer nodes and the Pods from the original nodes are moved to those buffer nodes. After the original nodes have been upgraded, the buffer nodes are drained and removed, so the Pods are re-created on the original nodes.
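
As a rough illustration of how a max surge value translates into a number of buffer nodes, here is a minimal sketch. The helper name surge_node_count is hypothetical, and the handling of percentage values (rounding partial nodes up) is an assumption based on my reading of the AKS documentation, so please verify it there before relying on it.

```python
def surge_node_count(max_surge, node_count):
    """Estimate how many buffer nodes a max surge value implies.

    max_surge:  an integer such as 3, or a percentage string such as "33%".
    node_count: number of nodes in the pool at the time of the upgrade.
    """
    text = str(max_surge).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        # Assumption: partial nodes are rounded up to the next whole node.
        return (node_count * percent + 99) // 100
    return int(text)


if __name__ == "__main__":
    # The example from this thread: max surge 3 on a 3-node pool.
    print(surge_node_count(3, 3))
```

With the value 3 used above, a 3-node pool gets 3 buffer nodes, so all original nodes can be drained in a single pass.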

Please let us know if you have additional questions.
Please upvote and accept the answers wherever applicable.

Regards,
Shiva.

Mario 21 Reputation points

2021-08-28T04:09:12.54+00:00

Hello @shiva patpi ,

I understand the process, but my question was slightly different. The process you describe uses the new nodes as buffers so that the old nodes can be upgraded and put back in the pool. That requires the Pods to be re-created twice, hence my question: why not just create a new node, transfer the Pods from the old node, and then remove the old node? That would remove the overhead of re-creating the Pods twice, or at least reduce it in case the Pods are transferred to old nodes.

Anyway, it doesn't matter; I was just curious to understand why it works like that. Re-creating the Pods twice is kind of annoying, especially with applications like a MongoDB sharded cluster, where shutting down the master Pod of a shard triggers an election among the remaining replicas, which can cause a short downtime for that shard.

I will accept your answer and thank you for the detailed explanation, really appreciate it.

Best regards,
Mario

Answer 2

Hello @Mario ,
Thanks for your query! Nodes get upgraded only once.
Here is the sequence AKS follows when upgrading nodes:

Take an example: your AKS cluster has 3 nodes and you have issued an upgrade from version X to version Y:
NodeA - X version
NodeB - X version
NodeC - X version

First, it creates a new node with version Y:
NodeA - X version
NodeB - X version
NodeC - X version
NodeD - Y version

It picks NodeA for upgrade. Before upgrading, it cordons the node (i.e. Ready,SchedulingDisabled) so that no new Pods are deployed on NodeA. It moves the Pods from NodeA to NodeD and then removes NodeA from the kubectl get nodes output.
Once NodeA has been upgraded to version Y, NodeA appears again in the kubectl get nodes output with version Y.
Next it picks NodeB and repeats the same process.
Next it picks NodeC and repeats the same process.
Once NodeC (the last node) is ready with the latest version Y, the Pods from NodeD are moved back to NodeC and NodeD is deleted. As part of the delete request, NodeD is set to Ready,SchedulingDisabled.

So, to summarize: the last thing you observed, NodeD being set to Ready,SchedulingDisabled, is not actually an upgrade. AKS is removing that node and, to be safe, marks it unschedulable before removing it.

Let us know if that clarifies your query.

If this answer helps, please upvote and accept it so that it helps others in the community who have similar questions.

Regards,
Shiva.

Answer 3

Mario 21

Hello @shiva patpi ,

Thanks a lot for your response; I think it's clear to me now. I was misled by the fact that NodeA in your example was taken out of the "get nodes" output and then came back to receive the Pods from NodeD; I thought it was a new node. Just one last question though: in the flow you describe, the Pods running on NodeA, for instance, get deleted and created twice: NodeA -> NodeD and then NodeD -> NodeA. Why not just create a new node for every node to upgrade, transfer the Pods only once to the new node, and delete the old one?

Thank you
Best,
Mario

KarishmaTiwari-MSFT 20,777 Reputation points Microsoft Employee Moderator

2021-08-26T18:32:30.77+00:00

@shiva patpi , Would you be able to help with the follow up question from the customer?
