Hello @Colasanto, Francesca,
Thanks for your query!
Given your existing PodDisruptionBudget (PDB) configuration, the AKS upgrade failure is expected. The node aks-agentpool-42415862-vmss000004 was not upgraded because the upgrade process could not move the pod nginx-ingress-ingress-nginx-controller-744847f7b8-kh7bc to another node (the node failed to drain because of the pod's PDB).
You are hitting the issue described at
- https://revolgy.com/blog/kubernetes-in-production-poddisruptionbudget/
(See the section "PDB with 1 replica".)
Also take a look at these related articles, which cover the basics of PDBs, the effect of a PDB, and how to configure a PDB (best practices):
- https://kubernetes.io/docs/tasks/run-application/configure-pdb/
- https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets
(A detailed description is given in the articles above.)
In short, your PDB requires a minimum of 1 available pod (i.e. at least 1 pod must be available at all times). During the upgrade, AKS tries to drain each node, and as part of draining, the pods are evicted and rescheduled onto another node. Because the PDB says minimum available is 1 and the controller appears to be running with a single replica (the "PDB with 1 replica" case linked above), evicting that pod would violate the PDB, so the drain fails.
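To make the arithmetic concrete, here is a minimal shell sketch of how the number of allowed disruptions follows from minAvailable. The function name allowed_disruptions is hypothetical, and the calculation is a simplified model of what Kubernetes computes from the count of healthy pods; it does not talk to the cluster.

```shell
# Simplified model: allowed disruptions = healthy pods - minAvailable, never below 0.
allowed_disruptions() {
  healthy=$1
  min_available=$2
  allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 1 1   # 1 replica, minAvailable 1: prints 0, so the drain is blocked
allowed_disruptions 2 1   # 2 replicas, minAvailable 1: prints 1, so one pod can be evicted
```

On the cluster itself, the ALLOWED DISRUPTIONS column of kubectl get pdb shows the corresponding value.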
Basic rule when defining a PDB:
Make sure the PDB allows at least one disruption during the upgrade, i.e. keep minAvailable lower than the current number of replicas.
Mitigation 1:
Delete the PDB, run the upgrade, and recreate the PDB afterwards (add -n with the namespace if the PDB is not in your current namespace):
kubectl delete pdb nginx-ingress-ingress-nginx-controller
Mitigation 2:
Increase the number of replicas of nginx-ingress-ingress-nginx-controller in your deployment YAML file (for example to 2), so that one pod can be evicted while another stays available.
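As a sketch, the replica count can also be raised directly with kubectl. The deployment name is inferred from the pod name above, and the namespace ingress-nginx is an assumption; replace both with your own values.

```shell
# Assumptions: namespace and deployment name; adjust to your cluster.
NAMESPACE="ingress-nginx"
DEPLOYMENT="nginx-ingress-ingress-nginx-controller"

# Run two replicas so one can be evicted while the other keeps serving.
kubectl scale deployment "$DEPLOYMENT" --replicas=2 -n "$NAMESPACE"

# Wait until both replicas are ready, then check that the PDB allows a disruption.
kubectl rollout status deployment "$DEPLOYMENT" -n "$NAMESPACE"
kubectl get pdb -n "$NAMESPACE"
```

If the controller was installed with Helm, a manual scale may be reverted on the next chart upgrade, so the replica count is better set in the chart values.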
Mitigation 3:
Change the PDB so that it allows at least one disruption, for example by setting minAvailable lower than the number of replicas or by using maxUnavailable instead.
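A minimal sketch of such a change with kubectl patch, assuming the PDB currently uses minAvailable and lives in the ingress-nginx namespace (both are assumptions). It switches the budget to maxUnavailable: 1, which lets the drain evict the pod. Note that with a single replica the ingress will be briefly unavailable while the pod starts on another node.

```shell
# Assumptions: namespace; the PDB name is taken from the delete command above.
NAMESPACE="ingress-nginx"
PDB="nginx-ingress-ingress-nginx-controller"

# JSON merge patch: null removes minAvailable, since only one of the two fields may be set.
kubectl patch pdb "$PDB" -n "$NAMESPACE" --type merge \
  -p '{"spec":{"minAvailable":null,"maxUnavailable":1}}'

# Confirm that ALLOWED DISRUPTIONS is now at least 1.
kubectl get pdb "$PDB" -n "$NAMESPACE"
```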
Hope the above explanation helps in understanding and resolving the issue. If it helps, kindly upvote and accept the answer.