Failed to upgrade Kubernetes service due to IP address issues

Tanul 1,251 Reputation points
2022-09-21T04:49:40.067+00:00

Team,

We have a VMAS-type AKS cluster with one node pool of 7 worker nodes plus 1 spare node, which helps us during upgrades. The networking type is Azure CNI. We are currently on version 1.23.5 and are trying to upgrade to 1.23.8 (control plane plus all node pools, the only option VMAS allows), but we keep getting this error:

[Screenshot of the upgrade error]

Max pods per node is set to 30, and in total we have 102 pods in our AKS environment. The VNet CIDR is /20 and the subnet CIDR is /24.

We have just reduced the replica count to 0 in 10 deployments, which brings the pod count down to 92, but the number of available IPs in the subnet is the same as it was before the change:

[Screenshot of the available IPs in the subnet]

Ideally we should have 30 x 8 + 3 = 243 IPs, but AKS will not let us upgrade even with only 92 pods.
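
A minimal sketch of the subnet arithmetic, assuming the traditional Azure CNI mode, where each node reserves its own IP plus one IP per *max* pod when it is created, regardless of how many pods are actually running, and assuming Azure's 5 reserved addresses per subnet. The subnet address 10.0.0.0/24 is hypothetical; only the /24 size and the node and max-pod counts come from the post.

```python
import ipaddress

# Azure reserves 5 addresses in every subnet (the first four and the last),
# so they are never handed out to nodes or pods.
AZURE_RESERVED_PER_SUBNET = 5


def usable_ips(subnet_cidr: str) -> int:
    """Addresses in an Azure subnet that AKS can actually use."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


def reserved_by_nodes(node_count: int, max_pods: int) -> int:
    """IPs pre-allocated under traditional Azure CNI: 1 per node + max_pods per node.

    The running pod count does not appear here, which is why scaling
    deployments down to zero replicas leaves the free-IP count unchanged.
    """
    return node_count * (max_pods + 1)


subnet = "10.0.0.0/24"   # hypothetical address; only the /24 size is from the post
nodes, max_pods = 8, 30  # 7 worker nodes + 1 spare, max 30 pods per node

free = usable_ips(subnet) - reserved_by_nodes(nodes, max_pods)
needed_for_surge = reserved_by_nodes(1, max_pods)  # one extra node during upgrade
print(f"usable={usable_ips(subnet)} reserved={reserved_by_nodes(nodes, max_pods)} free={free}")
print(f"surge node needs {needed_for_surge}: fits={free >= needed_for_surge}")
```

Under these assumptions the 8 existing nodes already hold 248 of the 251 usable addresses, leaving 3, while one upgrade surge node would need 31.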

We have also tried to scale the node pool down, but AKS does not allow us to increase or decrease the node count at all.
Kindly suggest a resolution, as this is impacting our environment.

Regards,
Tanul

Azure Kubernetes Service (AKS)

1 answer

  1. vipullag-MSFT 26,021 Reputation points
    2022-09-27T06:30:08.427+00:00

    @Tanul

    Welcome to Microsoft Q&A Platform, thanks for posting your query here.
    Firstly, apologies for the delay in responding here and any inconvenience this issue may have caused.

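    AKS has a mechanism called "surge nodes": the "spare" nodes that AKS automatically provisions during an upgrade. In this scenario, when the upgrade is triggered, AKS scales up by one surge node to balance the pod workload. With Azure CNI that surge node needs its own block of IP addresses from the subnet, and the subnet does not have enough free addresses for it, which is what causes the problem.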

    How this is used, and how to calculate the number of IP addresses needed for Azure CNI, is documented here. The surge node mechanism is documented here.
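
As a hedged sketch of that calculation, the code below assumes the formula the AKS documentation gives for traditional Azure CNI, (nodes + 1) + (nodes + 1) x max pods, where the extra node is the upgrade surge node, plus Azure's 5 reserved addresses per subnet. The helper names `required_ips` and `smallest_prefix` are hypothetical, for illustration only.

```python
# Azure reserves 5 addresses in every subnet (the first four and the last).
AZURE_RESERVED_PER_SUBNET = 5


def required_ips(node_count: int, max_pods: int) -> int:
    """IPs an Azure CNI node pool needs, including one surge node for upgrades."""
    nodes_during_upgrade = node_count + 1
    # One IP per node, plus max_pods pre-allocated pod IPs per node.
    return nodes_during_upgrade + nodes_during_upgrade * max_pods


def smallest_prefix(node_count: int, max_pods: int) -> int:
    """Longest IPv4 prefix (smallest subnet) whose usable range covers required_ips."""
    need = required_ips(node_count, max_pods)
    for prefix in range(29, 7, -1):  # /29 is the smallest subnet Azure allows
        if 2 ** (32 - prefix) - AZURE_RESERVED_PER_SUBNET >= need:
            return prefix
    raise ValueError("no IPv4 prefix is large enough")


print(required_ips(8, 30), smallest_prefix(8, 30))
```

For 8 nodes at 30 max pods this gives 279 required addresses, which a /24 (251 usable) cannot hold but a /23 (507 usable) can; a /23 fits inside the /20 VNet.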

    Hope this helps.
    If you need further help on this, tag me in a comment.
    If the suggested response helped you resolve your issue, please 'Accept as answer', so that it can help others in the community looking for help on similar topics.