I have a pod status of Pending with a reason of FailedScheduling when upgrading AKS from 1.28 to 1.29.

Newsom Keaton, Viki 20 Reputation points
2024-11-12T18:17:30.76+00:00

I upgraded our AKS cluster from 1.28 to 1.29. All 123 pods are Running except 1. It's in a Pending state.
It is a StatefulSet so I have deleted it a few times and it will not come up successfully.

The reason in the pod description gives:
Warning FailedScheduling 17m (x126 over 10h) default-scheduler 0/13 nodes are available: 13 Insufficient cpu. preemption: 0/13 nodes are available: 13 No preemption victims found for incoming pod.

The node resources for all 13 nodes are:
CPU: max is 6% used out of 3860m CPU
Memory: max is 32% used out of 14.9GB allocated
Disk: max is 23% used out of 111.5 GB
Pods: range is 11-15 for each node (max is set to 30)

The pod's resources.requests.cpu is '3'. Memory is 6Gi.

All 13 nodes were upgraded to 1.29 successfully via the Azure CLI, with ProvisioningState as "Succeeded".

Upgrade settings: max surge is 33%

drainTimeout is 30, no nodeSoakDuration parameter.

Again, all other pods are successful but this 1 out of 123 pods.

Why does it think there is insufficient CPU? Why only this pod? Thank you in advance for any suggestions or ideas.

Azure Kubernetes Service (AKS)

Accepted answer
  1. Mahesh Goud Juvvadi 1,565 Reputation points Microsoft Vendor
    2024-11-12T22:36:29.9366667+00:00

    Hi Newsom Keaton, Viki,

    Thank you for reaching out to the Microsoft Q&A platform.

    The error message indicates that the scheduler cannot find any nodes with enough available CPU.

    Please check that the resource requests and limits for your pod are correctly defined and that there are no discrepancies in the configuration.

    Although you mentioned that your nodes have sufficient CPU resources, please check the allocatable resources on the nodes. You can do this by running the following command:

    kubectl describe nodes
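
    The scheduler works from CPU requests, not live usage, so a node showing 6% CPU used can still have most of its allocatable CPU reserved by the requests of pods already placed on it. A minimal sketch of that arithmetic, assuming the 3860m figure in the question is each node's allocatable CPU. The 1200m of existing requests is a hypothetical value standing in for the cpu requests figure under "Allocated resources" in the kubectl describe nodes output, and the function names are illustrative only:

    ```shell
    # Convert a Kubernetes CPU quantity ("3", "0.5", "3860m") to millicores.
    cpu_to_millicores() {
      case "$1" in
        *m) printf '%s\n' "${1%m}" ;;
        *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
      esac
    }

    # Succeeds when the new pod's CPU request fits into what is still unreserved:
    #   allocatable - (sum of existing requests) >= new request
    cpu_request_fits() {
      allocatable=$(cpu_to_millicores "$1")
      already_requested=$(cpu_to_millicores "$2")
      new_request=$(cpu_to_millicores "$3")
      [ $(( allocatable - already_requested )) -ge "$new_request" ]
    }

    # 3860m and "3" come from the question; 1200m is a made-up example value.
    if cpu_request_fits 3860m 1200m 3; then
      echo "fits"
    else
      echo "Insufficient cpu"
    fi
    ```

    With these example numbers only 2660m is unreserved, so the sketch prints "Insufficient cpu". A pod requesting 3 CPUs fits only on a node whose existing requests total 860m or less.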
    
    

    If you have resource quotas set at the namespace level, ensure that your StatefulSet pod is not exceeding those quotas. You can check for resource quotas with:

    kubectl get resourcequota -n <your-namespace>
    
    

    Check that none of the nodes are in a NotReady state or have any conditions that might prevent scheduling. You can check the node status with:

    kubectl get nodes
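
    After an upgrade it can also help to pick out nodes that were left cordoned, which show a status of Ready,SchedulingDisabled. A small sketch that filters the node list, assuming the default kubectl get nodes column layout (NAME first, STATUS second); the function name is illustrative only:

    ```shell
    # Print name and status of every node whose STATUS is not exactly "Ready".
    # Expects `kubectl get nodes --no-headers` output on stdin.
    nodes_not_ready() {
      awk '$2 != "Ready" { print $1, $2 }'
    }

    # Usage against a live cluster (not run here):
    #   kubectl get nodes --no-headers | nodes_not_ready
    ```

    An empty result means every node reports plain Ready, which points back to CPU requests rather than node state.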
    
    

    If the allocatable resources are indeed insufficient, you may need to consider scaling your cluster by adding more nodes or increasing the size of existing nodes to provide additional CPU resources.

    Please see the following document for reference: Cluster is in a failed state

    If it was helpful, please click "Upvote" on this post to let us know.

    Thank You.
