I have a pod status of Pending with a reason of FailedScheduling when upgrading AKS from 1.28 to 1.29.

Question

Newsom Keaton, Viki 20

I upgraded our AKS cluster from 1.28 to 1.29. All 123 pods are Running except 1. It's in a Pending state.
It is a StatefulSet so I have deleted it a few times and it will not come up successfully.

The reason in the pod description gives:
Warning FailedScheduling 17m (x126 over 10h) default-scheduler 0/13 nodes are available: 13 Insufficient cpu. preemption: 0/13 nodes are available: 13 No preemption victims found for incoming pod.

The node resources for all 13 nodes are:
CPU: max is 6% used out of 3860m CPU
Memory: max is 32% used out of 14.9GB allocated
Disk: max is 23% used out of 111.5 GB
Pods: range is 11-15 for each node (max is set to 30)

The pod's resources.requests.cpu is '3' and its memory request is 6Gi.

All 13 nodes were upgraded to 1.29 successfully via the azure cli with ProvisioningState as "Succeeded".

Upgrade settings: max surge is 33%

drainTimeout is 30, no nodeSoakDuration parameter.

Again, all other pods are successful but this 1 out of 123 pods.

Why does it think there is insufficient cpu? Why only this pod? Thank you in advance for any suggestions or ideas.

Newsom Keaton, Viki 20 Reputation points

2024-11-12T22:47:05.19+00:00

Thank you, Mahesh - It looks like the allocatable resources are indeed insufficient. Thanks for the kubectl commands, information, and link.

Accepted answer

Answer 1

Hi Newsom Keaton, Viki,

Thank you for reaching out to the Microsoft Q&A platform.

The error message indicates that the scheduler cannot find any nodes with enough available CPU.

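Please check that the resource requests and limits for your pod are correctly defined and that there are no discrepancies in the configuration.

You mentioned that your nodes have plenty of unused CPU, but the scheduler works from the CPU requests of the pods already on each node rather than from live usage. Please check the allocatable resources on the nodes and how much of them is already requested. You can do this by running the following command: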

kubectl describe nodes
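
To illustrate why a node at 6% CPU usage can still report "Insufficient cpu", here is a minimal sketch of the comparison the scheduler makes: the pod's request must fit into the node's allocatable CPU minus the requests already placed there. It assumes the 3860m quoted in the question is the node's allocatable CPU; the 1200m of existing requests is a hypothetical figure, and fits_on_node is an illustrative helper, not a kubectl feature.

```shell
# Sketch of the scheduler's CPU fit check, in millicores.
# It compares requests against allocatable capacity, not live utilization.
fits_on_node() {
  allocatable_m=$1   # node allocatable CPU
  requested_m=$2     # sum of CPU requests of pods already on the node
  pod_request_m=$3   # CPU request of the incoming pod
  free_m=$((allocatable_m - requested_m))
  if [ "$pod_request_m" -le "$free_m" ]; then
    echo "fits (free ${free_m}m)"
  else
    echo "Insufficient cpu (free ${free_m}m, need ${pod_request_m}m)"
  fi
}

# 3860m and the 3-CPU (3000m) request come from the question;
# 1200m of existing requests is a hypothetical value for illustration.
fits_on_node 3860 1200 3000
```

Under that assumption, a 3000m request fits only on a node whose existing requests total 860m or less, which would explain why this is the only pod affected. The actual per-node totals appear in the "Allocated resources" section of the kubectl describe nodes output.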

If you have resource quotas set at the namespace level, ensure that your StatefulSet pod is not exceeding those quotas. You can check for resource quotas with:

kubectl get resourcequota -n <your-namespace>

Check that none of the nodes are in a NotReady state or have any conditions that might prevent scheduling. You can check the node status with:

kubectl get nodes

If the allocatable resources are indeed insufficient, you may need to consider scaling your cluster by adding more nodes or increasing the size of existing nodes to provide additional CPU resources.

Please see the following document for reference: Cluster is in a failed state

If it was helpful, please click "Upvote" on this post to let us know.

Thank You.

Anonymous

2024-11-12T23:13:14.7833333+00:00

Hello Newsom Keaton, Viki,

Thank you for taking the time to respond.

If an answer has been helpful, please consider accepting it and clicking "Upvote" to help increase the visibility of this question for other members of the Microsoft Q&A community.
