AKS nodes suddenly stopped and are now stuck in the Not Ready state

BATUHAN GURHAN 1 Reputation point
2022-03-19T15:21:44.413+00:00

Hello,

I have a virtual machine scale set configured for an AKS cluster, with autoscaling disabled (just 2 instances). All of a sudden, 5 days ago, I noticed that the nodes had switched to the Not Ready state, even though I had not applied any changes or configuration updates.

I have already tried upgrading the nodes, updating the node images, and restarting both the AKS service and the virtual machine scale set, but with no luck. The cluster has been down for 5 days and I have not been able to identify the cause.

When I run kubectl describe nodes, I get the following:

    Type     Reason                   Age  From     Message
    ----     ------                   ---  ----     -------
    Normal   Starting                 19s  kubelet  Starting kubelet.
    Warning  InvalidDiskCapacity      13s  kubelet  invalid capacity 0 on image filesystem
    Normal   Starting                 13s  kubelet  Starting kubelet.
    Warning  InvalidDiskCapacity      7s   kubelet  invalid capacity 0 on image filesystem
    Normal   Starting                 7s   kubelet  Starting kubelet.
    Normal   NodeHasSufficientMemory  7s   kubelet  Node aks-nodepool1-20474252-vmss000009 status is now: NodeHasSufficientMemory
    Warning  InvalidDiskCapacity      2s   kubelet  invalid capacity 0 on image filesystem
    Normal   Starting                 2s   kubelet  Starting kubelet.
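A minimal sketch for narrowing this down across both nodes, assuming Python 3.9+ and that the input is the JSON printed by `kubectl get nodes -o json`: it lists each node's `Ready` condition together with the `ephemeral-storage` capacity the node reports, so a node that is not ready or reports zero storage stands out. The helper names `parse_quantity` and `summarize_nodes` are hypothetical, and the quantity parser only understands plain integers and the binary suffixes Ki, Mi, Gi and Ti.

```python
import json

# Hypothetical helper: handles plain integers and binary suffixes only
# (not decimal suffixes such as "k", "M", "G" or forms like "1e3").
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '129886128Ki' or '0' to a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def summarize_nodes(nodes_json: str) -> list[tuple[str, str, int]]:
    """Return (name, Ready condition status, ephemeral-storage capacity in bytes) per node."""
    summary = []
    for node in json.loads(nodes_json)["items"]:
        status = node.get("status", {})
        # "True" means Ready; "False" or "Unknown" shows up as NotReady in kubectl output.
        ready = next(
            (c["status"] for c in status.get("conditions", []) if c["type"] == "Ready"),
            "Unknown",
        )
        capacity = status.get("capacity", {}).get("ephemeral-storage", "0")
        summary.append((node["metadata"]["name"], ready, parse_quantity(capacity)))
    return summary
```

For example, pipe the output of `kubectl get nodes -o json` into a small wrapper that calls `summarize_nodes(sys.stdin.read())` and prints each tuple.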

Any help will be appreciated.

Thanks!

Azure Kubernetes Service (AKS)

1 answer

  1. Manu Philip 17,006 Reputation points MVP
    2022-03-19T18:14:23.287+00:00

    As the error explains, the pods cannot be deployed because of a disk space shortage.
    So, try to increase the storage available on the nodes used for the deployment.
    When a pod is deployed, its resource requests are checked first. In this case, the check is failing on disk requests. If multiple pods are being deployed, the deployment fails at the point where it finds that a resource requirement cannot be met. In that case, you may see the first couple of pods being deployed, after which the deployment stops at the resource shortage.
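The placement behavior described above can be illustrated with a deliberately simplified, hypothetical model, assuming Python 3.9+: a single node with a fixed amount of free ephemeral storage, and pods placed in order until one request no longer fits. This is not how kube-scheduler is implemented (it weighs many nodes and keeps unschedulable pods Pending for retry); `PodRequest` and `place_pods` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class PodRequest:
    name: str
    ephemeral_storage: int  # requested bytes


def place_pods(free_bytes: int, pods: list[PodRequest]) -> tuple[list[str], list[str]]:
    """Place pods in order on a single node, stopping at the first request that does not fit.

    Returns (names of placed pods, names of pods left unplaced).
    """
    placed: list[str] = []
    for i, pod in enumerate(pods):
        if pod.ephemeral_storage > free_bytes:
            # Requirement not met: the model stops here, so this pod and all later
            # ones are reported as unplaced (a real unschedulable pod stays Pending).
            return placed, [p.name for p in pods[i:]]
        free_bytes -= pod.ephemeral_storage
        placed.append(pod.name)
    return placed, []


GiB = 1024**3
pods = [PodRequest(f"web-{i}", 2 * GiB) for i in range(4)]
print(place_pods(5 * GiB, pods))  # (['web-0', 'web-1'], ['web-2', 'web-3'])
```

In this model, a node reporting a capacity of 0 would place no pods at all, while a node with enough free storage places all of them.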

    Another option is to limit the usage with 'limits' when creating the pods, as explained here: limit-storage-consumption
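One form of such limits is a per-container `ephemeral-storage` request and limit in the pod spec. The sketch below, assuming Python 3 and only the standard library, builds such a Pod manifest as a dictionary and prints it as JSON, which `kubectl apply -f -` can read from standard input. The pod name, image and sizes are hypothetical placeholders, and the linked page may describe other mechanisms (for example namespace-level policies) rather than this one.

```python
import json


def pod_with_storage_limits(name: str, image: str, request: str, limit: str) -> dict:
    """Build a Pod manifest whose single container requests and is capped in ephemeral storage.

    All arguments are hypothetical placeholders for illustration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # Used by the scheduler when choosing a node for the pod.
                        "requests": {"ephemeral-storage": request},
                        # If usage exceeds this, the kubelet can evict the pod.
                        "limits": {"ephemeral-storage": limit},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Pipe this output to: kubectl apply -f -
    print(json.dumps(pod_with_storage_limits("demo", "nginx", "1Gi", "2Gi"), indent=2))
```

Keeping the manifest in a plain dictionary avoids any third-party dependency; the same structure can be written as YAML if that is preferred.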

