Get rid of AKS pods stuck in terminating state

Reshma Nair 120 Reputation points
2024-05-21T08:27:11.6066667+00:00

I have an AKS cluster set up with Terraform, and the Kubernetes deployments are managed by Helm charts. During a Helm upgrade, new pods are provisioned, but the old ones enter a Terminating state and get stuck there.


This does not affect the functioning of my cluster, but having numerous pods stuck in a Terminating state is not developer friendly.

I am aware that such pods can be deleted manually with the command below, but I need an automatic way of doing so.

kubectl delete pod <pod-name> --grace-period=0 --force
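
Essentially I am looking for an automated version of something like the sketch below, which force-deletes every pod that has been marked for deletion but is still present (a sketch only, assuming jq is installed, not a definitive fix):

# Sketch: find pods with a deletionTimestamp set (i.e. stuck terminating)
# and force-delete them. Requires jq.
kubectl get pods -A -o json \
  | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read -r ns pod; do
      kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
    done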

I tried the following to find out why the pods are getting stuck in a terminating state, but with no luck:

  • No errors found in my Helm deployment
  • Checked the pod status and found nothing more specific than the following:
    state:
      terminated:
        containerID: containerd://bd887f75ac7aa7d6173fb25dc58eae6bd486daf87a0d305579da99d0a807789f
        exitCode: -1073741510
        finishedAt: "2024-05-21T07:36:00Z"
        reason: Error
        startedAt: "2024-05-13T13:51:32Z"
    hostIP:
    phase: Failed
    podIP:
    podIPs:
    - ip:
    qosClass: BestEffort
    startTime: "2024-05-13T13:51:09Z"
  • No finalizers found on the stuck pods (see the check sketched after this list)
  • Set the pod's spec.terminationGracePeriodSeconds to 60 seconds
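
For reference, the finalizer check was roughly the following (<pod-name> and <namespace> are placeholders):

# Prints the pod's finalizers (empty output means none) and the time the
# deletion was requested.
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.deletionTimestamp}'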

I am not sure whether the issue lies with the AKS cluster creation or the Helm deployment, or whether there is an alternative way to automatically remove pods stuck in a terminating state.

Any assistance would be greatly appreciated. Thanks in advance.


1 answer

  1. Goncalo Correia 256 Reputation points Microsoft Employee
    2024-05-21T10:53:08.1133333+00:00

    Hi Reshma,

    Thanks for posting your question here!

    This could be happening for multiple reasons; either way, the automatic removal of those pods/containers is (or should be) handled by the garbage collector: https://kubernetes.io/docs/concepts/architecture/garbage-collection/

    You can review the kubelet logs on the nodes where those pods were running and try to figure out why the containers are not being removed or are not found.

    You might also need to follow up with the logs from the container runtime.
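
    For example, on an AKS Linux node you can open a debug shell and read both sets of logs from there. A minimal sketch, assuming a Linux node pool; the node name is a placeholder and the image is just one commonly used debug image:

    # Start an interactive debug pod on the node (node name is a placeholder).
    kubectl debug node/<node-name> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0

    # Inside the debug pod the node's filesystem is mounted at /host.
    chroot /host

    # Kubelet logs around the time the pods got stuck:
    journalctl -u kubelet | tail -n 200

    # Container runtime (containerd) view of the stuck containers:
    crictl ps -a
    crictl logs <container-id>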

    Hope this helps!

    Best Regards

    1 person found this answer helpful.