Can we leverage "--terminated-pod-gc-threshold" in AKS to delete evicted pods automatically

Vignesh Murugan 126 Reputation points
2021-09-29T06:44:29.92+00:00

Hi all,

We are using an AKS cluster (v1.20.7) in our environment. We can see that evicted pods remain in the cluster and are not removed automatically. We have gone through some official documents saying this can be done using the "--terminated-pod-gc-threshold" flag on the API server.

Since the control plane is managed by Microsoft, is it possible to enable this flag?

If not, what could be the best/standard approach for this?

Thanks in advance.

Azure Kubernetes Service (AKS)

Accepted answer
  SRIJIT-BOSE-MSFT 4,326 Reputation points · Microsoft Employee
    2021-09-29T18:18:10.583+00:00

    @Vignesh Murugan , Thank you for your question.

    The kube-controller-manager option --terminated-pod-gc-threshold defines the number of terminated pods that can exist before the terminated pod garbage collector starts deleting them. If set to a value <= 0, the terminated pod garbage collector is disabled. Reference.
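
    For context, on a self-managed control plane (which AKS is not) this flag would be passed to the kube-controller-manager binary, for example as below. The threshold value here is illustrative, and the other flags a real control plane needs are omitted:

       kube-controller-manager --terminated-pod-gc-threshold=100 ...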

    As you correctly pointed out, since the control plane of an AKS cluster is managed by Microsoft, at the time of writing, it is not possible to enable this flag.

    There are different approaches to working around this, with varying use cases and degrees of complexity. For instance, one might build a custom Kubernetes controller that runs outside the control plane, watches for Events on the API Server, and performs actions based on defined logic.
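
    Alternatively, as a simpler ad-hoc cleanup (not part of the original answer, but standard kubectl usage), pods in the Succeeded or Failed phase, which includes evicted pods, can be deleted per namespace with field selectors, where <namespace> is a placeholder:

       # delete Failed pods (evicted pods are in this phase) in a given namespace
       kubectl delete pods -n <namespace> --field-selector=status.phase=Failed
       # likewise for Succeeded pods
       kubectl delete pods -n <namespace> --field-selector=status.phase=Succeeded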


    Or, if most of your Pods that complete execution but are not removed automatically are managed by a higher-level controller such as a CronJob, then the finished Jobs (and their Pods) can be cleaned up by the CronJob based on its capacity-based cleanup policy, specified by .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit, as in the sketch below.
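
    For illustration, a CronJob keeping only the most recent finished Job of each kind could look like the following sketch. The name, schedule and image are placeholders, not from the original answer; also note that on a v1.20 cluster the CronJob API is still batch/v1beta1 (it graduated to batch/v1 in v1.21).

       apiVersion: batch/v1beta1
       kind: CronJob
       metadata:
         name: example-cronjob            # placeholder name
       spec:
         schedule: "*/5 * * * *"          # placeholder schedule: every 5 minutes
         successfulJobsHistoryLimit: 1    # keep at most one completed Job (and its Pod)
         failedJobsHistoryLimit: 1        # keep at most one failed Job (and its Pod)
         jobTemplate:
           spec:
             template:
               spec:
                 restartPolicy: OnFailure
                 containers:
                 - name: example
                   image: busybox
                   command: ["sh", "-c", "echo done"]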

    TTL mechanism for finished Jobs

    FEATURE STATE: Kubernetes v1.21 [beta]

    Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.

    Reference

    When the TTL controller cleans up the Job, it deletes the Job in cascade, i.e. its dependent objects, such as Pods, are deleted together with the Job. Note that when the Job is deleted, its lifecycle guarantees, such as finalizers, will be honored.
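
    For illustration, a Job using this field could look like the following sketch (name and image are placeholders). Note that since this feature only reached beta in v1.21, it is still alpha on the asker's v1.20.7 cluster and may not be enabled there.

       apiVersion: batch/v1
       kind: Job
       metadata:
         name: example-ttl-job          # placeholder name
       spec:
         ttlSecondsAfterFinished: 100   # delete the Job (and its Pods) 100 seconds after it finishes
         template:
           spec:
             restartPolicy: Never
             containers:
             - name: example
               image: busybox
               command: ["sh", "-c", "echo done"]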


    Here is yet another approach that might interest you. With this approach we shall:

    • Create a bash script that checks for Pods whose pod.status.phase is Succeeded or Failed and deletes them from inside a Kubernetes Pod. Read more on Pod status in the Kubernetes documentation.
    • Create a docker image that runs this script every minute
    • Push the image to a container registry
    • Create a Namespace, Service Account, ClusterRole and ClusterRoleBinding before we deploy the solution. Through the ClusterRoleBinding, the Service Account will be granted the get, list and delete permissions on pods at cluster scope, as defined in the ClusterRole. We will mount this Service Account on the pods of our Deployment in the next step so that they can access the required Kubernetes APIs.
    • Create a Deployment in the AKS cluster using the aforementioned docker image and the service account created in the previous step.
    Creating the bash script
    1. Create a fresh directory on your client machine and change the present working directory to this newly created directory:
       mkdir directory-name
       cd directory-name
      
    2. Create a pod-gc-script.sh file with the following content:
       #!/bin/bash
       APISERVER=https://kubernetes.default.svc
       # Path to the ServiceAccount token
       SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
       # Read this Pod's namespace
       NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
       # Read the ServiceAccount bearer token
       TOKEN=$(cat ${SERVICEACCOUNT}/token)
       # Reference the internal certificate authority (CA)
       CACERT=${SERVICEACCOUNT}/ca.crt
       # List the Pods whose pod.status.phase is Succeeded or Failed
       # To match additional pod.status.phase values, extend the select() filter of jq accordingly
       curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api/v1/pods | jq '[.items[] | select (.status.phase=="Succeeded" or .status.phase=="Failed") | .metadata | {name,namespace}]' >/test.json
       # Delete the Pods listed in the previous step (jq -r emits the values without surrounding quotes)
       for (( i=0 ; i < $(jq '.|length' /test.json) ; i++ )) ; do curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X DELETE ${APISERVER}/api/v1/namespaces/$(jq -r ".[$i].namespace" /test.json)/pods/$(jq -r ".[$i].name" /test.json) ; done
      
    Create a docker image
    1. Create a file named Dockerfile in the same working directory with the following contents:
       FROM centos:7
       # jq (from EPEL) is needed by the script for JSON parsing
       RUN yum install epel-release -y
       RUN yum update -y
       RUN yum install jq -y
       COPY ./pod-gc-script.sh /pod-gc-script.sh
       RUN chmod +x /pod-gc-script.sh
       # The loop runs the script every 1 minute. To change the interval, adjust the sleep accordingly
       CMD ["/bin/bash", "-c", "while :; do /pod-gc-script.sh; sleep 60; done"]
      
    2. Build the docker image using:
       docker build -t <your-registry-server>/<your-repository-name>:<your-tag> .  
      
    Push the image to a container registry
    1. Log in to your container registry (Reference). For an Azure Container Registry, see the sketch after this list.
    2. Push the docker image to your container registry using:
       docker push <your-registry-server>/<your-repository-name>:<your-tag>  
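
    For example, if the registry were an Azure Container Registry with the hypothetical name myregistry, the login and push would look like:

       az acr login --name myregistry
       docker push myregistry.azurecr.io/pod-gc:v1   # assumes the image was built with this tag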
      
    Create a Namespace, Service Account, ClusterRole and ClusterRoleBinding

    In the AKS cluster,

    1. Create a namespace like:
        kubectl create ns pod-gc  
      
    2. Create a Service Account in the namespace like:
        kubectl create sa pod-gc -n pod-gc   
      
    3. Create a ClusterRole like:
        kubectl create clusterrole pod-gc-clusterrole --resource=pods,pods/status --verb=get,list,delete  
      
    4. Create a ClusterRoleBinding like:
        kubectl create clusterrolebinding pod-gc-clusterrolebinding --clusterrole pod-gc-clusterrole --serviceaccount pod-gc:pod-gc  
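
    Optionally, you can verify that the Service Account received the intended permissions. This check is not part of the original steps, and the --as impersonation requires that your own user is allowed to impersonate service accounts:

       kubectl auth can-i delete pods --all-namespaces --as=system:serviceaccount:pod-gc:pod-gc
       # expected output: yes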
      
    Create Deployment

    Create the deployment on the AKS cluster, in the pod-gc namespace so that it can use the Service Account created above, like:

    cat <<EOF | kubectl apply -f -  
    apiVersion: apps/v1  
    kind: Deployment  
    metadata:
      labels:
        app: pod-gc
      name: pod-gc
      namespace: pod-gc
    spec:  
      replicas: 1  
      selector:  
        matchLabels:  
          app: pod-gc  
      template:  
        metadata:  
          labels:  
            app: pod-gc  
        spec:  
          containers:  
          - image: <your-registry-server>/<your-repository-name>:<your-tag>  
            name: pod-gc  
          serviceAccountName: pod-gc  
    EOF  
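
    Once the Deployment is running, a quick way to confirm the cleaner works (an optional check, not in the original answer) is to inspect its pod and logs:

       kubectl get pods -n pod-gc
       kubectl logs deploy/pod-gc -n pod-gc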
    

    Hope this helps.

    Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.

