Automatically taint and label a node when starting an AKS cluster

Shreyas Arani 266 Reputation points
2021-09-30T03:57:02.777+00:00

Hi, we have a 4-node AKS cluster that we start and stop daily because of budget constraints. We use taints and tolerations to deploy specific pods on a tainted node, and we also label that node. Our requirement is to automatically taint and label any one node out of the 4 nodes, so that we don't have to taint and label a node manually each time the cluster starts. Can anyone help with how this can be achieved?
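For context, the manual step we currently repeat after every start looks roughly like this (the node name, key, value and effect are placeholders for what we actually use):

    # Taint one node so that only pods with a matching toleration land on it
    kubectl taint nodes <node-name> <key>=<value>:NoSchedule
    # Label the same node so that pods can target it via node affinity / nodeSelector
    kubectl label nodes <node-name> <key>=<value>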

Azure Kubernetes Service (AKS)

Accepted answer
  1. SRIJIT-BOSE-MSFT 4,336 Reputation points Microsoft Employee
    2021-09-30T06:51:23.707+00:00

    @Shreyas Arani, thank you for the question.

    Here is one automation approach that might interest you. With this approach we shall:

    • Create a bash script that taints and labels the last node in the list of available nodes from inside a Kubernetes Pod.
    • Create a docker image to run this script.
    • Push the image to a container registry.
    • Create a Namespace, Service Account, Clusterrole and Clusterrolebinding before we deploy the solution. The Service Account will be granted permission to get, list and patch nodes at the cluster scope through the Clusterrole, bound via the Clusterrolebinding. We will mount this Service Account on the pods of our Deployment in the next step so that the pods can access the required Kubernetes APIs.
    • Create a Deployment in the AKS with the aforementioned docker image and using the Service Account created in the previous step.

    [Note: I will be creating a Deployment whose Pod runs the script once and then stays idle (the container is kept alive with tail -f /dev/null), because I can count on the Deployment's desired state being persisted when the cluster is stopped, and the kube-controller-manager should bring up a replica of the Pod every time the cluster starts. You might choose to use Jobs or CronJobs instead, depending on your needs.]

    Creating the bash script
    1. Create a fresh directory on your client machine and change the present working directory to this newly created directory:
        mkdir directory-name  
      cd directory-name  
      
    2. Create a node-taint-script.sh file with the following content:
        APISERVER=https://kubernetes.default.svc  
      # Path to ServiceAccount token  
      SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount  
      # Read this Pod's namespace  
      NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)  
      # Read the ServiceAccount bearer token  
      TOKEN=$(cat ${SERVICEACCOUNT}/token)  
      # Reference the internal certificate authority (CA)  
      CACERT=${SERVICEACCOUNT}/ca.crt  
      #Get the number of nodes  
      n=$(curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api/v1/nodes/ | jq '.items | length' )  
      #Get the last node  
      nodename=$(curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api/v1/nodes/ | jq ".items[$((n-1))].metadata.name" | sed 's/\"//g')  
      #Taint the node  
      curl --cacert ${CACERT} -g -d '{"spec":{"taints":[{"effect":"<TAINT_EFFECT>","key":"<TAINT_KEY>","value":"<TAINT_VALUE>"}]}}'  -H "Accept: application/json, */*" -H "Content-Type: application/strategic-merge-patch+json" --header "Authorization: Bearer ${TOKEN}" -X PATCH ${APISERVER}/api/v1/nodes/$nodename?fieldManager=kubectl-taint  
      #Label the node  
      curl --cacert ${CACERT} -g -d '{"metadata":{"labels":{"<LABEL_KEY>":"<LABEL_VALUE>"}}}'  -H "Accept: application/json, */*" -H "Content-Type: application/merge-patch+json" --header "Authorization: Bearer ${TOKEN}" -X PATCH ${APISERVER}/api/v1/nodes/$nodename?fieldManager=kubectl-label  
      
      Please replace <TAINT_EFFECT>, <TAINT_KEY>, <TAINT_VALUE> with your taint effect, key and value respectively and <LABEL_KEY> and <LABEL_VALUE> with the label key and value.
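      For instance, with a hypothetical taint and label dedicated=special (effect NoSchedule), the two PATCH calls at the end of the script would look like this:

        #Taint the node with the hypothetical taint dedicated=special:NoSchedule  
        curl --cacert ${CACERT} -g -d '{"spec":{"taints":[{"effect":"NoSchedule","key":"dedicated","value":"special"}]}}'  -H "Accept: application/json, */*" -H "Content-Type: application/strategic-merge-patch+json" --header "Authorization: Bearer ${TOKEN}" -X PATCH ${APISERVER}/api/v1/nodes/$nodename?fieldManager=kubectl-taint  
        #Label the node with the hypothetical label dedicated=special  
        curl --cacert ${CACERT} -g -d '{"metadata":{"labels":{"dedicated":"special"}}}'  -H "Accept: application/json, */*" -H "Content-Type: application/merge-patch+json" --header "Authorization: Bearer ${TOKEN}" -X PATCH ${APISERVER}/api/v1/nodes/$nodename?fieldManager=kubectl-label  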
    Create a docker image
    1. Create a file named Dockerfile in the same working directory with the following contents:
        FROM centos:7  
        RUN yum install epel-release -y  
        RUN yum update -y  
        RUN yum install jq -y  
        COPY ./node-taint-script.sh /node-taint-script.sh  
        RUN chmod +x /node-taint-script.sh  
        CMD ["/bin/bash", "-c", "/node-taint-script.sh && tail -f /dev/null"]  
    2. Build the docker image using:
        docker build -t <your-registry-server>/<your-repository-name>:<your-tag> .  

    Push the image to a container registry
    1. Login to your container registry. [Reference]
    2. Push the docker image to your container registry using:
        docker push <your-registry-server>/<your-repository-name>:<your-tag>  
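
    For example, if you are using Azure Container Registry with a hypothetical registry named myregistry, the build-and-push steps would look roughly like this (registry, repository and tag names are placeholders):

        # Log in to the (hypothetical) Azure Container Registry
        az acr login --name myregistry
        # Build and push the image
        docker build -t myregistry.azurecr.io/node-taint:v1 .
        docker push myregistry.azurecr.io/node-taint:v1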
    Create a Namespace, Service Account, Clusterrole and Clusterrolebinding

    In the AKS cluster,

    1. Create a namespace like:
        kubectl create ns node-taint  
      
    2. Create a Service Account in the namespace like:
        kubectl create sa node-taint -n node-taint   
      
    3. Create a Clusterrole like:
        kubectl create clusterrole node-taint-clusterrole --resource=nodes --verb=get,list,patch  
      
    4. Create a Clusterrolebinding like:
        kubectl create clusterrolebinding node-taint-clusterrolebinding --clusterrole node-taint-clusterrole --serviceaccount node-taint:node-taint  
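
      Optionally, you can verify that the Service Account received the intended permissions before deploying (a quick sanity check, not part of the original steps; it requires that your own credentials are allowed to impersonate service accounts):

        # Both commands should print "yes"
        kubectl auth can-i list nodes --as=system:serviceaccount:node-taint:node-taint
        kubectl auth can-i patch nodes --as=system:serviceaccount:node-taint:node-taint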
      
    Create Deployment

    Create the deployment on the AKS cluster like:

    cat <<EOF | kubectl apply -f -  
    apiVersion: apps/v1  
    kind: Deployment  
    metadata:  
      labels:  
        app: node-taint  
      name: node-taint  
      namespace: node-taint  
    spec:  
      replicas: 1  
      selector:  
        matchLabels:  
          app: node-taint  
      template:  
        metadata:  
          labels:  
            app: node-taint  
        spec:  
          containers:  
          - image: <your-registry-server>/<your-repository-name>:<your-tag>  
            name: node-taint  
          serviceAccountName: node-taint  
          priorityClassName: system-node-critical  
    EOF  
    

    We have used priorityClassName: system-node-critical so that this Pod is scheduled and does its work before your application pods are scheduled. Reference
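    If you want to confirm that this priority class exists on the cluster and that the Pod picked it up, you can check (pod name is a placeholder):

        # List the built-in priority classes
        kubectl get priorityclasses
        # Show the priority class assigned to the node-taint pod
        kubectl get pod <pod-name> -n node-taint -o jsonpath='{.spec.priorityClassName}'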

    When you first create the Deployment, one of your nodes might already be tainted and labelled from your manual process. The Deployment will taint and label the last node in the list again, so you can manually remove the pre-existing taint and/or label. This is a one-time action, since the script does not check whether a node is already tainted (to keep things simple).
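
    For reference, checking which nodes currently carry the taint/label and removing them manually can be done like this (node name, keys, value and effect are whatever you used in the script; the trailing '-' removes the taint or label):

        # Show labels and taints on the nodes
        kubectl get nodes --show-labels
        kubectl describe node <node-name> | grep -i taint
        # Remove the taint and the label from a node
        kubectl taint nodes <node-name> <TAINT_KEY>=<TAINT_VALUE>:<TAINT_EFFECT>-
        kubectl label nodes <node-name> <LABEL_KEY>-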


    Hope this helps.

    Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.

    1 person found this answer helpful.

3 additional answers

  1. Shreyas Arani 266 Reputation points
    2021-09-30T07:06:35.687+00:00

    Hi @SRIJIT-BOSE-MSFT, thank you for your quick response. I will try to implement the above scripts and let you know the results.

    1 person found this answer helpful.

  2. Shreyas Arani 266 Reputation points
    2021-09-30T11:05:53.847+00:00

    Hi @SRIJIT-BOSE-MSFT, can you similarly provide the command for labelling the node, to be added to the script? Labelling should also happen at runtime, and it should label the last node (the same node that gets tainted), because taints and tolerations alone don't guarantee that a pod with a matching toleration will be scheduled on that node. To schedule the pod on the tainted node we need to use node affinity, so for this purpose we also need to label the node when the cluster starts. Can you please provide the command?


  3. Shreyas Arani 266 Reputation points
    2021-09-30T13:17:47.457+00:00

    Hi @SRIJIT-BOSE-MSFT, I can see that the node has been labelled and tainted.
    (screenshots of the node's labels and taints)

    But the problem is that the pod is in CrashLoopBackOff state.
    (screenshot of the pod status)

    Following is my deployment YAML file:
    (screenshot of the deployment YAML)

    And I deployed the YAML using the command:
    kubectl apply -f taint.yaml -n node-taint

    I checked the logs but couldn't infer anything from them.
    In my Dockerfile I have used centos:7 as the base image, and my AKS cluster nodes run Ubuntu. Do you think this is causing the problem?

    Is there any other reason why it could be in CrashLoopBackOff state?
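
    For reference, the commands I used to inspect the pod were along these lines (pod name is a placeholder):

        kubectl get pods -n node-taint
        kubectl describe pod <pod-name> -n node-taint
        kubectl logs <pod-name> -n node-taint --previous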

