Nodepool in AKS not scaling down to 0 nodes?

Florian Maas 5 Reputation points
2023-05-23T07:35:01.4133333+00:00

Hi,

I have two nodepools in my AKS cluster: the default nodepool and an 'application' nodepool. I use the default nodepool for services like Airflow, and the application nodepool to run ETL jobs. However, the application nodepool never scales down to zero nodes, even when I do not schedule any ETL jobs for many hours.

I do not understand why. Does anyone have suggestions for the root cause of the issue?

Below are some relevant details about the AKS cluster:

k top nodes

NAME                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
aks-application-XXXXXXXX-vmss000000   55m          1%     1579Mi          12%       
aks-default-XXXXXXXX-vmss000000       677m         17%    7783Mi          61%

az aks nodepool show \
--resource-group <my-rg> \
--cluster-name <my-cluster> \
--name application \
--query "{min: minCount, max: maxCount}"

{
  "max": 2,
  "min": 0
}

az aks show \
--resource-group <my-rg> \
--name <my-cluster> \
--query autoScalerProfile

{
  "balanceSimilarNodeGroups": "false",
  "expander": "random",
  "maxEmptyBulkDelete": "10",
  "maxGracefulTerminationSec": "180",
  "maxNodeProvisionTime": "15m",
  "maxTotalUnreadyPercentage": "45",
  "newPodScaleUpDelay": "0s",
  "okTotalUnreadyCount": "3",
  "scaleDownDelayAfterAdd": "3m",
  "scaleDownDelayAfterDelete": "10s",
  "scaleDownDelayAfterFailure": "3m",
  "scaleDownUnneededTime": "3m",
  "scaleDownUnreadyTime": "20m",
  "scaleDownUtilizationThreshold": "0.5",
  "scanInterval": "10s",
  "skipNodesWithLocalStorage": "true",
  "skipNodesWithSystemPods": "false"
}

k get pods --sort-by="{.spec.nodeName}" -A -o wide
NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE     NODE                                 
kube-system    azure-ip-masq-agent-XXXXX             1/1     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    metrics-server-XXXXXXXXXX-XXXXX       2/2     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    metrics-server-XXXXXXXXXX-XXXXX       2/2     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    kube-proxy-XXXXX                      1/1     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    csi-blob-node-XXXXX                   3/3     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    csi-azurefile-node-XXXXX              3/3     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    csi-azuredisk-node-XXXXX              3/3     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    cloud-node-manager-XXXXX              1/1     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    cloud-node-manager-XXXXX              1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-pgbouncer-XXXXXXXXXX-XXXXX    2/2     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-triggerer-XXXXXXXXX-XXXXX     1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-webserver-XXXXXXXXX-XXXXX     1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-scheduler-XXXXXXXXX-XXXXX     2/2     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    azure-ip-masq-agent-XXXXX             1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-postgresql-0                  1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    coredns-XXXXXXXXXX-XXXXX              1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    coredns-XXXXXXXXXX-XXXXX              1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    coredns-autoscaler-XXXXXXXXXX-XXXXX   1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-statsd-XXXXXXXX-XXXXX         1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    csi-azuredisk-node-XXXXX              3/3     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    csi-azurefile-node-XXXXX              3/3     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    csi-blob-node-XXXXX                   3/3     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    konnectivity-agent-XXXXXXXXXX-XXXXX   1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    konnectivity-agent-XXXXXXXXXX-XXXXX   1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    kube-proxy-XXXXX                      1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
Azure Kubernetes Service (AKS)

2 answers

  1. Ammar-Abdelqader01 1,156 Reputation points Microsoft Employee
    2023-05-23T09:18:14.6633333+00:00

    Hello @Florian Maas

    Thank you for your question. From the information you shared, the system pods (the pods in the kube-system namespace) are distributed across both node pools, default and application.

    1- The system node pool can't scale down to 0; it must have at least 1 node.

    2- To keep system pods and application pods on separate node pools, add a taint to the system node pool so that only system pods are scheduled on it; please check this document.

    3- To add a dedicated system node pool with at least one node, use the command from this document.
    4- Now you can scale the user node pool, which runs the application pods, down to 0 nodes using this document.
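
The steps above can be sketched with the Azure CLI roughly as follows. This is a minimal sketch under stated assumptions, not the exact commands from the linked documents: `RG`, `CLUSTER`, and the pool name `systempool` are placeholders, the function names are hypothetical, and `CriticalAddonsOnly=true:NoSchedule` is the taint commonly suggested for dedicated system node pools. Note that a taint only keeps pods without a matching toleration off the tainted pool; on its own it does not stop system pods from being scheduled on other pools.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own resource group and cluster.
RG="${RG:-my-rg}"
CLUSTER="${CLUSTER:-my-cluster}"

# Step 3: add a dedicated system node pool with at least one node.
# The taint keeps ordinary application pods off this pool.
add_system_pool() {
  az aks nodepool add \
    --resource-group "$RG" \
    --cluster-name "$CLUSTER" \
    --name systempool \
    --mode System \
    --node-count 1 \
    --node-taints CriticalAddonsOnly=true:NoSchedule
}

# Step 4: allow the autoscaler to take the user pool down to zero nodes.
# Assumes the autoscaler is already enabled on the 'application' pool.
allow_scale_to_zero() {
  az aks nodepool update \
    --resource-group "$RG" \
    --cluster-name "$CLUSTER" \
    --name application \
    --update-cluster-autoscaler \
    --min-count 0 \
    --max-count 2
}
```

After `az login`, calling `add_system_pool` followed by `allow_scale_to_zero` would apply both steps; nothing runs until the functions are called.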

    If this has been helpful, please take a moment to accept the answer, as this helps increase the visibility of this question for other members of the Microsoft Q&A community. Thank you for helping to improve Microsoft Q&A!

    2 people found this answer helpful.

  2. Florian Maas 5 Reputation points
    2023-05-25T13:14:17.7633333+00:00

    In the end, adding a dedicated system node pool did not resolve the issue. Initially the metrics-server pods were moved to the new node pool, but they ended up in the application node pool again later, after which the application nodepool would not scale down to 0 anymore. I believe that is because they had this volume attached:

    Volumes:
      tmp-dir:
        Type:       EmptyDir
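
One way to check which pods on a node carry this kind of local storage is to list their volumes with `kubectl`. This is a rough sketch, assuming the node name from the question; the function name and column headers are made up for illustration. A value other than `<none>` in the EMPTYDIR or HOSTPATH column would suggest that the pod uses local storage and could hold back scale-down while `skip-nodes-with-local-storage` is true.

```shell
#!/usr/bin/env bash
# Placeholder (assumption): the node to inspect, as shown by 'kubectl get nodes'.
NODE="${NODE:-aks-application-XXXXXXXX-vmss000000}"

# List every pod on the node together with its emptyDir / hostPath volumes.
list_pod_volumes_on_node() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$NODE" \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,EMPTYDIR:.spec.volumes[*].emptyDir,HOSTPATH:.spec.volumes[*].hostPath.path'
}
```

Running `list_pod_volumes_on_node` against the application node should include the metrics-server pods if they are scheduled there.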
    

    The autoscaler documentation for the argument skip_nodes_with_local_storage reads:

    (Optional) If true cluster autoscaler will never delete nodes with pods with local storage, for example, EmptyDir or HostPath. Defaults to true.

    So I added the following to my autoscaler configuration in Terraform:

      auto_scaler_profile {
        # (Optional) If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods). Defaults to true.
        # metrics-server is not a DaemonSet, so with the default (true) its node would not be scaled down.
        skip_nodes_with_system_pods = false
        # (Optional) If true cluster autoscaler will never delete nodes with pods with local storage, for example, EmptyDir or HostPath. Defaults to true.
        skip_nodes_with_local_storage = false
      }
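
For clusters that are not managed with Terraform, roughly the same settings can be applied with the Azure CLI. This is a sketch under assumptions: `RG` and `CLUSTER` are placeholders, the function names are hypothetical, and the keys are the CLI spellings of the two Terraform arguments above.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own resource group and cluster.
RG="${RG:-my-rg}"
CLUSTER="${CLUSTER:-my-cluster}"

# Let the autoscaler remove nodes that run kube-system pods or pods
# with local storage (EmptyDir / HostPath).
set_autoscaler_profile() {
  az aks update \
    --resource-group "$RG" \
    --name "$CLUSTER" \
    --cluster-autoscaler-profile \
      skip-nodes-with-local-storage=false \
      skip-nodes-with-system-pods=false
}

# Print the resulting profile, using the same query as in the question.
show_autoscaler_profile() {
  az aks show \
    --resource-group "$RG" \
    --name "$CLUSTER" \
    --query autoScalerProfile
}
```

Calling `set_autoscaler_profile` and then `show_autoscaler_profile` should report both `skipNodesWith...` values as `"false"`. Keep in mind that pods evicted from a removed node lose the contents of their EmptyDir volumes.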
    