Memory & CPU Utilization drastically different for AKS

Abdul Aziz 0 Reputation points
2024-06-13T15:21:56.5566667+00:00

I am planning to use Descheduler in my AKS deployment to balance memory consumption across the AKS nodes. The current output of kubectl top nodes is:

NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-nodepool1-53884836-vmss000000   198m         10%    12317Mi         97%
aks-nodepool1-53884836-vmss000001   189m         9%     12952Mi         102%      
aks-nodepool1-53884836-vmss000002   213m         11%    12747Mi         101%      
aks-nodepool1-53884836-vmss000003   135m         7%     5970Mi          47%    

However, when I tried different scenarios with the Descheduler, I got the following output:

I0612 13:51:55.145678       1 nodeutilization.go:204] "Node is underutilized" node="aks-nodepool1-53884836-vmss000003" usage={"cpu":"810m","memory":"476Mi","pods":"27"} usagePercentage={"cpu":42.63,"memory":3.78,"pods":10.8}
I0612 13:51:55.145712       1 nodeutilization.go:204] "Node is underutilized" node="aks-nodepool1-53884836-vmss000000" usage={"cpu":"582m","memory":"501Mi","pods":"54"} usagePercentage={"cpu":30.63,"memory":3.98,"pods":21.6}
I0612 13:51:55.145725       1 nodeutilization.go:204] "Node is underutilized" node="aks-nodepool1-53884836-vmss000001" usage={"cpu":"950m","memory":"596Mi","pods":"61"} usagePercentage={"cpu":50,"memory":4.74,"pods":24.4}
I0612 13:51:55.145743       1 nodeutilization.go:210] "Node is appropriately utilized" node="aks-nodepool1-53884836-vmss000002" usage={"cpu":"962m","memory":"647Mi","pods":"56"} usagePercentage={"cpu":50.63,"memory":5.14,"pods":22.4}

As you can see, the utilization seen by the Descheduler is drastically different from what top is reporting, especially memory, which the Descheduler shows at no more than 5% on any node while top reports 47% and above.

When I describe a node, I see the requests and limits of all the custom pods reported as 0:

  Namespace                   Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                               ------------  ----------  ---------------  -------------  ---
  default                     alerts-667b7bc-88djq                                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     alerts-ag-5544b98c45-xjnss                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     api-6db9645d8b-p6jqm                                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     authentication-766496cf6b-js77v                                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     authentication-ag-585fdf767nsp6                                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     authentication-validator-76b444457c-6x66x                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     authorization-checker-5789b576ff-wssl2                                0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     authorization-78f759f849-xmk2p                                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     backups-agent-68f47f764c-vpmlh                                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d
  default                     backups-f7d6c765d-qsl8v                                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         32d

....

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                582m (30%)  4647m (244%)
  memory             501Mi (3%)  3657Mi (29%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)

The allocated resources from describe node seem to agree with what the Descheduler is seeing, given that every custom pod is shown at 0%. However, running top for one of those pods, e.g. kubectl top pods alerts-667b7bc-88djq, gives:

NAME                            CPU(cores)   MEMORY(bytes)   
alerts-667b7bc-88djq             2m           108Mi           

PodMetrics agrees with this; kubectl describe PodMetrics alerts-667b7bc-88djq gives:

API Version:  metrics.k8s.io/v1beta1
Containers:
  Name:  alerts
  Usage:
    Cpu:     1268872n
    Memory:  111272Ki
Kind:        PodMetrics

Any help understanding what is going on here would be appreciated. Why does describe node fail to register any resource utilization (with the Descheduler subsequently reporting the same), while top nodes presents a totally different picture?

Azure Kubernetes Service (AKS)

1 answer

  1. Sina Salam 6,341 Reputation points
    2024-06-13T20:34:41.1533333+00:00

    Hello Abdul Aziz,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Problem

    I understand that you are seeing a discrepancy between the memory usage reported by kubectl top nodes and the utilization figures used by the Descheduler in your AKS (Azure Kubernetes Service) deployment.

    Solution

    Having analyzed the output and information you provided, there are two causes:

    1. The discrepancy comes from what each tool measures: the Descheduler's LowNodeUtilization strategy computes node utilization from the pods' resource requests (the same figures shown under Allocated resources in kubectl describe node), while kubectl top nodes reports actual usage collected by the metrics server (see the policy sketch right after this list).
    2. Your custom pods have no resource requests and limits set (they all show 0 (0%)), so the Descheduler sees almost no requested memory on the nodes even though actual consumption is high.
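
    As an illustration of the first point, here is a minimal sketch of a LowNodeUtilization policy (descheduler/v1alpha2 API); the 20/50 threshold values are placeholders, not a recommendation for your cluster. The percentages the Descheduler logs and compares against these thresholds are computed from the sum of pod requests divided by node allocatable, not from live usage:

      apiVersion: "descheduler/v1alpha2"
      kind: "DeschedulerPolicy"
      profiles:
        - name: default
          pluginConfig:
            - name: "LowNodeUtilization"
              args:
                # a node below ALL of these request-based percentages counts as underutilized
                thresholds:
                  cpu: 20
                  memory: 20
                  pods: 20
                # a node above ANY of these is a candidate to have pods evicted from it
                targetThresholds:
                  cpu: 50
                  memory: 50
                  pods: 50
          plugins:
            balance:
              enabled:
                - "LowNodeUtilization"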

    For the issues listed above, you can do the following:

    STAGE 1

    • Ensure that the metrics server is correctly reporting resource usage:
      kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
      kubectl logs -n kube-system <metrics-server-pod>

    • Compare the metrics reported by kubectl top nodes, kubectl top pods, and the Descheduler to identify where they diverge (a per-node breakdown of requests is sketched after these commands):
      kubectl top nodes
      kubectl top pods
      kubectl describe nodes
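
    To see exactly which numbers the Descheduler is summing for a node, you can list the requests of every pod scheduled on it. A sketch, using one of the node names from your output:

      # list CPU/memory requests of all pods on a specific node
      kubectl get pods --all-namespaces \
        --field-selector spec.nodeName=aks-nodepool1-53884836-vmss000000 \
        -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

    Pods with no requests set will show <none> in the last two columns; those are the ones the Descheduler effectively counts as zero.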
    

    STAGE 2

    • Ensure that all pods have appropriate resource requests and limits set; this lets the Descheduler make accurate decisions. For example:
      apiVersion: v1
      kind: Pod
      metadata:
        name: example-pod
      spec:
        containers:
        - name: example-container
          image: nginx
          resources:
            requests:
              memory: "128Mi"
              cpu: "500m"
            limits:
              memory: "256Mi"
              cpu: "1000m"
    

    Then, you will apply the changes to the cluster:

      kubectl apply -f example-pod.yaml
    

    Remember: example-pod.yaml refers to the manifest shown above.
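
    For workloads that already run as Deployments (which the generated pod names in your list suggest), it is usually more practical to set requests and limits on the Deployment spec than on individual pods. A sketch, assuming the pod alerts-667b7bc-88djq belongs to a Deployment named alerts and using placeholder values you would tune for your workload:

      # hypothetical example: add requests/limits to an existing Deployment named "alerts"
      kubectl set resources deployment alerts \
        --requests=cpu=100m,memory=256Mi \
        --limits=cpu=500m,memory=512Mi

    The Deployment controller will roll the pods so the new requests take effect.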

    Finally

    After setting the resource requests and limits, monitor the metrics to confirm that the different views now line up:

       kubectl top nodes
       kubectl top pods --all-namespaces
       kubectl describe nodes
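
    Once requests are in place, the Allocated resources section of kubectl describe node (and therefore the Descheduler's percentages) should move much closer to what top reports, although they will never match exactly because requests are declarations rather than measurements. A quick check against one of your nodes, for example:

      # show only the Allocated resources summary for a single node
      kubectl describe node aks-nodepool1-53884836-vmss000000 | grep -A 9 "Allocated resources:"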
    

    Therefore, by making sure that the metrics server is functioning correctly, setting appropriate resource requests and limits, and monitoring the metrics consistently, you will be able to align the resource usage reported by kubectl top nodes, the Descheduler, and kubectl describe node. That alignment is what lets the Descheduler balance your nodes as intended.

    References

    For more detailed instructions and sources for the above solutions, use the following links:

    • Resource Requests and Limits (accessed 6/13/2024)

    • Kubernetes Metrics Server (accessed 6/13/2024)

    • Azure Kubernetes Service (AKS) Documentation (accessed 6/13/2024)

    • Kubernetes Descheduler (accessed 6/13/2024)

    • Setting Resource Requests and Limits (accessed 6/13/2024)

    • Monitoring and Troubleshooting Metrics Server (accessed 6/13/2024)


    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam
