CPU Pressure and system node pool CPU limits >> 100%

FedericoZarelli-6386 60 Reputation points
2024-11-27T09:00:20.8566667+00:00

Hello Team,

Please give me a hand troubleshooting a CPU pressure issue I am having on this AKS cluster:

Kubernetes version: 1.30.3

Nodepools:

  • system node pools
    • Autoscale: True (enabled after CPU Pressure report)
    • Node size: Standard_E2ds_v4
    • Taints: CriticalAddonsOnly=true:NoSchedule (enabled after CPU Pressure report)
    • Min nodes: 3 (increased after CPU Pressure report)
  • user node pools
    • Autoscale: True
    • Node size: Standard_E4ds_v4

For the system node pools, I am using a VM size that is smaller than recommended because of availability in my region, and I noticed that the CPU limits on these nodes are 400% and 200%.

Now:

  • What's the impact of having such high limits? Should I just scale horizontally until limits are within 100%?
  • These CPU pressure events seem to occur regularly every week on the same day. Is there any weekly job being run by the system pool?

Thanks in advance!

Azure Kubernetes Service

1 answer

  1. Anonymous
    2024-11-27T18:18:28.5266667+00:00

    Hi FedericoZarelli-6386,
    Thank you for reaching out to us on the Microsoft Q&A forum.
    Having high CPU limits, such as 400% or 200% of a node's allocatable CPU, means the node's CPU is overcommitted, which can cause performance issues under certain conditions.
    Below are some potential impacts:

    High CPU limits may result in workloads competing for CPU cycles, particularly when nodes are fully utilized. The Kubernetes scheduler places pods based on their CPU requests, not their limits; limits are enforced on the node at runtime, so a container that reaches its CPU limit is throttled, which degrades its performance. With overcommitted limits the node has spare capacity while most pods are idle, but performance can degrade significantly when several pods burst at the same time during peak demand.
    Scaling horizontally by adding nodes can help distribute the load. Keep in mind that the cluster autoscaler adds nodes when pods cannot be scheduled because of their requests, not in response to CPU utilization, so scaling alone may not resolve the issue unless resource requests and limits are configured properly.
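
To make the percentage concrete, here is a minimal Python sketch of how a figure like 400% arises: it is the sum of the CPU limits of the containers on a node divided by the node's allocatable CPU, which is the same ratio `kubectl describe node` reports under "Allocated resources". The allocatable value (1900m on a 2 vCPU node) and the per-container limits below are assumptions chosen for illustration, not values taken from this cluster.

```python
def cpu_limit_overcommit(limits_millicores, allocatable_millicores):
    """Return the summed CPU limits as a percentage of the node's allocatable CPU."""
    if allocatable_millicores <= 0:
        raise ValueError("allocatable_millicores must be positive")
    return 100.0 * sum(limits_millicores) / allocatable_millicores


# Assumed value: allocatable CPU of a 2 vCPU node after system reservations.
ALLOCATABLE_M = 1900
# Hypothetical CPU limits (millicores) of the containers scheduled on that node.
pod_limits_m = [2000, 2000, 1000, 1000, 1000, 600]

percent = cpu_limit_overcommit(pod_limits_m, ALLOCATABLE_M)
print(f"CPU limits: {sum(pod_limits_m)}m ({percent:.0f}% of allocatable)")
# prints: CPU limits: 7600m (400% of allocatable)
```

A value above 100% is not an error by itself; it only says that the containers cannot all run at their limit simultaneously. Adding nodes lowers the ratio only if the pods actually spread out, which is less likely for workloads that run one pod per node.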

    If high-CPU workloads are running on the system node pools, consider moving them to a dedicated user node pool. The recurring CPU pressure events in your AKS cluster might be related to a weekly scheduled job or system activity, although AKS itself does not include predefined weekly tasks specifically tied to system node pools.

    Use the command "kubectl get cronjobs -A" to list the CronJobs in all namespaces and check their schedules for anything that runs weekly.
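
If the cluster has many CronJobs, the check can be scripted. The sketch below is a rough heuristic in Python, assuming the JSON shape produced by `kubectl get cronjobs -A -o json` (an `items` list whose entries carry `metadata.namespace`, `metadata.name`, and `spec.schedule`). It flags schedules whose day-of-week field is restricted, plus the `@weekly` shorthand. The sample data and the names in it are hypothetical.

```python
import json


def weekly_cronjobs(cronjob_list):
    """Return (namespace, name, schedule) for CronJobs limited to specific weekdays."""
    found = []
    for item in cronjob_list.get("items", []):
        schedule = item["spec"]["schedule"].strip()
        fields = schedule.split()
        # Standard cron layout: minute hour day-of-month month day-of-week.
        restricted_weekday = len(fields) == 5 and fields[4] not in ("*", "?")
        if schedule == "@weekly" or restricted_weekday:
            meta = item["metadata"]
            found.append((meta["namespace"], meta["name"], schedule))
    return found


# Hypothetical sample in the shape of `kubectl get cronjobs -A -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "ops", "name": "weekly-report"},
     "spec": {"schedule": "0 2 * * 0"}},
    {"metadata": {"namespace": "ops", "name": "nightly-cleanup"},
     "spec": {"schedule": "30 1 * * *"}}
  ]
}
""")

for namespace, name, schedule in weekly_cronjobs(sample):
    print(f"{namespace}/{name}: {schedule}")
# prints: ops/weekly-report: 0 2 * * 0
```

To run it against a real cluster, the `sample` value could be replaced with the parsed output of the kubectl command above. Jobs that fire on several weekdays are flagged too, so the result is a shortlist to compare against the day the CPU pressure events occur, not a definitive answer.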

    Please find the below documents for more information:

    If the information is helpful, please consider clicking "Accept answer" and "Upvote" on the post.

