Increased memory usage reported in Kubernetes 1.25 or later versions

2025-03-03

This article discusses how to resolve an issue in which increased memory usage is reported in Microsoft Azure Kubernetes Service (AKS) clusters that run Kubernetes 1.25 or a later version.

Symptoms

You experience one or more of the following symptoms:

Pods report increased memory usage after you upgrade a Microsoft Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25 or a later version.
A node reports memory usage that's greater than in earlier versions of Kubernetes when you run the kubectl top node command.
Increased pod evictions and memory pressure occur within a node.

Cause

This increase is caused by a change in memory accounting within version 2 of the Linux control group (cgroup) API. Cgroup v2 is now the default cgroup version for Kubernetes 1.25 and later versions on AKS.
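
To confirm which cgroup version a node or container is using, you can check the layout of the cgroup file system. The following Python sketch is a minimal illustration, not an official AKS tool. It assumes that the cgroup file system is mounted at /sys/fs/cgroup, and the function name detect_cgroup_version is hypothetical. It relies on the fact that a cgroup v2 (unified) hierarchy exposes a cgroup.controllers file at its root, whereas cgroup v1 exposes one directory per controller, such as memory.

```python
from pathlib import Path


def detect_cgroup_version(cgroup_root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the given cgroup mount point.

    The default mount point is an assumption; adjust it if your
    environment mounts the cgroup file system elsewhere.
    """
    root = Path(cgroup_root)
    # A cgroup v2 (unified) hierarchy has a cgroup.controllers file at its root.
    if (root / "cgroup.controllers").is_file():
        return "v2"
    # A cgroup v1 hierarchy has one directory per controller, such as "memory".
    if (root / "memory").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

If the sketch reports v2 on nodes that previously reported v1, the change in reported memory usage is consistent with the cause that's described in this section.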

Note

This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of cgroup v2. For more information, see Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25.

Solution

If you observe frequent memory pressure on the nodes, upgrade your subscription to increase the amount of memory that's available to your virtual machines (VMs).
If you see a higher eviction rate on the pods, use higher limits and requests for pods.
cgroup v2 uses a different API than cgroup v1. If there are any applications that directly access the cgroup file system, update them to later versions that support cgroup v2. For example:
- Third-party monitoring and security agents:
  
  Some monitoring and security agents depend on the cgroup file system. Update these agents to versions that support cgroup v2.
- Java applications:
  
  Use versions that fully support cgroup v2:
  - OpenJDK/HotSpot: jdk8u372, 11.0.16, 15, and later versions.
  - IBM Semeru Runtimes: 8.0.382.0, 11.0.20.0, 17.0.8.0, and later versions.
  - IBM Java: 8.0.8.6 and later versions.
- uber-go/automaxprocs:
  
  If you're using the uber-go/automaxprocs package, make sure that the version is v1.5.1 or later.
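
For in-house applications that read the cgroup file system directly, the difference between the two APIs can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not a definitive implementation: it assumes that the cgroup file system is mounted at /sys/fs/cgroup as seen from inside a container, and the function name read_memory_limit_bytes is hypothetical. It reads the memory limit from memory.max on cgroup v2, and falls back to memory/memory.limit_in_bytes on cgroup v1.

```python
from pathlib import Path
from typing import Optional


def read_memory_limit_bytes(cgroup_root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the memory limit in bytes, or None if no limit is found.

    The default mount point is an assumption about the container's view
    of the cgroup file system.
    """
    root = Path(cgroup_root)
    v2_file = root / "memory.max"                        # cgroup v2 location
    v1_file = root / "memory" / "memory.limit_in_bytes"  # cgroup v1 location

    if v2_file.is_file():
        raw = v2_file.read_text().strip()
        # cgroup v2 writes the literal string "max" when no limit is set.
        return None if raw == "max" else int(raw)
    if v1_file.is_file():
        # cgroup v1 reports a very large number when no limit is set;
        # this sketch returns that number unchanged.
        return int(v1_file.read_text().strip())
    return None


if __name__ == "__main__":
    print(read_memory_limit_bytes())
```

Code that checks only the cgroup v1 path finds no limit on a cgroup v2 node, which is why such applications must be updated.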
An alternative temporary solution is to revert the cgroup version on your nodes by using a DaemonSet. For more information, see Revert to cgroup v1 DaemonSet.

Important

Use the DaemonSet cautiously. Test it in a lower environment before you apply it to production to ensure compatibility and prevent disruptions.
By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the cgroup change.
To control how the DaemonSet is applied, configure a nodeSelector to target specific nodes.

Note

If you experience only an increase in memory use without any of the other symptoms that are mentioned in the "Symptoms" section, you don't have to take any action.

Status

We're actively working with the Kubernetes community to resolve the underlying issue. You can track progress on this effort at Azure/AKS Issue #3443.

As part of the resolution, we plan to adjust the eviction thresholds or update resource reservations, depending on the outcome of the fix.

Reference

Node memory usage on cgroupv2 reported higher than cgroupv1 (GitHub issue)

Third-party information disclaimer

The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.

Third-party contact disclaimer

Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.
