Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25

This article discusses how to fix pods that stop working because of memory saturation or out-of-memory (OOM) errors that occur after you upgrade a Microsoft Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.

Symptoms

One or more of the following issues occur:

  • Memory pressure on nodes

  • Increased memory usage for apps when compared to their memory usage before the upgrade

  • CPU throttling on nodes

  • Pod failure because of OOM errors
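
To check for the last symptom, one option is to look for containers whose most recent termination reason is `OOMKilled`. The following is a minimal sketch in Python, not an official AKS tool. It assumes that you have already parsed the output of `kubectl get pods --all-namespaces -o json` into a dictionary. The function name `find_oom_killed` is hypothetical.

```python
def find_oom_killed(pod_list):
    """Return (namespace, pod, container) tuples for containers whose
    last termination reason was OOMKilled.

    pod_list is assumed to be the parsed JSON from
    `kubectl get pods -o json` (a dict that has an "items" list).
    """
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # containerStatuses can be absent for pods that never started.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((meta.get("namespace"), meta.get("name"), cs.get("name")))
    return hits
```

For example, you might load the JSON by using `json.load` and print each returned tuple. A container appears in the result only if its previous instance was terminated for exceeding its memory limit, so an empty result doesn't rule out memory pressure on the node itself.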

Performance degradation can occur in apps that run in the following environments:

Note

This list isn't comprehensive. Other environments might also experience memory saturation or OOM issues.

Solution

Note

If you only experience increased memory usage and no other symptoms that are mentioned in the Symptoms section, no action is needed.

Starting with Kubernetes 1.25, the cgroup version 2 (cgroup v2) API is generally available (GA). AKS now uses Ubuntu Linux version 22.04, which uses the cgroup v2 API by default. To make sure that your workloads support the cgroup v2 API and to prevent the memory saturation issue, follow this guidance:

  • If you run Java applications, upgrade to a Java version or framework that natively supports cgroup version 2, and follow the guidance in Containerize your Java applications. For certain versions, cgroup version 2 support has been backported, so you might only have to update the base image. For Azure customers, Microsoft officially supports Eclipse Temurin binaries (Java 8) and Microsoft Build of OpenJDK binaries (Java 11+).

  • Similarly, if you're using .NET, upgrade to .NET version 5.0 or a later version.

  • If you see a higher eviction rate on the pods, use higher limits and requests for the pods.

  • cgroup v2 uses a different API than cgroup v1. If there are any applications that directly access the cgroup file system, update them to later versions that support cgroup v2. For example:

    • Third-party monitoring and security agents:

      Some monitoring and security agents depend on the cgroup file system. Update these agents to versions that support cgroup v2.

    • Java applications:

      Use versions that fully support cgroup v2:

      • OpenJDK/HotSpot: jdk8u372, 11.0.16, 15, and later versions.
      • IBM Semeru Runtimes: 8.0.382.0, 11.0.20.0, 17.0.8.0, and later versions.
      • IBM Java: 8.0.8.6 and later versions.

    • uber-go/automaxprocs:

      If you're using the uber-go/automaxprocs package, make sure that the version is v1.5.1 or later.

  • As a temporary workaround, you can revert the cgroup version on your nodes to cgroup v1 by using a DaemonSet. For more information, see Revert to cgroup v1 DaemonSet.

    Important

    • Use the DaemonSet cautiously. Test it in a lower environment before you apply it to production to make sure that it's compatible and to avoid disruptions.
    • By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the cgroup change.
    • To control how the DaemonSet is applied, configure a nodeSelector to target specific nodes.
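
The guidance above about applications that read the cgroup file system directly can be illustrated with a short sketch. The following Python code is an assumption-laden example, not code from any agent or runtime. It assumes that it runs inside a container in which `/sys/fs/cgroup` is the container's own cgroup mount. The function names `cgroup_version` and `memory_limit_bytes` are hypothetical. The sketch detects cgroup v2 by the presence of the `cgroup.controllers` file, and then reads the memory limit from `memory.max` (cgroup v2) or `memory/memory.limit_in_bytes` (cgroup v1).

```python
from pathlib import Path


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if root looks like a cgroup v2 unified hierarchy, else 1."""
    # cgroup v2 exposes cgroup.controllers at the root of the unified hierarchy.
    return 2 if (Path(root) / "cgroup.controllers").exists() else 1


def memory_limit_bytes(root="/sys/fs/cgroup"):
    """Return the memory limit in bytes, or None if it's unlimited or unreadable."""
    if cgroup_version(root) == 2:
        limit_file = Path(root) / "memory.max"
    else:
        limit_file = Path(root) / "memory" / "memory.limit_in_bytes"
    try:
        raw = limit_file.read_text().strip()
    except OSError:
        return None
    # cgroup v2 writes the literal string "max" when no limit is set.
    # cgroup v1 reports a very large number instead, which is returned unchanged.
    if raw == "max":
        return None
    return int(raw)
```

Code that only checks the cgroup v1 path finds no limit file on a cgroup v2 node. In that case, the application might size its heap or caches based on the node's total memory instead of the container limit, which is consistent with the symptoms that this article describes.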

Status

Microsoft is working with the Kubernetes community to resolve the issue. Track progress at Azure/AKS Issue #3443.

As part of the resolution, the plan is to adjust the eviction thresholds or update resource reservations, depending on the outcome of the fix.

Third-party information disclaimer

The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.

Third-party contact disclaimer

Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.