This article discusses how to fix pods that stop working because of memory saturation or out-of-memory (OOM) errors that occur after you upgrade a Microsoft Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.
Symptoms
One or more of the following issues occur:
- Memory pressure on nodes
- Increased memory usage for apps when compared to their memory usage before the upgrade
- CPU throttling on nodes
- Pod failure because of OOM errors
Performance degradation can occur in apps that run in the following environments:
- Java Runtime Environment (JRE) versions that are earlier than 11.0.18 or 1.8.0_372
- .NET versions that are earlier than version 5.0
- Node.js
Note
This list of environments in which performance degradation can occur isn't a comprehensive list. There might be other environments that experience memory saturation or OOM issues.
Solution
Note
If you only experience increased memory usage and no other symptoms that are mentioned in the Symptoms section, no action is needed.
Starting with Kubernetes 1.25, the cgroup version 2 API is generally available (GA). AKS now uses Ubuntu Linux 22.04 for its nodes, and Ubuntu 22.04 uses the cgroup version 2 API by default. To prevent memory saturation, make sure that your applications, runtimes, and tools support cgroup version 2 by following this guidance:
If you run Java applications, upgrade to a Java version that supports cgroup version 2, and follow the guidance in Containerize your Java applications. For some versions, the fix has been backported, so updating the base image might be enough. Use a version or framework that natively supports cgroup version 2. For Azure customers, Microsoft officially supports Eclipse Temurin binaries (Java 8) and Microsoft Build of OpenJDK binaries (Java 11 and later).
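The OpenJDK/HotSpot minimums that this article lists for full cgroup v2 support (jdk8u372, 11.0.16, and 15 or later) can be checked mechanically, for example against the output of java -version in a base image. The following Python function is a hypothetical sketch, not a Microsoft or OpenJDK tool. It assumes the common 1.8.0_372, 8u372, and 11.0.16 version formats, treats releases 9, 10, and 12 through 14 as unsupported because the article lists no minimum for them, and doesn't cover IBM Semeru Runtimes or IBM Java numbering.

```python
import re


def openjdk_supports_cgroup_v2(version: str) -> bool:
    """Hypothetical helper: compare an OpenJDK/HotSpot version string with the
    cgroup v2 minimums listed in this article (8u372, 11.0.16, 15 and later)."""
    # Java 8 appears as "1.8.0_<update>" (java -version) or "8u<update>" (vendor naming),
    # optionally followed by a build suffix such as "-b07".
    legacy = re.fullmatch(r"(?:1\.8\.0_|8u)(\d+)(?:[-+].*)?", version)
    if legacy:
        return int(legacy.group(1)) >= 372

    # Java 9 and later use "<feature>.<interim>.<update>", for example "11.0.16" or "17".
    modern = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if not modern:
        raise ValueError(f"unrecognized Java version: {version!r}")
    feature = int(modern.group(1))
    update = int(modern.group(3) or 0)

    if feature >= 15:
        return True
    if feature == 11:
        return update >= 16
    # Assumption: other releases (9, 10, 12-14) have no cgroup v2 support.
    return False


print(openjdk_supports_cgroup_v2("1.8.0_362"))  # False
print(openjdk_supports_cgroup_v2("8u372-b07"))  # True
print(openjdk_supports_cgroup_v2("11.0.16+8"))  # True
```

A regular expression per version scheme keeps the two numbering styles (Java 8 update numbers versus Java 9+ feature releases) from being confused with each other.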
If you use .NET, upgrade to .NET 5.0 or a later version.
If pods are evicted more often than before the upgrade, increase the memory requests and limits for those pods.
cgroup v2 uses a different API than cgroup v1. If any applications directly access the cgroup file system, update them to later versions that support cgroup v2. For example:
- Third-party monitoring and security agents: Some monitoring and security agents depend on the cgroup file system. Update these agents to versions that support cgroup v2.
- Java applications: Use versions that fully support cgroup v2:
  - OpenJDK/HotSpot: jdk8u372, 11.0.16, 15, and later versions.
  - IBM Semeru Runtimes: 8.0.382.0, 11.0.20.0, 17.0.8.0, and later versions.
  - IBM Java: 8.0.8.6 and later versions.
- uber-go/automaxprocs: If you're using the uber-go/automaxprocs package, make sure that the version is v1.5.1 or later.
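Applications and agents that read resource limits directly from the cgroup file system have to handle both layouts: cgroup v1 uses a separate directory per controller, while cgroup v2 uses a single unified hierarchy with renamed interface files. Memory and CPU limits are the values that matter here, because the JVM sizes its default heap from the memory limit and uber-go/automaxprocs sets GOMAXPROCS from the CPU quota. The following Python sketch shows the difference. It assumes the hierarchy is mounted at /sys/fs/cgroup, as it typically is inside a container, uses the presence of cgroup.controllers as a heuristic for v2, and doesn't handle hybrid v1/v2 setups. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: the cgroup hierarchy is mounted at /sys/fs/cgroup, as it usually is
# inside a container. Pass another root to run the functions against a test tree.
CGROUP_ROOT = Path("/sys/fs/cgroup")

# cgroup v1 reports "no memory limit" as a huge page-aligned value
# (9223372036854771712 on typical 64-bit nodes). Anything this large is treated
# as unlimited (heuristic threshold).
V1_UNLIMITED_THRESHOLD = 2**62


def cgroup_version(root: Path = CGROUP_ROOT) -> int:
    """Heuristic: the cgroup v2 unified hierarchy has cgroup.controllers at its root."""
    return 2 if (root / "cgroup.controllers").is_file() else 1


def memory_limit_bytes(root: Path = CGROUP_ROOT) -> Optional[int]:
    """Return the memory limit in bytes, or None if no limit is set."""
    if cgroup_version(root) == 2:
        # v2: one file that contains a byte count or the literal "max".
        raw = (root / "memory.max").read_text().strip()
        return None if raw == "max" else int(raw)
    # v1: a per-controller directory with a differently named file.
    value = int((root / "memory" / "memory.limit_in_bytes").read_text())
    return None if value >= V1_UNLIMITED_THRESHOLD else value


def cpu_limit_cores(root: Path = CGROUP_ROOT) -> Optional[float]:
    """Return the CPU quota as a (possibly fractional) number of cores, or None."""
    if cgroup_version(root) == 2:
        # v2: one file with "<quota> <period>", where quota can be "max".
        quota, period = (root / "cpu.max").read_text().split()
        return None if quota == "max" else int(quota) / int(period)
    # v1: quota and period are separate files, and a quota of -1 means unlimited.
    quota = int((root / "cpu" / "cpu.cfs_quota_us").read_text())
    period = int((root / "cpu" / "cpu.cfs_period_us").read_text())
    return None if quota < 0 else quota / period
```

For example, in a container that has a 512 MiB memory limit on a cgroup v2 node, memory.max contains 536870912, and memory_limit_bytes() returns that value. A tool that only knows the v1 path would fail to find memory/memory.limit_in_bytes on the same node.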
An alternative temporary solution is to revert your nodes to cgroup v1 by using the DaemonSet. For more information, see Revert to cgroup v1 DaemonSet.
Important
- Use the DaemonSet cautiously. Test it in a lower environment before you apply it to production to ensure compatibility and avoid disruptions.
- By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the cgroup change.
- To control how the DaemonSet is applied, configure a nodeSelector to target specific nodes.
Status
Microsoft is working with the Kubernetes community to resolve the issue. Track progress at Azure/AKS Issue #3443.
As part of the resolution, the plan is to adjust the eviction thresholds or update resource reservations, depending on the outcome of the fix.
Third-party information disclaimer
The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.
Third-party contact disclaimer
Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.
Contact us for help
If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.