Troubleshoot high memory consumption in disk-intensive applications

Disk input and output operations are costly, and most operating systems implement caching strategies for reading and writing data to the filesystem. The Linux kernel usually uses strategies such as the page cache to improve overall performance. The primary goal of the page cache is to store data read from the filesystem in the cache, making it available in memory for future read operations.
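
To see the page cache in action, you can read a large file and watch the Cached value in /proc/meminfo grow. The following is a minimal sketch; the file path is a placeholder:

$ grep -E '^(MemFree|Cached)' /proc/meminfo    # note the current page cache size
$ cat <LARGE_FILE> > /dev/null                 # read a large file; the kernel caches its pages
$ grep -E '^(MemFree|Cached)' /proc/meminfo    # Cached increases; later reads of the same file are served from memory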

This article helps you identify and avoid high memory consumption in disk-intensive applications due to Linux kernel behaviors on Kubernetes pods.

Prerequisites

A tool to connect to the Kubernetes cluster, such as the kubectl tool. To install kubectl using the Azure CLI, run the az aks install-cli command.
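
For example, you can install kubectl and connect to the cluster as follows. This is a minimal sketch; the resource group and cluster names are placeholders:

$ az aks install-cli
$ az aks get-credentials --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME>
$ kubectl get nodes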

Symptoms

When a disk-intensive application running on a pod performs frequent filesystem operations, high memory consumption might occur.

The following table outlines common symptoms of high memory consumption:

Symptom Description
The working set metric is too high. There's a significant difference between the working set metric reported by the Kubernetes Metrics API and the memory that the application actually consumes.
Out-of-memory (OOM) kill. An OOM kill indicates that the pod is under memory pressure (see the check after this table).
Increased memory usage after heavy disk activity. Memory consumption rises after operations such as backups, large file reads or writes, or data imports.
Memory usage grows indefinitely. The pod's memory consumption increases over time without decreasing, resembling a memory leak, even though the application itself isn't leaking memory.
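
To check whether a pod was OOM killed, you can inspect the last container state that Kubernetes reports. This is a minimal sketch; the pod name is a placeholder, and you should look for a Reason of OOMKilled in the output:

$ kubectl describe pod <POD_NAME> | grep -i -A 3 "Last State"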

Troubleshooting checklist

Step 1: Inspect the pod working set

To inspect the working set of pods reported by the Kubernetes Metrics API, run the following kubectl top pods command:

$ kubectl top pods -A | grep -i "<DEPLOYMENT_NAME>"
NAME                            CPU(cores)   MEMORY(bytes)
my-deployment-fc94b7f98-m9z2l   1m           344Mi

For detailed steps about how to identify which pod is consuming excessive memory, see Troubleshoot memory saturation in AKS clusters.

Step 2: Inspect pod memory statistics

To inspect the memory statistics of the cgroups on the pod that's consuming excessive memory, follow these steps:

Note

Control groups (cgroups) help enforce resource management for pods and containers, including CPU and memory requests and limits for containerized workloads.
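
The memory.current and memory.stat files used in the following steps are cgroup v2 interfaces. If you aren't sure which cgroup version the pod uses, the following check is a minimal sketch (it requires the stat utility in the container image); cgroup2fs indicates cgroup v2:

$ stat -fc %T /sys/fs/cgroup
cgroup2fs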

  1. Connect to the pod:

    $ kubectl exec <POD_NAME> -it -- bash
    
  2. Navigate to the cgroup statistics directory and list the memory-related files:

    $ ls /sys/fs/cgroup | grep -e memory.stat -e memory.current
    memory.current memory.stat
    
    • memory.current: Total memory currently used by the cgroup and its descendants.
    • memory.stat: A breakdown of the cgroup's memory footprint into different types of memory, type-specific details, and other information about the state and past events of the memory management system.

    All the values listed in those files are in bytes.

  3. Get an overview of how memory consumption is distributed on the pod:

    $ cat /sys/fs/cgroup/memory.current
    10645012480
    $ cat /sys/fs/cgroup/memory.stat
    anon 5197824
    inactive_anon 5152768
    active_anon 8192
    ...
    file 10256240640
    active_file 32768
    inactive_file 10256207872
    ...
    slab 354682456
    slab_reclaimable 354554400
    slab_unreclaimable 128056
    ...
    

    cAdvisor uses memory.current and inactive_file to compute the working set metric. You can replicate the calculation using the following formula:

    working_set = (memory.current - inactive_file) / 1048576 = (10645012480 - 10256207872) / 1048576 ≈ 370 MiB
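
    For example, you can reproduce the calculation directly from the cgroup files inside the pod. The following is a minimal shell sketch of the same formula:

    $ MEM_CURRENT=$(cat /sys/fs/cgroup/memory.current)
    $ INACTIVE_FILE=$(awk '$1 == "inactive_file" {print $2}' /sys/fs/cgroup/memory.stat)
    $ echo "working set: $(( (MEM_CURRENT - INACTIVE_FILE) / 1048576 )) MiB"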

Step 3: Determine kernel and application memory consumption

The following table describes some memory segments:

Segment Description
anon The amount of memory used in anonymous mappings. Most languages use this segment to allocate memory.
file The amount of memory used to cache filesystem data, including tmpfs and shared memory.
slab The amount of memory used to store data structures in the Linux kernel.

Based on the output from Step 2, the anon segment accounts for 5,197,824 bytes, which is far from the total reported by the working set metric. The slab segment that the Linux kernel uses accounts for 354,682,456 bytes, which is almost all the memory reported by the working set metric on the pod.
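
To extract just these segments from the pod, you can filter memory.stat. The following is a minimal sketch that reuses the values from Step 2; the pod name is a placeholder:

$ kubectl exec <POD_NAME> -it -- grep -E '^(anon|file|slab) ' /sys/fs/cgroup/memory.stat
anon 5197824
file 10256240640
slab 354682456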

Step 4: Drop the kernel cache on a debugger pod

Note

This step might lead to availability and performance issues. Avoid running it in a production environment.

  1. Get the node running the pod:

    $ kubectl get pod -A -o wide | grep "<POD_NAME>"
    NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
    my-deployment-fc94b7f98-m9z2l   1/1     Running   0          37m   10.244.1.17   aks-agentpool-26052128-vmss000004   <none>           <none>
    
  2. Create a debugger pod by using the kubectl debug command, and then change the root directory to the host filesystem:

    $ kubectl debug node/<NODE_NAME> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
    $ chroot /host
    
  3. Drop the kernel cache. Writing 1 to drop_caches frees the page cache; writing 3 also frees reclaimable slab objects, such as dentries and inodes:

    echo 1 > /proc/sys/vm/drop_caches
    
  4. Verify whether the command in the previous step has any effect by repeating Step 1 and Step 2:

    $ kubectl top pods -A | grep -i "<DEPLOYMENT_NAME>"
    NAME                            CPU(cores)   MEMORY(bytes)
    my-deployment-fc94b7f98-m9z2l   1m           4Mi
    
    $ kubectl exec <POD_NAME> -it -- cat /sys/fs/cgroup/memory.stat
    anon 4632576
    file 1781760
    ...
    slab_reclaimable 219312
    slab_unreclaimable 173456
    slab 392768
    

If you observe a significant decrease in both the working set and the slab memory segment, the high memory consumption on the pod is caused by the Linux kernel's caching behavior rather than by the application itself.
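
You can also inspect kernel slab usage at the node level from the debugger pod after you run chroot /host. This is a minimal sketch; SReclaimable covers caches, such as dentries and inodes, that the kernel can free under memory pressure:

$ grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo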

Workaround: Configure appropriate memory limits and requests

The only effective workaround for high memory consumption on Kubernetes pods is to set realistic resource limits and requests. For example:

resources:
    requests:
        memory: 30Mi
    limits:
        memory: 60Mi

By configuring appropriate memory limits and requests in the pod specification, you help Kubernetes manage memory allocation more efficiently and mitigate the impact of excessive kernel-level caching on pod memory usage.
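
For reference, the following minimal sketch shows where the resources block fits in a pod specification. The pod name, container name, and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: disk-intensive-app        # placeholder pod name
spec:
  containers:
  - name: app                     # placeholder container name
    image: <YOUR_IMAGE>           # placeholder image
    resources:
      requests:
        memory: 30Mi
      limits:
        memory: 60Mi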

Caution

Misconfigured pod memory limits can lead to problems such as OOMKilled errors.

References

Third-party information disclaimer

The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.

Third-party contact disclaimer

Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.