How to determine what exactly is using so many resources under the "Other Processes" entry listed in Node Insights in AKS Monitoring

Dmytro Pavlenko 0 Reputation points
2023-04-19T10:28:27.4966667+00:00

So I have an AKS cluster and ran into a problem where almost all my nodes are using 100%+ of allocatable memory. When I run the kubectl top node command, the values in the "Memory%" column are greater than 100%. I went to my manually deployed Prometheus+Grafana monitoring stack to see what was causing such consumption and found that all my pods together use less than 40% of available memory. I then decided to enable Monitoring (Container Insights) for my cluster in Azure. After a few minutes I got results that surprised me a bit: on the "Insights" tab I selected "Nodes", picked the "Memory working set (Computed from allocatable)" metric, and saw the results below. [screenshot: MicrosoftTeams-image]
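As I understand it, kubectl top node computes "Memory%" against the node's allocatable memory (capacity minus kube/system reservations), not total capacity, which is why it can exceed 100%. A quick sketch of the arithmetic with made-up numbers (these are not from my cluster):

```shell
# Hypothetical values in KiB (assumptions for illustration only):
capacity_kib=7113660       # total node memory, status.capacity
allocatable_kib=4667324    # status.allocatable, after kube/system reservations
working_set_kib=4900000    # node working set reported by the metrics server

# kubectl top node reports Memory% against *allocatable*, so a node can
# legitimately show more than 100% while still being below capacity:
awk -v ws="$working_set_kib" -v alloc="$allocatable_kib" -v cap="$capacity_kib" \
  'BEGIN { printf "vs allocatable: %.0f%%  vs capacity: %.0f%%\n", 100*ws/alloc, 100*ws/cap }'
# → vs allocatable: 105%  vs capacity: 69%
```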

In the screenshot above you can see an "Other Processes" entry that is consuming more than 50% of available memory, and I see that trend on all of my nodes. Looking at the documentation https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-analyze#analyze-nodes-controllers-and-container-health I found the following explanation: [screenshot: Screenshot 2023-04-19 at 13.20.17]

As I don't have any non-Kubernetes workloads on any of my nodes, I can't understand how to determine what exactly is causing such memory consumption. Has anyone run into this before or dealt with it? I'm looking for advice, thank you.
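For reference, my "pods use less than 40%" figure came from summing pod memory. Per the documentation, "Other Processes" is roughly the node's working set minus the sum of the containerized processes, so summing `kubectl top pod` output is a useful cross-check. A sketch piping sample output (the data below is made up) through awk:

```shell
# Made-up sample of `kubectl top pod --all-namespaces` output:
kubectl_top_sample='NAMESPACE NAME MEMORY
kube-system coredns-abc 23Mi
kube-system metrics-server-xyz 41Mi
default api-7f9 512Mi'

# Skip the header, strip the Mi suffix, and total the memory column:
echo "$kubectl_top_sample" \
  | awk 'NR>1 { gsub(/Mi/,"",$3); sum+=$3 } END { print sum "Mi used by pods" }'
# → 576Mi used by pods
```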

Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

Sort by: Most helpful
  1. Andrei Barbu 2,596 Reputation points Microsoft Employee
    2023-04-19T10:40:11.3966667+00:00

    Hello Dmytro Pavlenko, I would recommend getting inside the node(s) as per https://learn.microsoft.com/en-us/azure/aks/node-access and running the command below. It will sort the processes by the amount of memory they consume.

    ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head
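    If the top individual PIDs don't obviously account for the usage, aggregating resident memory per command name can also help. A rough sketch (note that RSS is only an approximation of the "working set" metric Container Insights shows):

```shell
# Sum resident memory (RSS, reported by ps in KiB) per command name,
# then list the heaviest commands first:
ps -eo rss,comm --no-headers \
  | awk '{ sum[$2] += $1 } END { for (c in sum) printf "%10.1f MiB  %s\n", sum[c]/1024, c }' \
  | sort -rn | head
```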
    

    Just as an idea to isolate the issue, you may want to create a fresh AKS cluster with a similar configuration and run the same command, then compare what the two have in common in order to understand what runs by default versus what represents your workload. I hope you find this helpful. Please "Accept as an answer" and "Upvote" if it helped you. Thank you!

