ama-metrics-node pods failing with OOMKilled errors in my AKS cluster

Sunil Menon 25 Reputation points
2024-07-26T23:25:48.5866667+00:00

Hi,

I'm using the Azure managed Prometheus service on a new cluster and have been running the same setup on two other clusters, one production and one test. The configuration on all of these clusters is the same except for the Kubernetes version: the new cluster runs 1.27.9 and the old production cluster runs 1.26.

Another difference is that the new cluster has a KRaft-based Kafka cluster, while the old one has a ZooKeeper-based Kafka cluster.

I can't work out why the pods keep failing. The production cluster's ama-metrics-node pods are still stable with around 200 Debezium connectors, while the new cluster has only 16 connectors and its pods still fail with OOMKilled.

Kindly help; I've seen only a few people report this issue, and no concrete solution to it.

Also please do let me know if you need any specific information.

I'm using the default ama-metrics-prometheus-config-node.yaml:

kind: ConfigMap
apiVersion: v1
data:
  prometheus-config: |-
    scrape_configs:
    - job_name: 'kubernetes-pods'

      kubernetes_sd_configs:
      - role: pod

      relabel_configs:
      # Scrape only pods with the annotation: prometheus.io/scrape = true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

      # If prometheus.io/path is specified, scrape this path instead of /metrics
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

      # If prometheus.io/port is specified, scrape this port instead of the default
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

      # If prometheus.io/scheme is specified, scrape with this scheme instead of http
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        regex: (http|https)
        target_label: __scheme__

      # Include the pod namespace as a label for each metric (namespace)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

      # Include the pod name as a label for each metric
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod

      # [Optional] Include all pod labels as labels for each metric
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

metadata:
  name: ama-metrics-prometheus-config-node
  namespace: kube-system
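To illustrate how the prometheus.io/port rule in the config above rewrites a scrape target, here is a minimal Python sketch. It only mimics the rule's behavior and is not how Prometheus or the ama-metrics agent is implemented. The function name `rewrite_address` and the sample addresses are hypothetical. It assumes the default `;` separator between `source_labels` values and that relabel regexes are anchored at both ends.

```python
import re

# Same pattern as the prometheus.io/port rule in the config above.
# Relabel regexes are treated as fully anchored, so fullmatch() is used to mirror that.
PORT_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Hypothetical helper: mimic 'action: replace' with replacement '$1:$2'."""
    # The values of source_labels are concatenated with ';' (assumed default separator).
    joined = f"{address};{port_annotation}"
    match = PORT_RULE.fullmatch(joined)
    if match is None:
        # When the regex does not match, the target label is left unchanged.
        return address
    return f"{match.group(1)}:{match.group(2)}"


if __name__ == "__main__":
    # Pod address with a discovered port, overridden by the annotation.
    print(rewrite_address("10.0.0.5:8080", "9404"))
    # Pod without the annotation: the address stays as discovered.
    print(rewrite_address("10.0.0.5:8080", ""))
```

In this sketch, an annotated pod has its discovered port replaced by the annotation value, while a pod without the annotation keeps the address found by service discovery.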
Azure Kubernetes Service (AKS)

Accepted answer
  1. Vishwanath 96 Reputation points Microsoft Employee
    2024-08-08T22:24:43.5666667+00:00
