AKS Metric and Log Collection

Chad 40 Reputation points
2026-05-06T17:34:02.71+00:00

I need guidance and information on what exactly is available to us and if we can get this into our Grafana/Prometheus stack.

Using the first screenshot on this page: https://learn.microsoft.com/en-us/azure/aks/monitor-aks?tabs=cilium ; we need more visibility into Levels 1, 2 and 3.

We have basic K8s metrics in our Grafana stack (per subscription) with CPU, memory, some network, etc. per node, but we have nothing on the controller and API server.

We're not capturing any logs from K8s; we have to grab K8s events manually.

What is available to us now (not in preview), and can all of it be pushed to each subscription's Prometheus/Loki/Grafana stack, or is it ONLY available to Azure's managed items?

I know this question is vague, but the documentation doesn't spell it out as clearly as I need it to in order to present it to management.

Azure Kubernetes Service

An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.


Answer accepted by question author

Manish Deshpande 6,420 Reputation points Microsoft External Staff Moderator
2026-05-06T18:29:17.3366667+00:00

Hi Chad,

Thanks for the detailed post. This is a really common ask, and the documentation doesn't always connect the dots clearly, so I'm happy to break it down for you.

The short answer is yes: all of this is available (outside of preview), and most of it can be pushed to your own Prometheus/Loki/Grafana stack per subscription. Let me walk through each layer.

Level-wise explanation

Level 1 – Node metrics (you already have this)

Basic CPU, Memory, and network metrics per node come out of the box. This is your current state.

Level 2 – Pod-level & network metrics
To go deeper than node-level, you'll want to enable Container Network Observability, part of Advanced Container Networking Services (ACNS). This gives you pod-level network metrics and supports both Cilium and non-Cilium clusters.
https://learn.microsoft.com/en-us/azure/aks/container-network-observability-how-to?tabs=cilium

Level 3 – Control plane (API server, etcd, scheduler, controller manager)
This is the gap you're hitting. There's good news and one important constraint to consider.
The good news: AKS now exposes control plane metrics (API server CPU/memory, etcd utilization, scheduler, etc.) through Azure Monitor Managed Prometheus, and a subset of API server and etcd metrics is available free by default with no extra setup. Please refer to the link below.

https://learn.microsoft.com/en-us/azure/aks/control-plane-metrics-monitor

The constraint: you cannot scrape these directly with self-hosted Prometheus. The control plane runs multiple replicas behind a load balancer, so a self-hosted scraper would only hit one instance at a time and give you incomplete data. Azure Managed Prometheus handles this correctly under the hood. The workaround for your BYO stack is to enable Managed Prometheus and configure a remote-write from the Azure Monitor Workspace into your own Prometheus. That way you keep your existing stack as the source of truth.
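
Once control plane metrics are landing in a Prometheus-compatible store, a quick sanity check is an instant query for an API server metric. The following is only a sketch: the endpoint URL is a hypothetical placeholder, authentication is left out entirely, and the PromQL expression is just one plausible example built on the standard Kubernetes metric apiserver_request_total. It only builds the request URL for the standard Prometheus HTTP API path /api/v1/query and does not send anything.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL for a PromQL expression."""
    # urlencode percent-encodes the expression so it is safe in a query string.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical placeholder: replace with your own Prometheus-compatible endpoint.
QUERY_ENDPOINT = "https://example-workspace.example.invalid"

# Example expression (an assumption, not from the answer above):
# API server request rate over 5 minutes, grouped by HTTP status code.
PROMQL = "sum by (code) (rate(apiserver_request_total[5m]))"

url = build_instant_query_url(QUERY_ENDPOINT, PROMQL)
print(url)
```

If the query returns no series, the metric is probably not being collected yet, which is worth confirming before building dashboards on top of it.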

Logs & K8s events: no more manual grabbing

For container and pod logs, enable Container Insights. This collects data into the ContainerLogV2, KubePodInventory, and KubeEvents tables in a Log Analytics Workspace. You can then query these directly in your self-managed Grafana using the Azure Monitor data source plugin, so there is no need to maintain a separate view just for Azure.

https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-overview
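
To replace the manual event grabbing, a Log Analytics (KQL) query over KubeEvents is one option. The sketch below only assembles the query text, which could then be pasted into a Grafana panel or the Logs blade. The helper name, the "payments" namespace, and the defaults are hypothetical, and the column names reflect my understanding of the KubeEvents schema, so verify them against the table reference before relying on them.

```python
def build_kube_events_query(namespace: str, lookback: str = "1h", limit: int = 50) -> str:
    """Assemble a KQL query for recent Warning events in one namespace."""
    lines = [
        "KubeEvents",
        f"| where TimeGenerated > ago({lookback})",
        '| where KubeEventType == "Warning"',
        f'| where Namespace == "{namespace}"',
        "| project TimeGenerated, Namespace, Name, ObjectKind, Reason, Message",
        "| order by TimeGenerated desc",
        f"| take {limit}",
    ]
    return "\n".join(lines)


# "payments" is a hypothetical namespace used only for illustration.
query = build_kube_events_query("payments")
print(query)
```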

For control plane logs (kube-apiserver, kube-audit, kube-controller-manager, kube-scheduler), you just need to enable Diagnostic Settings on the cluster and point them at your Log Analytics Workspace. It's a few clicks in the portal:

AKS Cluster → Monitoring → Diagnostic Settings → enable the categories you need.
https://learn.microsoft.com/en-us/azure/azure-monitor/reference/tables/akscontrolplane
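
If you would rather script this than click through the portal for every subscription, the same setting can be created with the Azure CLI. The sketch below only builds the argument list for az monitor diagnostic-settings create with the four categories named above; it does not run anything. The setting name and both resource IDs are hypothetical placeholders.

```python
import json

# Log categories named in the answer above.
CONTROL_PLANE_CATEGORIES = (
    "kube-apiserver",
    "kube-audit",
    "kube-controller-manager",
    "kube-scheduler",
)


def build_diagnostic_settings_command(
    setting_name: str,
    cluster_resource_id: str,
    workspace_resource_id: str,
    categories: tuple = CONTROL_PLANE_CATEGORIES,
) -> list:
    """Build the az CLI argument list that enables the given log categories."""
    logs = [{"category": category, "enabled": True} for category in categories]
    return [
        "az", "monitor", "diagnostic-settings", "create",
        "--name", setting_name,
        "--resource", cluster_resource_id,
        "--workspace", workspace_resource_id,
        "--logs", json.dumps(logs),
    ]


# Hypothetical placeholders: substitute your own setting name and resource IDs.
command = build_diagnostic_settings_command(
    "aks-control-plane-logs",
    "<aks-cluster-resource-id>",
    "<log-analytics-workspace-resource-id>",
)
print(" ".join(command))
```

Returning a list rather than a single string means it can be handed to subprocess.run without shell quoting problems around the JSON payload. Note that kube-audit can be high volume, so it is worth enabling only the categories you actually need.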

For network flow logs (Cilium only), the Container Network Logs feature in ACNS writes flow logs to a RetinaNetworkFlowLogs table in Log Analytics.
https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-network-monitoring

Thanks,
Manish.


1 additional answer

  1. Salamat Shah 340 Reputation points MVP
    2026-05-06T21:06:27.21+00:00

    Node and cluster level metrics are fully exportable; control-plane metrics are GA through Azure Monitor’s managed metrics, but deep system logs remain within Azure-managed monitoring unless you push them out via Azure Monitor pipelines.

