Hi Chad,
Thanks for the detailed post. This is actually a really common ask, and the documentation doesn't always connect the dots clearly, so I'm happy to break it down for you.
The short answer is yes: all of this is available (outside of preview), and most of it can be pushed to your own Prometheus/Loki/Grafana stack per subscription. Let me walk through each layer.
Level-by-level explanation
Level 1 – Node metrics (you already have this)
Basic CPU, memory, and network metrics per node come out of the box. This is your current state.
Level 2 – Pod-level & network metrics
To go deeper than node-level, you'll want to enable Container Network Observability, part of Advanced Container Networking Services (ACNS). This gives you pod-level network metrics and supports both Cilium and non-Cilium clusters.
https://learn.microsoft.com/en-us/azure/aks/container-network-observability-how-to?tabs=cilium
Level 3 – Control plane (API server, etcd, scheduler, controller manager)
This is the gap you're hitting. There's good news and one important constraint we need to consider.
The good news: AKS now exposes control plane metrics (API server CPU/memory, etcd utilization, scheduler, etc.) through Azure Monitor Managed Prometheus. A subset of the API server and etcd metrics is available free by default with no extra setup. Please refer to the link below.
https://learn.microsoft.com/en-us/azure/aks/control-plane-metrics-monitor
The constraint: you cannot scrape these directly with self-hosted Prometheus. The control plane runs multiple replicas behind a load balancer, so a self-hosted scraper would only hit one instance per scrape and give you incomplete data. Azure Managed Prometheus handles this correctly under the hood. The workaround for your BYO stack is to enable Managed Prometheus and configure remote-write from the Azure Monitor Workspace into your own Prometheus, so that you keep your existing stack as the source of truth.
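To make the load balancer problem concrete, here is a small toy simulation in Python. It is only a sketch: the replica names and counter values are made up, and it does not talk to a real cluster. It just shows that a scrape routed to one replica sees a fraction of the total that a replica-aware scraper would report.

```python
import random

# Hypothetical per-replica request counters for three API server replicas.
# The names and numbers are invented for illustration only.
REPLICA_REQUEST_TOTALS = {
    "apiserver-0": 1200,
    "apiserver-1": 950,
    "apiserver-2": 1430,
}


def scrape_via_load_balancer(replicas, rng):
    """Simulate one scrape through a load balancer: a single replica answers."""
    name = rng.choice(sorted(replicas))
    return name, replicas[name]


def scrape_all_replicas(replicas):
    """Simulate a scraper that reaches every replica and sums the counters."""
    return sum(replicas.values())


if __name__ == "__main__":
    rng = random.Random(0)
    true_total = scrape_all_replicas(REPLICA_REQUEST_TOTALS)
    for attempt in range(3):
        name, seen = scrape_via_load_balancer(REPLICA_REQUEST_TOTALS, rng)
        print(f"scrape {attempt}: hit {name}, saw {seen} of {true_total} requests")
```

Each simulated scrape reports only one replica's counter, and consecutive scrapes can land on different replicas, so rates computed from them would jump around as well as undercount.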
Logs & K8s events: no more manual grabbing
For container and pod logs, enable Container Insights. This collects data into the ContainerLogV2, KubePodInventory, and KubeEvents tables in a Log Analytics Workspace. You can then query these directly in your self-managed Grafana using the Azure Monitor data source plugin, so there is no need to maintain a separate view just for Azure.
https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-overview
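As a rough idea of what you would paste into a Grafana panel or the Logs blade, here is a small Python helper that builds a KQL query against the KubeEvents table. Treat it as a sketch: the function name is hypothetical, and the column names (TimeGenerated, Namespace, Name, Reason, Message, KubeEventType) are my assumption of the table schema, so please check them against your workspace before relying on it.

```python
import re

# Kubernetes namespaces are DNS labels; validating keeps the query string safe.
_NAMESPACE_PATTERN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def build_kube_events_query(namespace, lookback_hours=1, limit=100):
    """Return a KQL query for recent non-Normal events in one namespace.

    Column names are assumed from the KubeEvents schema; verify them
    in your own Log Analytics Workspace.
    """
    if not _NAMESPACE_PATTERN.fullmatch(namespace):
        raise ValueError(f"not a valid namespace name: {namespace!r}")
    if lookback_hours <= 0 or limit <= 0:
        raise ValueError("lookback_hours and limit must be positive")
    return "\n".join(
        [
            "KubeEvents",
            f"| where TimeGenerated > ago({lookback_hours}h)",
            f'| where Namespace == "{namespace}"',
            '| where KubeEventType != "Normal"',
            "| project TimeGenerated, Namespace, Name, Reason, Message",
            "| order by TimeGenerated desc",
            f"| take {limit}",
        ]
    )


if __name__ == "__main__":
    # "payments" is a placeholder namespace.
    print(build_kube_events_query("payments", lookback_hours=6, limit=50))
```

The same pattern works for ContainerLogV2 by swapping the table name and projected columns once you have confirmed them in your workspace.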
For control plane logs (kube-apiserver, kube-audit, kube-controller-manager, kube-scheduler), you just need to enable Diagnostic Settings on the cluster and point them at your Log Analytics Workspace. It's a few clicks in the portal:
AKS Cluster → Monitoring → Diagnostic Settings → enable the categories you need.
https://learn.microsoft.com/en-us/azure/azure-monitor/reference/tables/akscontrolplane
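If you would rather script this per subscription than click through the portal, the settings boil down to a small JSON document. The Python sketch below only builds that document; it does not call Azure. The function name and the placeholder workspace ID are hypothetical, and the body shape (properties.workspaceId plus a list of category/enabled pairs) is my understanding of the diagnostic settings format, so validate it against the current API reference before deploying.

```python
import json

# The four control plane log categories mentioned above.
CONTROL_PLANE_CATEGORIES = (
    "kube-apiserver",
    "kube-audit",
    "kube-controller-manager",
    "kube-scheduler",
)


def build_diagnostic_settings(workspace_resource_id, categories=CONTROL_PLANE_CATEGORIES):
    """Build a diagnostic settings body that enables the given log categories.

    The body shape is an assumption; check it against the API reference.
    """
    if not workspace_resource_id:
        raise ValueError("workspace_resource_id must not be empty")
    return {
        "properties": {
            "workspaceId": workspace_resource_id,
            "logs": [{"category": name, "enabled": True} for name in categories],
        }
    }


if __name__ == "__main__":
    # Placeholder resource ID; substitute your own workspace.
    workspace_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    )
    print(json.dumps(build_diagnostic_settings(workspace_id), indent=2))
```

Keeping the category list in one place makes it easy to apply the same set to every cluster, or to drop kube-audit where its volume is a cost concern.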
For network flow logs (Cilium only), the Container Network Logs feature in ACNS writes flow logs to a RetinaNetworkFlowLogs table in Log Analytics.
https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-network-monitoring
Thanks,
Manish.