Default Prometheus metrics configuration in Azure Monitor (preview)

This article lists the default targets, dashboards, and recording rules that apply when you configure Prometheus metrics to be scraped from any AKS cluster.

Scrape frequency

The default scrape frequency for all default targets is 30 seconds.
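As a rough illustration of what the 30-second default means for queries, the sketch below estimates how many samples per series fall inside a range window, such as the 5-minute windows implied by the rate5m recording rule names in this article. The function names are hypothetical, and the count is an approximation that ignores scrape jitter and failed scrapes.

```python
SCRAPE_INTERVAL_SECONDS = 30  # default scrape frequency stated in this article


def samples_in_window(window_seconds, interval_seconds=SCRAPE_INTERVAL_SECONDS):
    """Approximate number of samples per series inside a range window."""
    return window_seconds // interval_seconds


def window_supports_rate(window_seconds, interval_seconds=SCRAPE_INTERVAL_SECONDS):
    """Conservative check: rate() needs at least two samples in the window."""
    return samples_in_window(window_seconds, interval_seconds) >= 2


print(samples_in_window(5 * 60))   # 10 samples in a 5-minute window
print(window_supports_rate(45))    # False: a 45-second window may hold one sample
```

Under these assumptions, a range window shorter than 60 seconds is not guaranteed to contain the two samples that a rate calculation needs.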

Targets scraped

  • cadvisor (job=cadvisor)
  • nodeexporter (job=node)
  • kubelet (job=kubelet)
  • kube-state-metrics (job=kube-state-metrics)

Metrics collected from default targets

The following metrics are collected by default from each default target. All other metrics are dropped through relabeling rules.
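The keep-list behavior described above can be sketched as follows. This is a minimal simulation, not the service's actual relabeling configuration: it assumes the filter works like a Prometheus keep action on the metric name, where the regex must match the whole name. The helper names and the sample data are hypothetical, and the list holds only a subset of the cadvisor metrics.

```python
import re

# Subset of the cadvisor metrics listed in this article (illustration only).
CADVISOR_KEEP = [
    "container_memory_rss",
    "container_cpu_usage_seconds_total",
    "container_fs_reads_total",
]


def build_keep_regex(names):
    """Join metric names into one alternation, as a keep rule's regex would."""
    return "|".join(re.escape(name) for name in names)


def apply_keep(samples, keep_regex):
    """Return only the samples whose metric name fully matches keep_regex.

    Prometheus anchors relabel regexes at both ends, so fullmatch is used.
    """
    pattern = re.compile(keep_regex)
    return [s for s in samples if pattern.fullmatch(s["__name__"])]


# Hypothetical scraped samples; container_memory_cache is not on the keep list.
scraped = [
    {"__name__": "container_memory_rss", "value": 1.0},
    {"__name__": "container_memory_cache", "value": 2.0},
    {"__name__": "container_cpu_usage_seconds_total", "value": 3.0},
]

kept = apply_keep(scraped, build_keep_regex(CADVISOR_KEEP))
print([s["__name__"] for s in kept])
```

Because the match is anchored, a metric whose name merely contains a listed name as a prefix or substring is still dropped.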

cadvisor (job=cadvisor)

  • container_memory_rss
  • container_network_receive_bytes_total
  • container_network_transmit_bytes_total
  • container_network_receive_packets_total
  • container_network_transmit_packets_total
  • container_network_receive_packets_dropped_total
  • container_network_transmit_packets_dropped_total
  • container_fs_reads_total
  • container_fs_writes_total
  • container_fs_reads_bytes_total
  • container_fs_writes_bytes_total
  • container_cpu_usage_seconds_total

kubelet (job=kubelet)

  • kubelet_node_name
  • kubelet_running_pods
  • kubelet_running_pod_count
  • kubelet_running_sum_containers
  • kubelet_running_container_count
  • volume_manager_total_volumes
  • kubelet_node_config_error
  • kubelet_runtime_operations_total
  • kubelet_runtime_operations_errors_total
  • kubelet_runtime_operations_duration_seconds_bucket
  • kubelet_runtime_operations_duration_seconds_sum
  • kubelet_runtime_operations_duration_seconds_count
  • kubelet_pod_start_duration_seconds_bucket
  • kubelet_pod_start_duration_seconds_sum
  • kubelet_pod_start_duration_seconds_count
  • kubelet_pod_worker_duration_seconds_bucket
  • kubelet_pod_worker_duration_seconds_sum
  • kubelet_pod_worker_duration_seconds_count
  • storage_operation_duration_seconds_bucket
  • storage_operation_duration_seconds_sum
  • storage_operation_duration_seconds_count
  • storage_operation_errors_total
  • kubelet_cgroup_manager_duration_seconds_bucket
  • kubelet_cgroup_manager_duration_seconds_sum
  • kubelet_cgroup_manager_duration_seconds_count
  • kubelet_pleg_relist_interval_seconds_bucket
  • kubelet_pleg_relist_interval_seconds_count
  • kubelet_pleg_relist_interval_seconds_sum
  • kubelet_pleg_relist_duration_seconds_bucket
  • kubelet_pleg_relist_duration_seconds_count
  • kubelet_pleg_relist_duration_seconds_sum
  • rest_client_requests_total
  • rest_client_request_duration_seconds_bucket
  • rest_client_request_duration_seconds_sum
  • rest_client_request_duration_seconds_count
  • process_resident_memory_bytes
  • process_cpu_seconds_total
  • go_goroutines
  • kubernetes_build_info

nodeexporter (job=node)

  • node_memory_MemTotal_bytes
  • node_cpu_seconds_total
  • node_memory_MemAvailable_bytes
  • node_memory_Buffers_bytes
  • node_memory_Cached_bytes
  • node_memory_MemFree_bytes
  • node_memory_Slab_bytes
  • node_filesystem_avail_bytes
  • node_filesystem_size_bytes
  • node_time_seconds
  • node_exporter_build_info
  • node_load1
  • node_vmstat_pgmajfault
  • node_network_receive_bytes_total
  • node_network_transmit_bytes_total
  • node_network_receive_drop_total
  • node_network_transmit_drop_total
  • node_disk_io_time_seconds_total
  • node_disk_io_time_weighted_seconds_total
  • node_load5
  • node_load15
  • node_disk_read_bytes_total
  • node_disk_written_bytes_total
  • node_uname_info

kube-state-metrics (job=kube-state-metrics)

  • kube_node_status_allocatable
  • kube_pod_owner
  • kube_pod_container_resource_requests
  • kube_pod_status_phase
  • kube_pod_container_resource_limits
  • kube_replicaset_owner
  • kube_resourcequota
  • kube_namespace_status_phase
  • kube_node_status_capacity
  • kube_node_info
  • kube_pod_info
  • kube_deployment_spec_replicas
  • kube_deployment_status_replicas_available
  • kube_deployment_status_replicas_updated
  • kube_statefulset_status_replicas_ready
  • kube_statefulset_status_replicas
  • kube_statefulset_status_replicas_updated
  • kube_job_status_start_time
  • kube_job_status_active
  • kube_job_failed
  • kube_horizontalpodautoscaler_status_desired_replicas
  • kube_horizontalpodautoscaler_status_current_replicas
  • kube_horizontalpodautoscaler_spec_min_replicas
  • kube_horizontalpodautoscaler_spec_max_replicas
  • kubernetes_build_info
  • kube_node_status_condition
  • kube_node_spec_taint

Dashboards

The following default dashboards are automatically provisioned and configured by Azure Monitor managed service for Prometheus when you link your Azure Monitor workspace to an Azure Managed Grafana instance. Source code for these dashboards is available on GitHub.

  • Kubernetes / Compute Resources / Cluster
  • Kubernetes / Compute Resources / Namespace (Pods)
  • Kubernetes / Compute Resources / Node (Pods)
  • Kubernetes / Compute Resources / Pod
  • Kubernetes / Compute Resources / Namespace (Workloads)
  • Kubernetes / Compute Resources / Workload
  • Kubernetes / Kubelet
  • Node Exporter / USE Method / Node
  • Node Exporter / Nodes

Recording rules

The following default recording rules are automatically configured by Azure Monitor managed service for Prometheus when you link your Azure Monitor workspace to an Azure Managed Grafana instance. Source code for these recording rules is available on GitHub.

  • cluster:node_cpu:ratio_rate5m
  • namespace_cpu:kube_pod_container_resource_requests:sum
  • namespace_cpu:kube_pod_container_resource_limits:sum
  • :node_memory_MemAvailable_bytes:sum
  • namespace_memory:kube_pod_container_resource_requests:sum
  • namespace_memory:kube_pod_container_resource_limits:sum
  • namespace_workload_pod:kube_pod_owner:relabel
  • node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
  • node_namespace_pod_container:container_memory_working_set_bytes
  • node_namespace_pod_container:container_memory_rss
  • node_namespace_pod_container:container_memory_cache
  • node_namespace_pod_container:container_memory_swap
  • instance:node_cpu_utilisation:rate5m
  • instance:node_load1_per_cpu:ratio
  • instance:node_memory_utilisation:ratio
  • instance:node_vmstat_pgmajfault:rate5m
  • instance:node_network_receive_bytes_excluding_lo:rate5m
  • instance:node_network_transmit_bytes_excluding_lo:rate5m
  • instance:node_network_receive_drop_excluding_lo:rate5m
  • instance:node_network_transmit_drop_excluding_lo:rate5m
  • instance_device:node_disk_io_time_seconds:rate5m
  • instance_device:node_disk_io_time_weighted_seconds:rate5m
  • instance:node_num_cpu:sum
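Many of the rule names above follow the Prometheus naming convention level:metric:operations, where the level names the aggregation labels, the metric is the source metric, and the operations describe what was applied. The sketch below splits a rule name along those lines. The function name is hypothetical, and names that do not have exactly three colon-separated parts are returned unparsed rather than guessed at.

```python
def split_rule_name(name):
    """Split a recording rule name using the level:metric:operations convention.

    Names with exactly three colon-separated parts are mapped to the three
    fields; anything else is returned with its raw parts for manual review.
    """
    parts = name.split(":")
    if len(parts) == 3:
        level, metric, operations = parts
        return {"level": level, "metric": metric, "operations": operations}
    return {"raw": name, "parts": parts}


print(split_rule_name("instance:node_vmstat_pgmajfault:rate5m"))
print(split_rule_name(":node_memory_MemAvailable_bytes:sum"))
print(split_rule_name("node_namespace_pod_container:container_memory_rss"))
```

A name such as :node_memory_MemAvailable_bytes:sum yields an empty level. Under the convention, that reads as an aggregate that keeps no grouping labels, though the rule's source code remains the authority on what each rule computes.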

Next steps