Share via


Azure 監視器中的預設 Prometheus 計量設定

本文會列出當您針對任何 AKS 叢集將 Prometheus 計量設定為從 Azure Kubernetes Service (AKS) 叢集抓取時的預設目標、儀表板與錄製規則。

最小內嵌設定檔

Minimal ingestion profile 是有助於降低計量擷取量的設定,因為只會收集預設儀表板、預設錄製規則與預設警示所使用的計量。 針對以附加元件為基礎的集合,預設會啟用 Minimal ingestion profile 設定。 您可以修改集合以便收集更多計量,如下所示。

擷取頻率

所有預設目標與抓取的預設抓取頻率為 30 秒。

預設抓取的目標

默認會啟用/開啟下列目標-這表示您不需要提供任何擷取這些目標的報廢作業設定,因為計量附加元件預設會自動刮取這些目標。

  • cadvisorjob=cadvisor
  • nodeexporterjob=node
  • kubeletjob=kubelet
  • kube-state-metricsjob=kube-state-metrics
  • controlplane-apiserverjob=controlplane-apiserver
  • controlplane-etcdjob=controlplane-etcd

從預設目標收集的計量

根據預設,會從每個預設目標收集下列計量。 所有其他計量都會透過重新標記規則來卸除。

cadvisor (job=cadvisor)

  • container_spec_cpu_period
  • container_spec_cpu_quota
  • container_cpu_usage_seconds_total
  • container_memory_rss
  • container_network_receive_bytes_total
  • container_network_transmit_bytes_total
  • container_network_receive_packets_total
  • container_network_transmit_packets_total
  • container_network_receive_packets_dropped_total
  • container_network_transmit_packets_dropped_total
  • container_fs_reads_total
  • container_fs_writes_total
  • container_fs_reads_bytes_total
  • container_fs_writes_bytes_total
  • container_memory_working_set_bytes
  • container_memory_cache
  • container_memory_swap
  • container_cpu_cfs_throttled_periods_total
  • container_cpu_cfs_periods_total
  • container_memory_usage_bytes
  • kubernetes_build_info"

kubelet (job=kubelet)

  • kubelet_volume_stats_used_bytes
  • kubelet_node_name
  • kubelet_running_pods
  • kubelet_running_pod_count
  • kubelet_running_containers
  • kubelet_running_container_count
  • volume_manager_total_volumes
  • kubelet_node_config_error
  • kubelet_runtime_operations_total
  • kubelet_runtime_operations_errors_total
  • kubelet_runtime_operations_duration_seconds kubelet_runtime_operations_duration_seconds_bucket kubelet_runtime_operations_duration_seconds_sum kubelet_runtime_operations_duration_seconds_count
  • kubelet_pod_start_duration_seconds kubelet_pod_start_duration_seconds_bucket kubelet_pod_start_duration_seconds_sum kubelet_pod_start_duration_seconds_count
  • kubelet_pod_worker_duration_seconds kubelet_pod_worker_duration_seconds_bucket kubelet_pod_worker_duration_seconds_sum kubelet_pod_worker_duration_seconds_count
  • storage_operation_duration_seconds storage_operation_duration_seconds_bucket storage_operation_duration_seconds_sum storage_operation_duration_seconds_count
  • storage_operation_errors_total
  • kubelet_cgroup_manager_duration_seconds kubelet_cgroup_manager_duration_seconds_bucket kubelet_cgroup_manager_duration_seconds_sum kubelet_cgroup_manager_duration_seconds_count
  • kubelet_pleg_relist_duration_seconds kubelet_pleg_relist_duration_seconds_bucket kubelet_pleg_relist_duration_sum kubelet_pleg_relist_duration_seconds_count
  • kubelet_pleg_relist_interval_seconds kubelet_pleg_relist_interval_seconds_bucket kubelet_pleg_relist_interval_seconds_sum kubelet_pleg_relist_interval_seconds_count
  • rest_client_requests_total
  • rest_client_request_duration_seconds rest_client_request_duration_seconds_bucket rest_client_request_duration_seconds_sum rest_client_request_duration_seconds_count
  • process_resident_memory_bytes
  • process_cpu_seconds_total
  • go_goroutines
  • kubelet_volume_stats_capacity_bytes
  • kubelet_volume_stats_available_bytes
  • kubelet_volume_stats_inodes_used
  • kubelet_volume_stats_inodes
  • kubernetes_build_info"

nodexporter (job=node)

  • node_cpu_seconds_total
  • node_memory_MemAvailable_bytes
  • node_memory_Buffers_bytes
  • node_memory_Cached_bytes
  • node_memory_MemFree_bytes
  • node_memory_Slab_bytes
  • node_memory_MemTotal_bytes
  • node_netstat_Tcp_RetransSegs
  • node_netstat_Tcp_OutSegs
  • node_netstat_TcpExt_TCPSynRetrans
  • node_load1``node_load5
  • node_load15
  • node_disk_read_bytes_total
  • node_disk_written_bytes_total
  • node_disk_io_time_seconds_total
  • node_filesystem_size_bytes
  • node_filesystem_avail_bytes
  • node_filesystem_readonly
  • node_network_receive_bytes_total
  • node_network_transmit_bytes_total
  • node_vmstat_pgmajfault
  • node_network_receive_drop_total
  • node_network_transmit_drop_total
  • node_disk_io_time_weighted_seconds_total
  • node_exporter_build_info
  • node_time_seconds
  • node_uname_info"

kube-state-metrics (job=kube-state-metrics)

  • kube_job_status_succeeded
  • kube_job_spec_completions
  • kube_daemonset_status_desired_number_scheduled
  • kube_daemonset_status_number_ready
  • kube_deployment_status_replicas_ready
  • kube_pod_container_status_last_terminated_reason
  • kube_pod_container_status_waiting_reason
  • kube_pod_container_status_restarts_total
  • kube_node_status_allocatable
  • kube_pod_owner
  • kube_pod_container_resource_requests
  • kube_pod_status_phase
  • kube_pod_container_resource_limits
  • kube_replicaset_owner
  • kube_resourcequota
  • kube_namespace_status_phase
  • kube_node_status_capacity
  • kube_node_info
  • kube_pod_info
  • kube_deployment_spec_replicas
  • kube_deployment_status_replicas_available
  • kube_deployment_status_replicas_updated
  • kube_statefulset_status_replicas_ready
  • kube_statefulset_status_replicas
  • kube_statefulset_status_replicas_updated
  • kube_job_status_start_time
  • kube_job_status_active
  • kube_job_failed
  • kube_horizontalpodautoscaler_status_desired_replicas
  • kube_horizontalpodautoscaler_status_current_replicas
  • kube_horizontalpodautoscaler_spec_min_replicas
  • kube_horizontalpodautoscaler_spec_max_replicas
  • kubernetes_build_info
  • kube_node_status_condition
  • kube_node_spec_taint
  • kube_pod_container_info
  • kube_resource_labels (ex - kube_pod_labels, kube_deployment_labels)
  • kube_resource_annotations (ex - kube_pod_annotations, kube_deployment_annotations)

controlplane-apiserver (job=controlplane-apiserver)

  • apiserver_request_total
  • apiserver_cache_list_fetched_objects_total
  • apiserver_cache_list_returned_objects_total
  • apiserver_flowcontrol_demand_seats_average
  • apiserver_flowcontrol_current_limit_seats
  • apiserver_request_sli_duration_seconds_bucket
  • apiserver_request_sli_duration_seconds_count
  • apiserver_request_sli_duration_seconds_sum
  • process_start_time_seconds
  • apiserver_request_duration_seconds_bucket
  • apiserver_request_duration_seconds_count
  • apiserver_request_duration_seconds_sum
  • apiserver_storage_list_fetched_objects_total
  • apiserver_storage_list_returned_objects_total
  • apiserver_current_inflight_requests

controlplane-etcd (job=controlplane-etcd)

  • etcd_server_has_leader
  • rest_client_requests_total
  • etcd_mvcc_db_total_size_in_bytes
  • etcd_mvcc_db_total_size_in_use_in_bytes
  • etcd_server_slow_read_indexes_total
  • etcd_server_slow_apply_total
  • etcd_network_client_grpc_sent_bytes_total
  • etcd_server_heartbeat_send_failures_total

針對 Windows 抓取的預設目標

下列 Windows 目標已設定為報廢,但預設不會啟用 (disabled/OFF) - 這表示您不需要提供任何報廢作業組態來報廢這些目標,但預設會停用/關閉這些目標,而且您必須使用區段下的 default-scrape-settings-enabled ama-metrics-settings-configmap 來開啟/啟用這些目標的報廢。

您可以針對 Windows 執行兩個預設作業,以抓取 Windows 特定儀表板所需的計量。

  • windows-exporterjob=windows-exporter
  • kube-proxy-windowsjob=kube-proxy-windows

注意

這需要套用或更新 ama-metrics-settings-configmap configmap,並在所有 Windows 節點上安裝 windows-exporter。 如需詳細資訊,請參閱啟用文件

針對 Windows 抓取的計量

啟用 windows-exporter 和 kube-proxy-windows 時,會收集下列計量。

windows-exporter (job=windows-exporter)

  • windows_system_system_up_time
  • windows_cpu_time_total
  • windows_memory_available_bytes
  • windows_os_visible_memory_bytes
  • windows_memory_cache_bytes
  • windows_memory_modified_page_list_bytes
  • windows_memory_standby_cache_core_bytes
  • windows_memory_standby_cache_normal_priority_bytes
  • windows_memory_standby_cache_reserve_bytes
  • windows_memory_swap_page_operations_total
  • windows_logical_disk_read_seconds_total
  • windows_logical_disk_write_seconds_total
  • windows_logical_disk_size_bytes
  • windows_logical_disk_free_bytes
  • windows_net_bytes_total
  • windows_net_packets_received_discarded_total
  • windows_net_packets_outbound_discarded_total
  • windows_container_available
  • windows_container_cpu_usage_seconds_total
  • windows_container_memory_usage_commit_bytes
  • windows_container_memory_usage_private_working_set_bytes
  • windows_container_network_receive_bytes_total
  • windows_container_network_transmit_bytes_total

kube-proxy-windows (job=kube-proxy-windows)

  • kubeproxy_sync_proxy_rules_duration_seconds
  • kubeproxy_sync_proxy_rules_duration_seconds_bucket
  • kubeproxy_sync_proxy_rules_duration_seconds_sum
  • kubeproxy_sync_proxy_rules_duration_seconds_count
  • rest_client_requests_total
  • rest_client_request_duration_seconds
  • rest_client_request_duration_seconds_bucket
  • rest_client_request_duration_seconds_sum
  • rest_client_request_duration_seconds_count
  • process_resident_memory_bytes
  • process_cpu_seconds_total
  • go_goroutines

儀表板​​

下列是當您將 Azure 監視器工作區連結至 Azure 受控 Grafana 執行個體時,適用於 Prometheus 的 Azure 監視器受控服務會自動佈建及設定的預設儀表板。 您可以在 這個 GitHub 存放庫 中找到這些儀表板的原始程式碼。 下列儀表板將會佈建在 Grafana 中 Managed Prometheus 資料夾下的指定 Azure Grafana 執行個體中。 這些是使用 Prometheus 和 Grafana 監視 Kubernetes 叢集的標準開放原始碼社群儀表板。

  • Kubernetes / Compute Resources / Cluster
  • Kubernetes / Compute Resources / Namespace (Pods)
  • Kubernetes / Compute Resources / Node (Pods)
  • Kubernetes / Compute Resources / Pod
  • Kubernetes / Compute Resources / Namespace (Workloads)
  • Kubernetes / Compute Resources / Workload
  • Kubernetes / Kubelet
  • Node Exporter / USE Method / Node
  • Node Exporter / Nodes
  • Kubernetes / Compute Resources / Cluster (Windows)
  • Kubernetes / Compute Resources / Namespace (Windows)
  • Kubernetes / Compute Resources / Pod (Windows)
  • Kubernetes / USE Method / Cluster (Windows)
  • Kubernetes / USE Method / Node (Windows)

錄製規則

當您 將 Prometheus 計量設定為從 Azure Kubernetes Service (AKS) 叢集擷取 Prometheus 計量時,Azure 監視器受控服務會自動設定下列預設錄製規則。 您可以在這個 GitHub 存放庫中找到這些錄製規則的原始程式碼。 這些是上述儀表板中使用的標準開放原始碼錄製規則。

  • cluster:node_cpu:ratio_rate5m
  • namespace_cpu:kube_pod_container_resource_requests:sum
  • namespace_cpu:kube_pod_container_resource_limits:sum
  • :node_memory_MemAvailable_bytes:sum
  • namespace_memory:kube_pod_container_resource_requests:sum
  • namespace_memory:kube_pod_container_resource_limits:sum
  • namespace_workload_pod:kube_pod_owner:relabel
  • node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
  • node_namespace_pod_container:container_memory_working_set_bytes
  • node_namespace_pod_container:container_memory_rss
  • node_namespace_pod_container:container_memory_cache
  • node_namespace_pod_container:container_memory_swap
  • instance:node_cpu_utilisation:rate5m
  • instance:node_load1_per_cpu:ratio
  • instance:node_memory_utilisation:ratio
  • instance:node_vmstat_pgmajfault:rate5m
  • instance:node_network_receive_bytes_excluding_lo:rate5m
  • instance:node_network_transmit_bytes_excluding_lo:rate5m
  • instance:node_network_receive_drop_excluding_lo:rate5m
  • instance:node_network_transmit_drop_excluding_lo:rate5m
  • instance_device:node_disk_io_time_seconds:rate5m
  • instance_device:node_disk_io_time_weighted_seconds:rate5m
  • instance:node_num_cpu:sum
  • node:windows_node:sum
  • node:windows_node_num_cpu:sum
  • :windows_node_cpu_utilisation:avg5m
  • node:windows_node_cpu_utilisation:avg5m
  • :windows_node_memory_utilisation:
  • :windows_node_memory_MemFreeCached_bytes:sum
  • node:windows_node_memory_totalCached_bytes:sum
  • :windows_node_memory_MemTotal_bytes:sum
  • node:windows_node_memory_bytes_available:sum
  • node:windows_node_memory_bytes_total:sum
  • node:windows_node_memory_utilisation:ratio
  • node:windows_node_memory_utilisation:
  • node:windows_node_memory_swap_io_pages:irate
  • :windows_node_disk_utilisation:avg_irate
  • node:windows_node_disk_utilisation:avg_irate
  • node:windows_node_filesystem_usage:
  • node:windows_node_filesystem_avail:
  • :windows_node_net_utilisation:sum_irate
  • node:windows_node_net_utilisation:sum_irate
  • :windows_node_net_saturation:sum_irate
  • node:windows_node_net_saturation:sum_irate
  • windows_pod_container_available
  • windows_container_total_runtime
  • windows_container_memory_usage
  • windows_container_private_working_set_usage
  • windows_container_network_received_bytes_total
  • windows_container_network_transmitted_bytes_total
  • kube_pod_windows_container_resource_memory_request
  • kube_pod_windows_container_resource_memory_limit
  • kube_pod_windows_container_resource_cpu_cores_request
  • kube_pod_windows_container_resource_cpu_cores_limit
  • namespace_pod_container:windows_container_cpu_usage_seconds_total:sum_rate

Prometheus 視覺效果錄製規則

當您使用 Prometheus 型 Container Insights 時,將會部署更多錄製規則以支援 Prometheus 視覺效果。

  • ux:cluster_pod_phase_count:sum
  • ux:node_cpu_usage:sum_irate
  • ux:node_memory_usage:sum
  • ux:controller_pod_phase_count:sum
  • ux:controller_container_count:sum
  • ux:controller_workingset_memory:sum
  • ux:controller_cpu_usage:sum_irate
  • ux:controller_rss_memory:sum
  • ux:controller_resource_limit:sum
  • ux:controller_container_restarts:max
  • ux:pod_container_count:sum
  • ux:pod_cpu_usage:sum_irate
  • ux:pod_workingset_memory:sum
  • ux:pod_rss_memory:sum
  • ux:pod_resource_limit:sum
  • ux:pod_container_restarts:max
  • ux:node_network_receive_drop_total:sum_irate
  • ux:node_network_transmit_drop_total:sum_irate

Windows 支援需要下列錄製規則。 這些規則會隨著上述規則一起部署,不過,預設不會啟用這些規則。 請遵循在 Azure 監視器工作區中啟用和停用規則群組的指示

  • ux:node_cpu_usage_windows:sum_irate
  • ux:node_memory_usage_windows:sum
  • ux:controller_cpu_usage_windows:sum_irate
  • ux:controller_workingset_memory_windows:sum
  • ux:pod_cpu_usage_windows:sum_irate
  • ux:pod_workingset_memory_windows:sum

下一步

自訂 Prometheus 計量抓取