當您從 Kubernetes 叢集在 Azure 監視器中啟用 Prometheus 計量收集時,它會針對目標、儀錶板和記錄規則使用預設設定。 本文說明預設設定,以及您可以選擇針對特定需求自訂設定的案例。
最小擷取設定檔
最小擷取設定檔是在 Azure 監視器中啟用叢集的 Prometheus 計量時預設啟用的設定。 此設定可將指標限制為僅由預設儀表板、預設記錄規則和預設警示使用的指標,以減少擷取的指標量。 本文列出了這些目標和指標。 如果停用此設定,則會收集預設目標的所有可用量度,這可能會大幅增加擷取量。
您可如使用 ConfigMap 在 Azure 監視器中自訂f Prometheus 計量的抓取所述修改計量設定 ConfigMap,以變更最小擷取設定檔設定。
自訂案例
您可以選擇使用預設組態,或根據您的特定需求自訂集合。 下表列出四種潛在的收集案例,以及達成每個案例的建議方法。
| Scenario | 方法 |
|---|---|
| 僅擷取每個預設目標的最小指標。 | 無需更改。 使用預設行為而不進行修改。 只會針對每個預設目標擷取本文中列出的量度。 |
| 除了最小量度之外,還加入幾個其他的量度,用於一個或多個預設目標。 | 讓最小擷取保持啟用狀態,並指定目標特有的適當保留清單。 請參閱 自訂由預設目標收集的指標。 |
| 僅匯入預設目標的特定量度集。 | 停用最低限度的數據擷取,並在自訂抓取任務中指定與目標相關的適當保留清單。 請參閱 從 Kubernetes 叢集使用 ConfigMap 建立自訂的 Prometheus 抓取工作。 |
| 匯入針對預設目標抓取的所有指標。 | 停用最小擷取,且不要為該目標指定任何保留清單。 請參閱 自定義預設目標收集的指標 |
預設抓取的目標
以下是 Azure 監視器計量附加元件預設可以抓取的目標,以及啟用它們的條件。 請參閱 啟用和停用預設目標, 以啟用/停用預設目標。
依預設會啟用下列目標。
cadvisornodeexporterkubeletkube-state-metricsnetworkobservabilityRetina
啟用 控制平面計量 (預覽) 時,會啟用下列目標。
controlplane-apiservercontrolplane-etcd
啟用 容器網路可觀測性 時,會啟用下列目標。
networkobservabilityHubblenetworkobservabilityCilium
啟用 Azure 容器儲存體 時,會啟用下列目標。
acstor-capacity-provisioneracstor-metrics-exporter
預設情況下,下列功能會被停用。
corednskubeproxyapiserver
下列目標預設為停用,且需要啟用 Windows 計量收集 (預覽版)。
windows-exporterkube-proxy-windows
從預設目標收集的計量
預設會從每個預設目標收集下列計量。 所有其他計量都會透過重新標記規則來卸除。 必須啟用目標,才能收集指標。
kubelet
kubelet_volume_stats_used_byteskubelet_node_namekubelet_running_podskubelet_running_pod_countkubelet_running_containerskubelet_running_container_countvolume_manager_total_volumeskubelet_node_config_errorkubelet_runtime_operations_totalkubelet_runtime_operations_errors_total-
kubelet_runtime_operations_duration_secondskubelet_runtime_operations_duration_seconds_bucketkubelet_runtime_operations_duration_seconds_sumkubelet_runtime_operations_duration_seconds_count -
kubelet_pod_start_duration_secondskubelet_pod_start_duration_seconds_bucketkubelet_pod_start_duration_seconds_sumkubelet_pod_start_duration_seconds_count -
kubelet_pod_worker_duration_secondskubelet_pod_worker_duration_seconds_bucketkubelet_pod_worker_duration_seconds_sumkubelet_pod_worker_duration_seconds_count -
storage_operation_duration_secondsstorage_operation_duration_seconds_bucketstorage_operation_duration_seconds_sumstorage_operation_duration_seconds_count storage_operation_errors_total-
kubelet_cgroup_manager_duration_secondskubelet_cgroup_manager_duration_seconds_bucketkubelet_cgroup_manager_duration_seconds_sumkubelet_cgroup_manager_duration_seconds_count -
kubelet_pleg_relist_duration_secondskubelet_pleg_relist_duration_seconds_bucketkubelet_pleg_relist_duration_sumkubelet_pleg_relist_duration_seconds_count -
kubelet_pleg_relist_interval_secondskubelet_pleg_relist_interval_seconds_bucketkubelet_pleg_relist_interval_seconds_sumkubelet_pleg_relist_interval_seconds_count rest_client_requests_total-
rest_client_request_duration_secondsrest_client_request_duration_seconds_bucketrest_client_request_duration_seconds_sumrest_client_request_duration_seconds_count process_resident_memory_bytesprocess_cpu_seconds_totalgo_goroutineskubelet_volume_stats_capacity_byteskubelet_volume_stats_available_byteskubelet_volume_stats_inodes_usedkubelet_volume_stats_inodeskubernetes_build_info"
coredns
coredns_build_infocoredns_panics_totalcoredns_dns_responses_totalcoredns_forward_responses_total-
coredns_dns_request_duration_secondscoredns_dns_request_duration_seconds_bucketcoredns_dns_request_duration_seconds_sumcoredns_dns_request_duration_seconds_count -
coredns_forward_request_duration_secondscoredns_forward_request_duration_seconds_bucketcoredns_forward_request_duration_seconds_sumcoredns_forward_request_duration_seconds_count coredns_dns_requests_totalcoredns_forward_requests_totalcoredns_cache_hits_totalcoredns_cache_misses_totalcoredns_cache_entriescoredns_plugin_enabled-
coredns_dns_request_size_bytescoredns_dns_request_size_bytes_bucketcoredns_dns_request_size_bytes_sumcoredns_dns_request_size_bytes_count -
coredns_dns_response_size_bytescoredns_dns_response_size_bytes_bucketcoredns_dns_response_size_bytes_sumcoredns_dns_response_size_bytes_count -
coredns_dns_response_size_bytescoredns_dns_response_size_bytes_bucketcoredns_dns_response_size_bytes_sumcoredns_dns_response_size_bytes_count process_resident_memory_bytesprocess_cpu_seconds_totalgo_goroutineskubernetes_build_info"
cadvisor
container_spec_cpu_periodcontainer_spec_cpu_quotacontainer_cpu_usage_seconds_totalcontainer_memory_rsscontainer_network_receive_bytes_totalcontainer_network_transmit_bytes_totalcontainer_network_receive_packets_totalcontainer_network_transmit_packets_totalcontainer_network_receive_packets_dropped_totalcontainer_network_transmit_packets_dropped_totalcontainer_fs_reads_totalcontainer_fs_writes_totalcontainer_fs_reads_bytes_totalcontainer_fs_writes_bytes_totalcontainer_memory_working_set_bytescontainer_memory_cachecontainer_memory_swapcontainer_cpu_cfs_throttled_periods_totalcontainer_cpu_cfs_periods_totalkubernetes_build_info"
kubeproxy
-
kubeproxy_sync_proxy_rules_duration_secondskubeproxy_sync_proxy_rules_duration_seconds_bucketkubeproxy_sync_proxy_rules_duration_seconds_sumkubeproxy_sync_proxy_rules_duration_seconds_countkubeproxy_network_programming_duration_seconds -
kubeproxy_network_programming_duration_secondskubeproxy_network_programming_duration_seconds_bucketkubeproxy_network_programming_duration_seconds_sumkubeproxy_network_programming_duration_seconds_countrest_client_requests_total -
rest_client_request_duration_secondsrest_client_request_duration_seconds_bucketrest_client_request_duration_seconds_sumrest_client_request_duration_seconds_count process_resident_memory_bytesprocess_cpu_seconds_totalgo_goroutineskubernetes_build_info"
apiserver
-
apiserver_request_duration_secondsapiserver_request_duration_seconds_bucketapiserver_request_duration_seconds_sumapiserver_request_duration_seconds_count apiserver_request_totalworkqueue_adds_total``workqueue_depth-
workqueue_queue_duration_secondsworkqueue_queue_duration_seconds_bucketworkqueue_queue_duration_seconds_sumworkqueue_queue_duration_seconds_count process_resident_memory_bytesprocess_cpu_seconds_totalgo_goroutineskubernetes_build_info"
kube-state
kube_job_status_succeededkube_job_spec_completionskube_daemonset_status_desired_number_scheduledkube_daemonset_status_number_readykube_deployment_status_replicas_readykube_pod_container_status_last_terminated_reasonkube_pod_container_status_waiting_reasonkube_pod_container_status_restarts_totalkube_node_status_allocatablekube_pod_ownerkube_pod_container_resource_requestskube_pod_status_phasekube_pod_container_resource_limitskube_replicaset_ownerkube_resourcequotakube_namespace_status_phasekube_node_status_capacitykube_node_infokube_pod_infokube_deployment_spec_replicaskube_deployment_status_replicas_availablekube_deployment_status_replicas_updatedkube_statefulset_status_replicas_readykube_statefulset_status_replicaskube_statefulset_status_replicas_updatedkube_job_status_start_timekube_job_status_activekube_job_failedkube_horizontalpodautoscaler_status_desired_replicaskube_horizontalpodautoscaler_status_current_replicaskube_horizontalpodautoscaler_spec_min_replicaskube_horizontalpodautoscaler_spec_max_replicaskubernetes_build_infokube_node_status_conditionkube_node_spec_taintkube_pod_container_info-
kube_resource_labels(例如 kube_pod_labels、kube_deployment_labels) -
kube_resource_annotations(例 - kube_pod_annotations、kube_deployment_annotations)
nodeexporter
node_cpu_seconds_totalnode_memory_MemAvailable_bytesnode_memory_Buffers_bytesnode_memory_Cached_bytesnode_memory_MemFree_bytesnode_memory_Slab_bytesnode_memory_MemTotal_bytesnode_netstat_Tcp_RetransSegsnode_netstat_Tcp_OutSegsnode_netstat_TcpExt_TCPSynRetransnode_load1``node_load5node_load15node_disk_read_bytes_totalnode_disk_written_bytes_totalnode_disk_io_time_seconds_totalnode_filesystem_size_bytesnode_filesystem_avail_bytesnode_filesystem_readonlynode_network_receive_bytes_totalnode_network_transmit_bytes_totalnode_vmstat_pgmajfaultnode_network_receive_drop_totalnode_network_transmit_drop_totalnode_disk_io_time_weighted_seconds_totalnode_exporter_build_infonode_time_secondsnode_uname_info"
windowsexporter
windows_system_system_up_timewindows_cpu_time_totalwindows_memory_available_byteswindows_os_visible_memory_byteswindows_memory_cache_byteswindows_memory_modified_page_list_byteswindows_memory_standby_cache_core_byteswindows_memory_standby_cache_normal_priority_byteswindows_memory_standby_cache_reserve_byteswindows_memory_swap_page_operations_totalwindows_logical_disk_read_seconds_totalwindows_logical_disk_write_seconds_totalwindows_logical_disk_size_byteswindows_logical_disk_free_byteswindows_net_bytes_totalwindows_net_packets_received_discarded_totalwindows_net_packets_outbound_discarded_totalwindows_container_availablewindows_container_cpu_usage_seconds_totalwindows_container_memory_usage_commit_byteswindows_container_memory_usage_private_working_set_byteswindows_container_network_receive_bytes_totalwindows_container_network_transmit_bytes_total
windowskubeproxy
kubeproxy_sync_proxy_rules_duration_secondskubeproxy_sync_proxy_rules_duration_seconds_bucketkubeproxy_sync_proxy_rules_duration_seconds_sumkubeproxy_sync_proxy_rules_duration_seconds_countrest_client_requests_totalrest_client_request_duration_secondsrest_client_request_duration_seconds_bucketrest_client_request_duration_seconds_sumrest_client_request_duration_seconds_countprocess_resident_memory_bytesprocess_cpu_seconds_totalgo_goroutines
網路可觀測性哈勃
- 請參閱容器 網路可觀測性指標。
網路觀察性Cilium
- 請參閱容器 網路可觀測性指標。
控制平面 API伺服器
apiserver_request_totalapiserver_cache_list_fetched_objects_totalapiserver_cache_list_returned_objects_totalapiserver_flowcontrol_demand_seats_averageapiserver_flowcontrol_current_limit_seatsapiserver_request_sli_duration_seconds_bucket{le=+inf}apiserver_request_sli_duration_seconds_countapiserver_request_sli_duration_seconds_sumprocess_start_time_secondsapiserver_request_duration_seconds_bucket{le=+inf}apiserver_request_duration_seconds_countapiserver_request_duration_seconds_sumapiserver_storage_list_fetched_objects_totalapiserver_storage_list_returned_objects_totalapiserver_current_inflight_requests
備註
apiserver_request_duration_seconds 和 apiserver_request_sli_duration_seconds 是具有高基數的直方圖指標,預設情況下不會收集所有系列。 只有總和和計數用於收集平均延遲。
controlplane-cluster-autoscaler
rest_client_requests_totalcluster_autoscaler_last_activitycluster_autoscaler_cluster_safe_to_autoscalecluster_autoscaler_scale_down_in_cooldowncluster_autoscaler_scaled_up_nodes_totalcluster_autoscaler_unneeded_nodes_countcluster_autoscaler_unschedulable_pods_countcluster_autoscaler_nodes_countcloudprovider_azure_api_request_errorscloudprovider_azure_api_request_duration_seconds_bucketcloudprovider_azure_api_request_duration_seconds_count
controlplane-node-auto-provisioning
karpenter_pods_statekarpenter_nodes_created_totalkarpenter_nodes_terminated_totalkarpenter_nodeclaims_disrupted_totalkarpenter_voluntary_disruption_eligible_nodeskarpenter_voluntary_disruption_decisions_total
controlplane-kube-scheduler
scheduler_pending_podsscheduler_unschedulable_podsscheduler_pod_scheduling_attemptsscheduler_queue_incoming_pods_totalscheduler_preemption_attempts_totalscheduler_preemption_victimsscheduler_scheduling_attempt_duration_secondsscheduler_schedule_attempts_totalscheduler_pod_scheduling_duration_seconds
controlplane-kube-controller-manager
rest_client_request_duration_secondsrest_client_requests_totalworkqueue_depth
controlplane-etcd
etcd_server_has_leaderrest_client_requests_totaletcd_mvcc_db_total_size_in_bytesetcd_mvcc_db_total_size_in_use_in_bytesetcd_server_slow_read_indexes_totaletcd_server_slow_apply_totaletcd_network_client_grpc_sent_bytes_totaletcd_server_heartbeat_send_failures_total
acstor-capacity-provisioner (job=acstor-capacity-provisioner)
- 請參閱 Azure 容器儲存體計量。
acstor-metrics-exporter (job=acstor-metrics-exporter)
- 請參閱 Azure 容器儲存體計量。
儀表板
下列是當您將 Azure 監視器工作區連結至 Azure 受控 Grafana 執行個體時,Azure 監視器適用於 Prometheus 的受管理服務會自動佈建及設定的預設儀表板。 她們被佈建在指定的 Azure Grafana 執行個體中的 Managed Prometheus 資料夾下。 這些是使用 Prometheus 和 Grafana 監視 Kubernetes 叢集的標準開放原始碼社群儀錶板。
Kubernetes / Compute Resources / ClusterKubernetes / Compute Resources / Namespace (Pods)Kubernetes / Compute Resources / Node (Pods)Kubernetes / Compute Resources / PodKubernetes / Compute Resources / Namespace (Workloads)Kubernetes / Compute Resources / WorkloadKubernetes / KubeletNode Exporter / USE Method / NodeNode Exporter / NodesKubernetes / Compute Resources / Cluster (Windows)Kubernetes / Compute Resources / Namespace (Windows)Kubernetes / Compute Resources / Pod (Windows)Kubernetes / USE Method / Cluster (Windows)Kubernetes / USE Method / Node (Windows)
錄製規則
當您將 Prometheus 的計量設定為從 Azure Kubernetes Service (AKS) 叢集擷取時,Azure 監視器受管理服務會自動配置預設錄製規則。 您可以在 此 GitHub 存放庫中找到這些錄製規則的原始碼。 這些是上述儀錶板中使用的標準開放原始碼錄製規則。
cluster:node_cpu:ratio_rate5mnamespace_cpu:kube_pod_container_resource_requests:sumnamespace_cpu:kube_pod_container_resource_limits:sum:node_memory_MemAvailable_bytes:sumnamespace_memory:kube_pod_container_resource_requests:sumnamespace_memory:kube_pod_container_resource_limits:sumnamespace_workload_pod:kube_pod_owner:relabelnode_namespace_pod_container:container_cpu_usage_seconds_total:sum_iratecluster:namespace:pod_cpu:active:kube_pod_container_resource_requestscluster:namespace:pod_cpu:active:kube_pod_container_resource_limitscluster:namespace:pod_memory:active:kube_pod_container_resource_requestscluster:namespace:pod_memory:active:kube_pod_container_resource_limitsnode_namespace_pod_container:container_memory_working_set_bytesnode_namespace_pod_container:container_memory_rssnode_namespace_pod_container:container_memory_cachenode_namespace_pod_container:container_memory_swapinstance:node_cpu_utilisation:rate5minstance:node_load1_per_cpu:ratioinstance:node_memory_utilisation:ratioinstance:node_vmstat_pgmajfault:rate5minstance:node_network_receive_bytes_excluding_lo:rate5minstance:node_network_transmit_bytes_excluding_lo:rate5minstance:node_network_receive_drop_excluding_lo:rate5minstance:node_network_transmit_drop_excluding_lo:rate5minstance_device:node_disk_io_time_seconds:rate5minstance_device:node_disk_io_time_weighted_seconds:rate5minstance:node_num_cpu:sumnode:windows_node:sumnode:windows_node_num_cpu:sum:windows_node_cpu_utilisation:avg5mnode:windows_node_cpu_utilisation:avg5m:windows_node_memory_utilisation::windows_node_memory_MemFreeCached_bytes:sumnode:windows_node_memory_totalCached_bytes:sum:windows_node_memory_MemTotal_bytes:sumnode:windows_node_memory_bytes_available:sumnode:windows_node_memory_bytes_total:sumnode:windows_node_memory_utilisation:rationode:windows_node_memory_utilisation:node:windows_node_memory_swap_io_pages:irate:windows_node_disk_utilisation:avg_iratenode:windows_node_disk_utilisation:avg_iratenode:windows_node_filesystem_usage:node:windows_node_filesystem_avail::windows_node_net_utilisation:sum_iratenode:windows_node_net_utilisation:sum_irate:windows_node_net_saturation:sum_iratenode:windows_node_net_saturation:sum_iratewindows_pod_container_availablewindows_container_total_runtimewindows_container_memory_usagewindows_container_private_working_set_usagewindows_container_network_received_bytes_totalwindows_container_network_transmitted_bytes_totalkube_pod_windows_container_resource_memory_requestkube_pod_windows_container_resource_memory_limitkube_pod_windows_container_resource_cpu_cores_requestkube_pod_windows_container_resource_cpu_cores_limitnamespace_pod_container:windows_container_cpu_usage_seconds_total:sum_rate
Prometheus 視覺效果錄製規則
下列錄製規則會自動部署,以支援 Prometheus 視覺化。
ux:cluster_pod_phase_count:sumux:node_cpu_usage:sum_irateux:node_memory_usage:sumux:controller_pod_phase_count:sumux:controller_container_count:sumux:controller_workingset_memory:sumux:controller_cpu_usage:sum_irateux:controller_rss_memory:sumux:controller_resource_limit:sumux:controller_container_restarts:maxux:pod_container_count:sumux:pod_cpu_usage:sum_irateux:pod_workingset_memory:sumux:pod_rss_memory:sumux:pod_resource_limit:sumux:pod_container_restarts:maxux:node_network_receive_drop_total:sum_irateux:node_network_transmit_drop_total:sum_irate
Windows 支援需要下列錄製規則。 它們會自動部署,但預設不會啟用。 請參閱 啟用和停用規則群組 以啟用它們。
ux:node_cpu_usage_windows:sum_irateux:node_memory_usage_windows:sumux:controller_cpu_usage_windows:sum_irateux:controller_workingset_memory_windows:sumux:pod_cpu_usage_windows:sum_irateux:pod_workingset_memory_windows:sum