Send Prometheus metrics to a Log Analytics workspace with Container insights

Article
10/15/2024

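This article describes how to send Prometheus metrics from a Kubernetes cluster monitored by Container insights to a Log Analytics workspace. Before you perform this configuration, first make sure that you're scraping Prometheus metrics from your cluster with Azure Monitor managed service for Prometheus, which is the recommended method for monitoring your clusters. Use the configuration described in this article only if you also want to send this same data to a Log Analytics workspace, where you can analyze it with log queries and log search alerts.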

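This configuration requires configuring the monitoring addon for the Azure Monitor agent, which is the same addon that Container insights uses to send data to a Log Analytics workspace. You expose the Prometheus metrics endpoint through your exporters or pods, and then configure the monitoring addon for the Azure Monitor agent used by Container insights, as shown in the following diagram.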

Prometheus scraping settings (for metrics stored as logs)

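Active scraping of Prometheus metrics is performed from one of the following two perspectives, and the metrics are sent to the configured Log Analytics workspace: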

Cluster-wide: Defined in the ConfigMap section [prometheus_data_collection_settings.cluster].
Node-wide: Defined in the ConfigMap section [prometheus_data_collection_settings.node].

Endpoint	Scope	Example
Pod annotation	Cluster-wide	`prometheus.io/scrape: "true"` `prometheus.io/path: "/mymetrics"` `prometheus.io/port: "8000"` `prometheus.io/scheme: "http"`
Kubernetes service	Cluster-wide	`http://my-service-dns.my-namespace:9100/metrics` `http://metrics-server.kube-system.svc.cluster.local/metrics`
URL/endpoint	Per-node and/or cluster-wide	`http://myurl:9101/metrics`

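When a URL is specified, Container insights scrapes only that endpoint. When a Kubernetes service is specified, the service name is resolved by the cluster DNS server to get the IP address, and then the resolved service is scraped.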

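Scope	Key	Data type	Value	Description
Cluster-wide				Specify any one of the following three methods to scrape endpoints for metrics.
	`urls`	String	Comma-separated array	HTTP endpoint (either an IP address or a valid URL path). For example: `urls=[$NODE_IP/metrics]`. ($NODE_IP is a specific Container insights parameter that can be used instead of a node IP address. It must be all uppercase.)
	`kubernetes_services`	String	Comma-separated array	An array of Kubernetes services from which to scrape metrics, such as kube-state-metrics. Fully qualified domain names must be used here. For example: `kubernetes_services = ["http://metrics-server.kube-system.svc.cluster.local/metrics", "http://my-service-dns.my-namespace.svc.cluster.local:9100/metrics"]`
	`monitor_kubernetes_pods`	Boolean	true or false	When set to `true` in the cluster-wide settings, the Container insights agent scrapes Kubernetes pods across the entire cluster for the following Prometheus annotations: `prometheus.io/scrape:` `prometheus.io/scheme:` `prometheus.io/path:` `prometheus.io/port:`
	`prometheus.io/scrape`	Boolean	true or false	Enables scraping of the pod. `monitor_kubernetes_pods` must also be set to `true`.
	`prometheus.io/scheme`	String	http	Defaults to scraping over HTTP.
	`prometheus.io/path`	String	Comma-separated array	The HTTP resource path from which to fetch metrics. If the metrics path isn't `/metrics`, define it with this annotation.
	`prometheus.io/port`	String	9102	The port to scrape from. If the port isn't set, it defaults to 9102.
	`monitor_kubernetes_pods_namespaces`	String	Comma-separated array	An allow list of namespaces from which to scrape metrics from Kubernetes pods. For example: `monitor_kubernetes_pods_namespaces = ["default1", "default2", "default3"]`
Node-wide	`urls`	String	Comma-separated array	HTTP endpoint (either an IP address or a valid URL path). For example: `urls=[$NODE_IP/metrics]`. ($NODE_IP is a specific Container insights parameter that can be used instead of a node IP address. It must be all uppercase.)
Node-wide or cluster-wide	`interval`	String	60s	The default collection interval is one minute (60 seconds). You can change the interval for [prometheus_data_collection_settings.node] and/or [prometheus_data_collection_settings.cluster], using time units such as s, m, and h.
Node-wide or cluster-wide	`fieldpass` `fielddrop`	String	Comma-separated array	You can collect or exclude specific metrics from the endpoint by setting the allow (`fieldpass`) and disallow (`fielddrop`) lists. You must set the allow list first.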
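The interaction between `fieldpass` and `fielddrop` can be illustrated with a short Python sketch. This is not the agent's implementation. It assumes exact metric-name matching, assumes that an empty allow list admits every metric, and applies the allow list before the drop list. The function name `filter_metrics` is hypothetical.

```python
def filter_metrics(names, fieldpass=(), fielddrop=()):
    """Return the metric names that survive the allow list, then the drop list.

    Assumption: exact-name matching; an empty fieldpass allows everything.
    """
    allowed = set(fieldpass)
    dropped = set(fielddrop)
    kept = []
    for name in names:
        # Allow list is evaluated first, when one is configured.
        if allowed and name not in allowed:
            continue
        # Drop list removes anything that passed the allow list.
        if name in dropped:
            continue
        kept.append(name)
    return kept


scraped = ["metric_to_pass1", "metric_to_pass2", "metric_to_drop", "other_metric"]
print(filter_metrics(scraped,
                     fieldpass=["metric_to_pass1", "metric_to_pass2"],
                     fielddrop=["metric_to_drop"]))
```

With these inputs, only the two allowed metrics remain. The real agent may support pattern matching that this sketch doesn't model.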

Configure ConfigMaps to specify Prometheus scrape configuration (for metrics stored as logs)

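Perform the following steps to configure the ConfigMap configuration file for your cluster. ConfigMaps is a global list, and only one ConfigMap can be applied to the agent. Another ConfigMap can't overrule the collection settings.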

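Download the template ConfigMap YAML file and save it as container-azm-ms-agentconfig.yaml. If you've already deployed a ConfigMap to your cluster and want to update it with a newer configuration, you can edit the ConfigMap file you used previously.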

Edit the ConfigMap YAML file with your customizations to scrape Prometheus metrics.

To collect from Kubernetes services cluster-wide, configure the ConfigMap file by using the following example:

prometheus-data-collection-settings: |-
    # Custom Prometheus metrics data collection settings
    [prometheus_data_collection_settings.cluster]
    interval = "1m"  ## Valid time units are s, m, h.
    fieldpass = ["metric_to_pass1", "metric_to_pass2"] ## specify metrics to pass through
    fielddrop = ["metric_to_drop"] ## specify metrics to drop from collecting
    kubernetes_services = ["http://my-service-dns.my-namespace:9102/metrics"]

To configure scraping of Prometheus metrics from a specific URL across the cluster, configure the ConfigMap file by using the following example:

prometheus-data-collection-settings: |-
    # Custom Prometheus metrics data collection settings
    [prometheus_data_collection_settings.cluster]
    interval = "1m"  ## Valid time units are s, m, h.
    fieldpass = ["metric_to_pass1", "metric_to_pass2"] ## specify metrics to pass through
    fielddrop = ["metric_to_drop"] ## specify metrics to drop from collecting
    urls = ["http://myurl:9101/metrics"] ## An array of urls to scrape metrics from

To configure scraping of Prometheus metrics from the agent's DaemonSet for every individual node in the cluster, use the following example in the ConfigMap:

prometheus-data-collection-settings: |-
    # Custom Prometheus metrics data collection settings
    [prometheus_data_collection_settings.node]
    interval = "1m"  ## Valid time units are s, m, h.
    urls = ["http://$NODE_IP:9103/metrics"]
    fieldpass = ["metric_to_pass1", "metric_to_pass2"]
    fielddrop = ["metric_to_drop"]

$NODE_IP is a specific Container insights parameter that can be used instead of a node IP address. It must be all uppercase.
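To make the placeholder behavior concrete, the following Python sketch shows how a per-node `urls` list could be expanded. It only illustrates the substitution and the case sensitivity described above; it is not how the agent is implemented, and the function name `expand_node_ip` and the sample IP address are hypothetical.

```python
def expand_node_ip(urls, node_ip):
    """Replace the literal, all-uppercase $NODE_IP token in each URL.

    str.replace is case-sensitive, so a lowercase "$node_ip" is left as-is,
    mirroring the requirement that the parameter be all uppercase.
    """
    return [url.replace("$NODE_IP", node_ip) for url in urls]


configured = ["http://$NODE_IP:9103/metrics"]
print(expand_node_ip(configured, "10.240.0.4"))
```

A URL written with a lowercase placeholder would pass through unchanged, so it would not resolve to the node's address.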

To configure scraping of Prometheus metrics by specifying pod annotations:

In the ConfigMap, specify the following configuration:

prometheus-data-collection-settings: |-
    # Custom Prometheus metrics data collection settings
    [prometheus_data_collection_settings.cluster]
    interval = "1m"  ## Valid time units are s, m, h
    monitor_kubernetes_pods = true

Specify the following configuration for pod annotations:

- prometheus.io/scrape: "true" # Enable scraping for this pod
- prometheus.io/scheme: "http" # If the metrics endpoint is secured, set this to "https"; otherwise the default is "http"
- prometheus.io/path: "/mymetrics" # If the metrics path is not /metrics, define it with this annotation
- prometheus.io/port: "8000" # If the port is not 9102, use this annotation
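The following Python sketch shows how these annotations and their documented defaults (`http`, `/metrics`, port 9102) combine into a scrape target. It is only an illustration under stated assumptions: the function name `scrape_url`, the pod IP, and the way the URL is assembled are hypothetical and don't come from the agent's source.

```python
def scrape_url(annotations, pod_ip):
    """Build the URL a scraper would call for a pod, or None if not opted in.

    Defaults follow the settings table: scheme "http", path "/metrics",
    port "9102". Annotation values are strings, as in Kubernetes metadata.
    """
    if annotations.get("prometheus.io/scrape", "false").lower() != "true":
        return None  # The pod has not enabled scraping.
    scheme = annotations.get("prometheus.io/scheme", "http")
    path = annotations.get("prometheus.io/path", "/metrics")
    port = annotations.get("prometheus.io/port", "9102")
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{pod_ip}:{port}{path}"


pod_annotations = {
    "prometheus.io/scrape": "true",
    "prometheus.io/scheme": "http",
    "prometheus.io/path": "/mymetrics",
    "prometheus.io/port": "8000",
}
print(scrape_url(pod_annotations, "10.244.0.12"))
```

A pod that sets only `prometheus.io/scrape: "true"` falls back to the defaults for the other three values.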

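To restrict monitoring to annotated pods in specific namespaces, for example to include only pods dedicated to production workloads, set monitor_kubernetes_pods to true in the ConfigMap. Then add the namespace filter monitor_kubernetes_pods_namespaces to specify the namespaces to scrape from. For example: monitor_kubernetes_pods_namespaces = ["default1", "default2", "default3"].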
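The effect of the namespace filter can be sketched in Python as follows. This is a simplified model, not the agent's code: the function name `pods_to_scrape` and the pod dictionaries are hypothetical, and treating an empty namespace list as "all namespaces" is an assumption.

```python
def pods_to_scrape(pods, namespaces=()):
    """Select annotated pods, optionally limited to an allow list of namespaces.

    Assumption: an empty namespaces list means no namespace restriction.
    """
    allowed = set(namespaces)
    selected = []
    for pod in pods:
        opted_in = pod["annotations"].get("prometheus.io/scrape") == "true"
        in_scope = not allowed or pod["namespace"] in allowed
        if opted_in and in_scope:
            selected.append(pod["name"])
    return selected


pods = [
    {"name": "api", "namespace": "default1", "annotations": {"prometheus.io/scrape": "true"}},
    {"name": "batch", "namespace": "staging", "annotations": {"prometheus.io/scrape": "true"}},
    {"name": "web", "namespace": "default2", "annotations": {}},
]
print(pods_to_scrape(pods, namespaces=["default1", "default2", "default3"]))
```

Only the `api` pod is selected here: `batch` is outside the allowed namespaces and `web` has no scrape annotation.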

Run the following kubectl command: kubectl apply -f <configmap_yaml_file.yaml>.

Example: kubectl apply -f container-azm-ms-agentconfig.yaml.

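The configuration change can take a few minutes to complete before it takes effect. All ama-logs pods in the cluster restart. When the restarts finish, a message similar to the following appears and includes the result configmap "container-azm-ms-agentconfig" created.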

Verify configuration

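To verify that the configuration was successfully applied to the cluster, use the following command to review the logs from an agent pod, substituting the name of one of your own agent pods: kubectl logs ama-logs-fdf58 -n=kube-system.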

If the Azure Monitor agent pods return configuration errors, the output shows errors similar to the following example:

***************Start Config Processing******************** 
config::unsupported/missing config schema version - 'v21' , using defaults
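When you review many agent pods, a small script can flag this specific error in saved log output. The following Python sketch assumes the message format shown in the example above and extracts the rejected schema version. The function name `unsupported_schema_versions` is hypothetical, and other configuration errors would need their own patterns.

```python
import re

# Pattern taken from the sample error line; the quoted value is the schema
# version that the agent rejected.
SCHEMA_ERROR = re.compile(
    r"config::unsupported/missing config schema version - '([^']*)'"
)


def unsupported_schema_versions(log_text):
    """Return every schema version reported as unsupported in the log text."""
    return SCHEMA_ERROR.findall(log_text)


sample_log = """***************Start Config Processing********************
config::unsupported/missing config schema version - 'v21' , using defaults
"""
print(unsupported_schema_versions(sample_log))
```

An empty result means only that this particular message was not found, not that the configuration is valid.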

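Errors related to applying configuration changes are also available for review. The following options are available for further troubleshooting of configuration changes and scraping of Prometheus metrics: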

From the agent pod logs, by using the same kubectl logs command.

From Live Data. Live Data logs show errors similar to the following example:

2019-07-08T18:55:00Z E! [inputs.prometheus]: Error in plugin: error making HTTP request to http://invalidurl:1010/metrics: Get http://invalidurl:1010/metrics: dial tcp: lookup invalidurl on 10.0.0.10:53: no such host

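From the KubeMonAgentEvents table in your Log Analytics workspace. Data is sent every hour, with Warning severity for scrape errors and Error severity for configuration errors. If there are no errors, the entry in the table has Info severity and reports no errors. The Tags property contains the IDs of the pod and container on which the error occurred, along with the first occurrence, the last occurrence, and the count in the last hour.
For Azure Red Hat OpenShift v3.x and v4.x, check the Azure Monitor agent logs by searching the ContainerLog table to verify that log collection for openshift-azure-logging is enabled.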

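Errors prevent the Azure Monitor agent from parsing the file, which causes it to restart and use the default configuration. After you correct the errors in the ConfigMap on clusters other than Azure Red Hat OpenShift v3.x, save the YAML file and apply the updated ConfigMap by running the command kubectl apply -f <configmap_yaml_file.yaml>.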

For Azure Red Hat OpenShift v3.x, edit and save the updated ConfigMap by running the command oc edit configmaps container-azm-ms-agentconfig -n openshift-azure-logging.

Query Prometheus metrics data

To view the Prometheus metrics scraped by Azure Monitor and any configuration or scraping errors reported by the agent, review Query Prometheus metrics data.

View Prometheus metrics in Grafana

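Container insights supports viewing metrics stored in your Log Analytics workspace in Grafana dashboards. A template is provided that you can download from Grafana's dashboard repository. Use the template to get started, and as a reference for learning how to query other data from your monitored clusters and visualize it in custom Grafana dashboards.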
