Query logs from Container insights

2025-04-23

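Container insights collects performance metrics, inventory data, and health state information from container hosts and containers. The data is collected every three minutes and forwarded to the Log Analytics workspace in Azure Monitor, where it is available for log queries through Log Analytics.

You can apply this data to scenarios such as migration planning, capacity analysis, discovery, and on-demand performance troubleshooting. With Azure Monitor Logs you can identify trends, diagnose bottlenecks, forecast, and correlate data to determine whether the current cluster configuration is performing optimally.

For details on using these queries, see "Using queries in Azure Monitor Log Analytics". For a complete tutorial on running queries in Log Analytics and working with their results, see "Log Analytics tutorial".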

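Important

The queries in this article depend on data that Container insights collects and stores in a Log Analytics workspace. If you have changed the default data collection settings, the queries might not return the expected results. In particular, if you disabled collection of performance data after enabling Prometheus metrics for the cluster, queries that use the Perf table return no results.

For preset configurations, including disabling performance data collection, see "Configure data collection in Container insights using data collection rule". For further data collection options, see "Configure data collection in Container insights using ConfigMap".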

Open Log Analytics

There are several ways to start Log Analytics, and each one starts with a different scope. To access all data in the workspace, select [Logs] from the [Monitor] menu. To limit the data to a single Kubernetes cluster, select [Logs] from that cluster's menu.

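Existing log queries

You don't need to know how to write a log query to use Log Analytics. You can choose from several prebuilt queries, and either run them unchanged or use them as the starting point for a custom query. Select [Queries] at the top of the Log Analytics screen and view the queries with a [Resource type] of [Kubernetes Services].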

Container tables

For a list of the tables used by Container insights and their detailed descriptions, see "Azure Monitor table reference". All of these tables are available for log queries.

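Example log queries

It often works well to build a query from one or two examples and then modify it to fit your requirements. The following sample queries can help you build more advanced queries.

List all of a container's lifecycle information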

ContainerInventory
| project Computer, Name, Image, ImageTag, ContainerState, CreatedTime, StartedTime, FinishedTime
| render table

Kubernetes events

Note

By default, events of type Normal are not collected unless the collect_all_kube_events ConfigMap setting is enabled, so they don't appear when you query the KubeEvents table. If you need to collect Normal events, enable the collect_all_kube_events setting in the container-azm-ms-agentconfig ConfigMap. For details on configuring the ConfigMap, see "Configure agent data collection for Container insights".

KubeEvents
| where not(isempty(Namespace))
| sort by TimeGenerated desc
| render table

Container CPU

Perf
| where ObjectName == "K8SContainer" and CounterName == "cpuUsageNanoCores" 
| summarize AvgCPUUsageNanoCores = avg(CounterValue) by bin(TimeGenerated, 30m), InstanceName

Container memory

This query uses memoryRssBytes, which is available only for Linux nodes.

Perf
| where ObjectName == "K8SContainer" and CounterName == "memoryRssBytes"
| summarize AvgUsedRssMemoryBytes = avg(CounterValue) by bin(TimeGenerated, 30m), InstanceName

Requests per minute with custom metrics

InsightsMetrics
| where Name == "requests_count"
| summarize Val=any(Val) by TimeGenerated=bin(TimeGenerated, 1m)
| sort by TimeGenerated asc
| project RequestsPerMinute = Val - prev(Val), TimeGenerated
| render barchart
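
The `Val - prev(Val)` step in the query above turns a cumulative counter into per-minute increments. The following Python sketch mirrors that differencing on made-up sample values; the function name and the data are illustrative assumptions and not part of Container insights.

```python
from typing import List, Optional


def requests_per_minute(counter_values: List[float]) -> List[Optional[float]]:
    """Difference consecutive per-minute samples of a cumulative counter.

    The first entry has no previous sample, so it is None, in the same way
    that prev() yields null for the first row of the query result.
    """
    deltas: List[Optional[float]] = []
    previous: Optional[float] = None
    for value in counter_values:
        deltas.append(None if previous is None else value - previous)
        previous = value
    return deltas


# Hypothetical counter readings, one per minute, in ascending time order.
samples = [100, 130, 130, 175]
print(requests_per_minute(samples))
```

As in the query, the samples must be sorted by time before differencing; otherwise the deltas are meaningless.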

Pods by name and namespace

let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| where PodName contains "name" and Namespace startswith "namespace"
| distinct ContainerID, PodName
| join
(
    ContainerLog
    | where TimeGenerated > startTimestamp
)
on ContainerID
// at this point before the next pipe, columns from both tables are available to be "projected". Due to both
// tables having a "Name" column, we assign an alias as PodName to one column which we actually want
| project TimeGenerated, PodName, LogEntry, LogEntrySource
| summarize by TimeGenerated, LogEntry
| order by TimeGenerated desc

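Pod scale-out (HPA)

This query returns the number of scaled-out replicas in each deployment. It calculates the scale-out percentage as the desired replica count relative to the maximum number of replicas configured in the HPA.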

let _minthreshold = 70; // minimum threshold goes here if you want to setup as an alert
let _maxthreshold = 90; // maximum threshold goes here if you want to setup as an alert
let startDateTime = ago(60m);
KubePodInventory
| where TimeGenerated >= startDateTime 
| where Namespace !in('default', 'kube-system') // List of non system namespace filter goes here.
| extend labels = todynamic(PodLabel)
| extend deployment_hpa = reverse(substring(reverse(ControllerName), indexof(reverse(ControllerName), "-") + 1))
| distinct tostring(deployment_hpa)
| join kind=inner (InsightsMetrics 
    | where TimeGenerated > startDateTime 
    | where Name == 'kube_hpa_status_current_replicas'
    | extend pTags = todynamic(Tags) //parse the tags for values
    | extend ns = todynamic(pTags.k8sNamespace) //parse namespace value from tags
    | extend deployment_hpa = todynamic(pTags.targetName) //parse HPA target name from tags
    | extend max_reps = todynamic(pTags.spec_max_replicas) // Parse maximum replica settings from HPA deployment
    | extend desired_reps = todynamic(pTags.status_desired_replicas) // Parse desired replica settings from HPA deployment
    | summarize arg_max(TimeGenerated, *) by tostring(ns), tostring(deployment_hpa), Cluster=toupper(tostring(split(_ResourceId, '/')[8])), toint(desired_reps), toint(max_reps), scale_out_percentage=(desired_reps * 100 / max_reps)
    //| where scale_out_percentage > _minthreshold and scale_out_percentage <= _maxthreshold
    )
    on deployment_hpa
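
Two expressions in the query above are easy to misread: the `reverse`/`indexof`/`substring` chain that derives a deployment name from `ControllerName`, and the `scale_out_percentage` calculation. The Python sketch below reproduces both under stated assumptions: the helper names and the sample controller name are hypothetical, and the percentage uses floating-point division, which may differ from how the query engine rounds.

```python
def deployment_from_controller(controller_name: str) -> str:
    """Drop the text after the last hyphen, such as a ReplicaSet hash suffix."""
    # rsplit leaves the name unchanged when it contains no hyphen, which
    # matches the behavior of the reverse/indexof/substring expression.
    return controller_name.rsplit("-", 1)[0]


def scale_out_percentage(desired_replicas: int, max_replicas: int) -> float:
    """Desired replicas as a percentage of the HPA maximum."""
    return desired_replicas * 100 / max_replicas


# Hypothetical values for illustration only.
print(deployment_from_controller("frontend-7d9c5b6f4"))
print(scale_out_percentage(7, 10))
```

Because only the last hyphen-separated segment is removed, a deployment whose own name contains hyphens keeps them.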

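Node pool scale-out

This query returns the number of active nodes in each node pool. It determines the scale-out percentage by comparing the number of active nodes with the maximum node count configured in the autoscaler settings. To use the query in a [Number of results] alert rule, see the commented lines in the query.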

let nodepoolMaxnodeCount = 10; // the maximum number of nodes in your auto scale setting goes here.
let _minthreshold = 20;
let _maxthreshold = 90;
let startDateTime = 60m;
KubeNodeInventory
| where TimeGenerated >= ago(startDateTime)
| extend nodepoolType = todynamic(Labels) //Parse the labels to get the list of node pool types
| extend nodepoolName = todynamic(nodepoolType[0].agentpool) // parse the label to get the nodepool name or set the specific nodepool name (like nodepoolName = 'agentpool')
| summarize nodeCount = count(Computer) by ClusterName, tostring(nodepoolName), TimeGenerated
//(Uncomment the below two lines to set this as a log search alert)
//| extend scaledpercent = iff(((nodeCount * 100 / nodepoolMaxnodeCount) >= _minthreshold and (nodeCount * 100 / nodepoolMaxnodeCount) < _maxthreshold), "warn", "normal")
//| where scaledpercent == 'warn'
| summarize arg_max(TimeGenerated, *) by nodeCount, ClusterName, tostring(nodepoolName)
| project ClusterName, 
    TotalNodeCount= strcat("Total Node Count: ", nodeCount),
    ScaledOutPercentage = (nodeCount * 100 / nodepoolMaxnodeCount),  
    TimeGenerated, 
    nodepoolName
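
The commented alert lines in the query above label a node pool "warn" when its scale-out percentage falls between the two thresholds. The Python sketch below shows that classification in isolation. The function names are hypothetical, the default thresholds are copied from the query, and integer division is used on the assumption that both operands are integers in the query.

```python
def scaled_out_percentage(node_count: int, max_node_count: int) -> int:
    """Active nodes as a whole-number percentage of the configured maximum."""
    # Integer division, assuming both operands are integers as in the query.
    return node_count * 100 // max_node_count


def scale_state(node_count: int, max_node_count: int,
                min_threshold: int = 20, max_threshold: int = 90) -> str:
    """Return 'warn' when min_threshold <= percentage < max_threshold."""
    percentage = scaled_out_percentage(node_count, max_node_count)
    return "warn" if min_threshold <= percentage < max_threshold else "normal"


# Hypothetical node counts against a maximum of 10 nodes.
for nodes in (1, 3, 9):
    print(nodes, scaled_out_percentage(nodes, 10), scale_state(nodes, 10))
```

Note that the upper bound is exclusive, so a pool at or above the maximum threshold is reported as "normal" by this rule; adjust the thresholds if that is not the intent.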

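System container (replica set) availability

This query returns the system containers (replica sets) and reports the percentage that is unavailable. To use the query in a [Number of results] alert rule, see the commented lines in the query.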

let startDateTime = 5m; // the minimum time interval goes here
let _minalertThreshold = 50; //Threshold for minimum and maximum unavailable or not running containers
let _maxalertThreshold = 70;
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated
| summarize Clustersnapshot = count() by ClusterName
| join kind=inner (
    KubePodInventory
    | where TimeGenerated >= ago(startDateTime)
    | where Namespace in('default', 'kube-system') and ControllerKind == 'ReplicaSet' // the system namespace filter goes here
    | distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, ServiceName, PodLabel, Namespace, ContainerStatus
    | summarize arg_max(TimeGenerated, *), TotalPODCount = count(), podCount = sumif(1, PodStatus == 'Running' or PodStatus != 'Running'), containerNotrunning = sumif(1, ContainerStatus != 'running')
        by ClusterName, TimeGenerated, ServiceName, PodLabel, Namespace
    )
    on ClusterName
| project ClusterName, ServiceName, podCount, containerNotrunning, containerNotrunningPercent = (containerNotrunning * 100 / podCount), TimeGenerated, PodStatus, PodLabel, Namespace, Environment = tostring(split(ClusterName, '-')[3]), Location = tostring(split(ClusterName, '-')[4]), ContainerStatus
//Uncomment the below line to set for automated alert
//| where PodStatus == "Running" and containerNotrunningPercent > _minalertThreshold and containerNotrunningPercent < _maxalertThreshold
| summarize arg_max(TimeGenerated, *), c_entry=count() by PodLabel, ServiceName, ClusterName
//Below lines are to parse the labels to identify the impacted service/component name
| extend parseLabel = replace(@'k8s-app', @'k8sapp', PodLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/component', @'appkubernetesiocomponent', parseLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/instance', @'appkubernetesioinstance', parseLabel)
| extend tags = todynamic(parseLabel)
| extend tag01 = todynamic(tags[0].app)
| extend tag02 = todynamic(tags[0].k8sapp)
| extend tag03 = todynamic(tags[0].appkubernetesiocomponent)
| extend tag04 = todynamic(tags[0].aadpodidbinding)
| extend tag05 = todynamic(tags[0].appkubernetesioinstance)
| extend tag06 = todynamic(tags[0].component)
| project ClusterName, TimeGenerated,
    ServiceName = strcat( ServiceName, tag01, tag02, tag03, tag04, tag05, tag06),
    ContainerUnavailable = strcat("Unavailable Percentage: ", containerNotrunningPercent),
    PodStatus = strcat("PodStatus: ", PodStatus), 
    ContainerStatus = strcat("Container Status: ", ContainerStatus)

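System container (daemon set) availability

This query returns the system containers (daemon sets) and reports the percentage that is unavailable. To use the query in a [Number of results] alert rule, see the commented lines in the query.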

let startDateTime = 5m; // the minimum time interval goes here
let _minalertThreshold = 50; //Threshold for minimum and maximum unavailable or not running containers
let _maxalertThreshold = 70;
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated
| summarize Clustersnapshot = count() by ClusterName
| join kind=inner (
    KubePodInventory
    | where TimeGenerated >= ago(startDateTime)
    | where Namespace in('default', 'kube-system') and ControllerKind == 'DaemonSet' // the system namespace filter goes here
    | distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, ServiceName, PodLabel, Namespace, ContainerStatus
    | summarize arg_max(TimeGenerated, *), TotalPODCount = count(), podCount = sumif(1, PodStatus == 'Running' or PodStatus != 'Running'), containerNotrunning = sumif(1, ContainerStatus != 'running')
        by ClusterName, TimeGenerated, ServiceName, PodLabel, Namespace
    )
    on ClusterName
| project ClusterName, ServiceName, podCount, containerNotrunning, containerNotrunningPercent = (containerNotrunning * 100 / podCount), TimeGenerated, PodStatus, PodLabel, Namespace, Environment = tostring(split(ClusterName, '-')[3]), Location = tostring(split(ClusterName, '-')[4]), ContainerStatus
//Uncomment the below line to set for automated alert
//| where PodStatus == "Running" and containerNotrunningPercent > _minalertThreshold and containerNotrunningPercent < _maxalertThreshold
| summarize arg_max(TimeGenerated, *), c_entry=count() by PodLabel, ServiceName, ClusterName
//Below lines are to parse the labels to identify the impacted service/component name
| extend parseLabel = replace(@'k8s-app', @'k8sapp', PodLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/component', @'appkubernetesiocomponent', parseLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/instance', @'appkubernetesioinstance', parseLabel)
| extend tags = todynamic(parseLabel)
| extend tag01 = todynamic(tags[0].app)
| extend tag02 = todynamic(tags[0].k8sapp)
| extend tag03 = todynamic(tags[0].appkubernetesiocomponent)
| extend tag04 = todynamic(tags[0].aadpodidbinding)
| extend tag05 = todynamic(tags[0].appkubernetesioinstance)
| extend tag06 = todynamic(tags[0].component)
| project ClusterName, TimeGenerated,
    ServiceName = strcat( ServiceName, tag01, tag02, tag03, tag04, tag05, tag06),
    ContainerUnavailable = strcat("Unavailable Percentage: ", containerNotrunningPercent),
    PodStatus = strcat("PodStatus: ", PodStatus), 
    ContainerStatus = strcat("Container Status: ", ContainerStatus)

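Container logs

Container logs for AKS are stored in the ContainerLogV2 table. You can run the following sample queries to search the stderr/stdout log output of target pods, deployments, or namespaces.

Container logs for a specific pod, namespace, and container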

ContainerLogV2
| where _ResourceId =~ "clusterResourceID" //update with resource ID
| where PodNamespace == "podNameSpace" //update with target namespace
| where PodName == "podName" //update with target pod
| where ContainerName == "containerName" //update with target container
| project TimeGenerated, Computer, ContainerId, LogMessage, LogSource

Container logs for a specific deployment

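let startTime = ago(1d); //assumed default; update with the start of the time range
let endTime = now(); //assumed default; update with the end of the time range
let KubePodInv = KubePodInventory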
| where _ResourceId =~ "clusterResourceID" //update with resource ID
| where Namespace == "deploymentNamespace" //update with target namespace
| where ControllerKind == "ReplicaSet"
| extend deployment = reverse(substring(reverse(ControllerName), indexof(reverse(ControllerName), "-") + 1))
| where deployment == "deploymentName" //update with target deployment
| extend ContainerId = ContainerID
| summarize arg_max(TimeGenerated, *)  by deployment, ContainerId, PodStatus, ContainerStatus
| project deployment, ContainerId, PodStatus, ContainerStatus;

KubePodInv
| join
(
    ContainerLogV2
  | where TimeGenerated >= startTime and TimeGenerated < endTime
  | where PodNamespace == "deploymentNamespace" //update with target namespace
  | where PodName startswith "deploymentName" //update with target deployment
) on ContainerId
| project TimeGenerated, deployment, PodName, PodStatus, ContainerName, ContainerId, ContainerStatus, LogMessage, LogSource

Container logs for failed pods in a specific namespace

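    let startTime = ago(1d); //assumed default; update with the start of the time range
    let endTime = now(); //assumed default; update with the end of the time range
    let KubePodInv = KubePodInventory
    | where TimeGenerated >= startTime and TimeGenerated < endTime
    | where _ResourceId =~ "clusterResourceID" //update with resource ID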
    | where Namespace == "podNamespace" //update with target namespace
    | where PodStatus == "Failed"
    | extend ContainerId = ContainerID
    | summarize arg_max(TimeGenerated, *)  by  ContainerId, PodStatus, ContainerStatus
    | project ContainerId, PodStatus, ContainerStatus;

    KubePodInv
    | join
    (
        ContainerLogV2
    | where TimeGenerated >= startTime and TimeGenerated < endTime
    | where PodNamespace == "podNamespace" //update with target namespace
    ) on ContainerId
    | project TimeGenerated, PodName, PodStatus, ContainerName, ContainerId, ContainerStatus, LogMessage, LogSource

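Container insights default visualization queries

These queries are generated from the out-of-the-box visualizations in Container insights. If you have enabled custom cost optimization settings, you can use them in place of the default charts.

Node count by status

This chart requires the KubeNodeInventory table.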

 let trendBinSize = 5m;
 let maxListSize = 1000;
 let clusterId = 'clusterResourceID'; //update with resource ID
 
 let rawData = KubeNodeInventory 
| where ClusterId =~ clusterId 
| distinct ClusterId, TimeGenerated 
| summarize ClusterSnapshotCount = count() by Timestamp = bin(TimeGenerated, trendBinSize), ClusterId 
| join hint.strategy=broadcast ( KubeNodeInventory 
| where ClusterId =~ clusterId 
| summarize TotalCount = count(), ReadyCount = sumif(1, Status contains ('Ready')) by ClusterId, Timestamp = bin(TimeGenerated, trendBinSize) 
| extend NotReadyCount = TotalCount - ReadyCount ) on ClusterId, Timestamp 
| project ClusterId, Timestamp, TotalCount = todouble(TotalCount) / ClusterSnapshotCount, ReadyCount = todouble(ReadyCount) / ClusterSnapshotCount, NotReadyCount = todouble(NotReadyCount) / ClusterSnapshotCount;

 rawData 
| order by Timestamp asc 
| summarize makelist(Timestamp, maxListSize), makelist(TotalCount, maxListSize), makelist(ReadyCount, maxListSize), makelist(NotReadyCount, maxListSize) by ClusterId 
| join ( rawData 
| summarize Avg_TotalCount = avg(TotalCount), Avg_ReadyCount = avg(ReadyCount), Avg_NotReadyCount = avg(NotReadyCount) by ClusterId ) on ClusterId 
| project ClusterId, Avg_TotalCount, Avg_ReadyCount, Avg_NotReadyCount, list_Timestamp, list_TotalCount, list_ReadyCount, list_NotReadyCount

Pod count by status

This chart requires the KubePodInventory table.

 let trendBinSize = 5m;
 let maxListSize = 1000;
 let clusterId = 'clusterResourceID'; //update with resource ID
 
 let rawData = KubePodInventory 
| where ClusterId =~ clusterId 
| distinct ClusterId, TimeGenerated 
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterId 
| join hint.strategy=broadcast ( KubePodInventory 
| where ClusterId =~ clusterId 
| summarize PodStatus=any(PodStatus) by TimeGenerated, PodUid, ClusterId 
| summarize TotalCount = count(), PendingCount = sumif(1, PodStatus =~ 'Pending'), RunningCount = sumif(1, PodStatus =~ 'Running'), SucceededCount = sumif(1, PodStatus =~ 'Succeeded'), FailedCount = sumif(1, PodStatus =~ 'Failed'), TerminatingCount = sumif(1, PodStatus =~ 'Terminating') by ClusterId, bin(TimeGenerated, trendBinSize) ) on ClusterId, TimeGenerated 
| extend UnknownCount = TotalCount - PendingCount - RunningCount - SucceededCount - FailedCount - TerminatingCount 
| project ClusterId, Timestamp = TimeGenerated, TotalCount = todouble(TotalCount) / ClusterSnapshotCount, PendingCount = todouble(PendingCount) / ClusterSnapshotCount, RunningCount = todouble(RunningCount) / ClusterSnapshotCount, SucceededCount = todouble(SucceededCount) / ClusterSnapshotCount, FailedCount = todouble(FailedCount) / ClusterSnapshotCount, TerminatingCount = todouble(TerminatingCount) / ClusterSnapshotCount, UnknownCount = todouble(UnknownCount) / ClusterSnapshotCount;

 let rawDataCached = rawData;
 
 rawDataCached 
| order by Timestamp asc 
| summarize makelist(Timestamp, maxListSize), makelist(TotalCount, maxListSize), makelist(PendingCount, maxListSize), makelist(RunningCount, maxListSize), makelist(SucceededCount, maxListSize), makelist(FailedCount, maxListSize), makelist(TerminatingCount, maxListSize), makelist(UnknownCount, maxListSize) by ClusterId 
| join ( rawDataCached 
| summarize Avg_TotalCount = avg(TotalCount), Avg_PendingCount = avg(PendingCount), Avg_RunningCount = avg(RunningCount), Avg_SucceededCount = avg(SucceededCount), Avg_FailedCount = avg(FailedCount), Avg_TerminatingCount = avg(TerminatingCount), Avg_UnknownCount = avg(UnknownCount) by ClusterId ) on ClusterId 
| project ClusterId, Avg_TotalCount, Avg_PendingCount, Avg_RunningCount, Avg_SucceededCount, Avg_FailedCount, Avg_TerminatingCount, Avg_UnknownCount, list_Timestamp, list_TotalCount, list_PendingCount, list_RunningCount, list_SucceededCount, list_FailedCount, list_TerminatingCount, list_UnknownCount

List of containers by status

This chart requires the KubePodInventory and Perf tables.

 let startDateTime = datetime('start time');
 let endDateTime = datetime('end time');
 let trendBinSize = 15m;
 let maxResultCount = 10000;
 let metricUsageCounterName = 'cpuUsageNanoCores';
 let metricLimitCounterName = 'cpuLimitNanoCores';
 
 let KubePodInventoryTable = KubePodInventory 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| where isnotempty(Computer) 
| project TimeGenerated, ClusterId, ClusterName, Namespace, ServiceName, ControllerName, Node = Computer, Pod = Name, ContainerInstance = ContainerName, ContainerID, ReadySinceNow = format_timespan(endDateTime - ContainerCreationTimeStamp , 'ddd.hh:mm:ss.fff'), Restarts = ContainerRestartCount, Status = ContainerStatus, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind = ControllerKind, PodStatus;

 let startRestart = KubePodInventoryTable 
| summarize arg_min(TimeGenerated, *) by Node, ContainerInstance 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project Node, ContainerInstance, InstanceName = strcat(ClusterId, '/', ContainerInstance), StartRestart = Restarts;

 let IdentityTable = KubePodInventoryTable 
| summarize arg_max(TimeGenerated, *) by Node, ContainerInstance 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project ClusterName, Namespace, ServiceName, ControllerName, Node, Pod, ContainerInstance, InstanceName = strcat(ClusterId, '/', ContainerInstance), ContainerID, ReadySinceNow, Restarts, Status = iff(Status =~ 'running', 0, iff(Status=~'waiting', 1, iff(Status =~'terminated', 2, 3))), ContainerStatusReason, ControllerKind, Containers = 1, ContainerName = tostring(split(ContainerInstance, '/')[1]), PodStatus, LastPodInventoryTimeGenerated = TimeGenerated, ClusterId;

 let CachedIdentityTable = IdentityTable;
 
 let FilteredPerfTable = Perf 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where ObjectName == 'K8SContainer' 
| where InstanceName startswith 'clusterResourceID' 
| project Node = Computer, TimeGenerated, CounterName, CounterValue, InstanceName ;

 let CachedFilteredPerfTable = FilteredPerfTable;
 
 let LimitsTable = CachedFilteredPerfTable 
| where CounterName =~ metricLimitCounterName 
| summarize arg_max(TimeGenerated, *) by Node, InstanceName 
| project Node, InstanceName, LimitsValue = iff(CounterName =~ 'cpuLimitNanoCores', CounterValue/1000000, CounterValue), TimeGenerated;
 let MetaDataTable = CachedIdentityTable 
| join kind=leftouter ( LimitsTable ) on Node, InstanceName 
| join kind= leftouter ( startRestart ) on Node, InstanceName 
| project ClusterName, Namespace, ServiceName, ControllerName, Node, Pod, InstanceName, ContainerID, ReadySinceNow, Restarts, LimitsValue, Status, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind, Containers, ContainerName, ContainerInstance, StartRestart, PodStatus, LastPodInventoryTimeGenerated, ClusterId;

 let UsagePerfTable = CachedFilteredPerfTable 
| where CounterName =~ metricUsageCounterName 
| project TimeGenerated, Node, InstanceName, CounterValue = iff(CounterName =~ 'cpuUsageNanoCores', CounterValue/1000000, CounterValue);

 let LastRestartPerfTable = CachedFilteredPerfTable 
| where CounterName =~ 'restartTimeEpoch' 
| summarize arg_max(TimeGenerated, *) by Node, InstanceName 
| project Node, InstanceName, UpTime = CounterValue, LastReported = TimeGenerated;

 let AggregationTable = UsagePerfTable 
| summarize Aggregation = max(CounterValue) by Node, InstanceName 
| project Node, InstanceName, Aggregation;

 let TrendTable = UsagePerfTable 
| summarize TrendAggregation = max(CounterValue) by bin(TimeGenerated, trendBinSize), Node, InstanceName 
| project TrendTimeGenerated = TimeGenerated, Node, InstanceName , TrendAggregation 
| summarize TrendList = makelist(pack("timestamp", TrendTimeGenerated, "value", TrendAggregation)) by Node, InstanceName;

 let containerFinalTable = MetaDataTable 
| join kind= leftouter( AggregationTable ) on Node, InstanceName 
| join kind = leftouter (LastRestartPerfTable) on Node, InstanceName 
| order by Aggregation desc, ContainerName 
| join kind = leftouter ( TrendTable) on Node, InstanceName 
| extend ContainerIdentity = strcat(ContainerName, ' ', Pod) 
| project ContainerIdentity, Status, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), Aggregation, Node, Restarts, ReadySinceNow, TrendList = iif(isempty(TrendList), parse_json('[]'), TrendList), LimitsValue, ControllerName, ControllerKind, ContainerID, Containers, UpTimeNow = datetime_diff('Millisecond', endDateTime, datetime_add('second', toint(UpTime), make_datetime(1970,1,1))), ContainerInstance, StartRestart, LastReportedDelta = datetime_diff('Millisecond', endDateTime, LastReported), PodStatus, InstanceName, Namespace, LastPodInventoryTimeGenerated, ClusterId;
containerFinalTable 
| limit 200
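
Two small transformations in the query above are worth isolating: the nested `iff` that maps a container status to a numeric code, and the division by 1000000 applied to the CPU counters, which converts nanocores to millicores. The Python sketch below restates both; the function and constant names are hypothetical.

```python
# Codes taken from the nested iff expression in the query; anything else is 3.
STATUS_CODES = {"running": 0, "waiting": 1, "terminated": 2}

NANOCORES_PER_MILLICORE = 1_000_000


def status_code(container_status: str) -> int:
    """Map a container status to its numeric code, ignoring case like =~."""
    return STATUS_CODES.get(container_status.lower(), 3)


def nanocores_to_millicores(nanocores: float) -> float:
    """Convert a cpuUsageNanoCores or cpuLimitNanoCores value to millicores."""
    return nanocores / NANOCORES_PER_MILLICORE


# Hypothetical sample values.
print(status_code("Running"), status_code("unknown"))
print(nanocores_to_millicores(250_000_000))
```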

List of controllers by status

This chart requires the KubePodInventory and Perf tables.

 let startDateTime = datetime('start time');
 let endDateTime = datetime('end time');
 let trendBinSize = 15m;
 let metricLimitCounterName = 'cpuLimitNanoCores';
 let metricUsageCounterName = 'cpuUsageNanoCores';
 
 let primaryInventory = KubePodInventory 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| extend Node = Computer 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project TimeGenerated, ClusterId, ClusterName, Namespace, ServiceName, Node = Computer, ControllerName, Pod = Name, ContainerInstance = ContainerName, ContainerID, InstanceName, PerfJoinKey = strcat(ClusterId, '/', ContainerName), ReadySinceNow = format_timespan(endDateTime - ContainerCreationTimeStamp, 'ddd.hh:mm:ss.fff'), Restarts = ContainerRestartCount, Status = ContainerStatus, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind = ControllerKind, PodStatus, ControllerId = strcat(ClusterId, '/', Namespace, '/', ControllerName);

let podStatusRollup = primaryInventory 
| summarize arg_max(TimeGenerated, *) by Pod 
| project ControllerId, PodStatus, TimeGenerated 
| summarize count() by ControllerId, PodStatus = iif(TimeGenerated < ago(30m), 'Unknown', PodStatus) 
| summarize PodStatusList = makelist(pack('Status', PodStatus, 'Count', count_)) by ControllerId;

let latestContainersByController = primaryInventory 
| where isnotempty(Node) 
| summarize arg_max(TimeGenerated, *) by PerfJoinKey 
| project ControllerId, PerfJoinKey;

let filteredPerformance = Perf 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where ObjectName == 'K8SContainer' 
| where InstanceName startswith 'clusterResourceID' //update with resource ID
| project TimeGenerated, CounterName, CounterValue, InstanceName, Node = Computer ;

let metricByController = filteredPerformance 
| where CounterName =~ metricUsageCounterName 
| extend PerfJoinKey = InstanceName 
| summarize Value = percentile(CounterValue, 95) by PerfJoinKey, CounterName 
| join (latestContainersByController) on PerfJoinKey 
| summarize Value = sum(Value) by ControllerId, CounterName 
| project ControllerId, CounterName, AggregationValue = iff(CounterName =~ 'cpuUsageNanoCores', Value/1000000, Value);

let containerCountByController = latestContainersByController 
| summarize ContainerCount = count() by ControllerId;

let restartCountsByController = primaryInventory 
| summarize Restarts = max(Restarts) by ControllerId;

let oldestRestart = primaryInventory 
| summarize ReadySinceNow = min(ReadySinceNow) by ControllerId;

let trendLineByController = filteredPerformance 
| where CounterName =~ metricUsageCounterName 
| extend PerfJoinKey = InstanceName 
| summarize Value = percentile(CounterValue, 95) by bin(TimeGenerated, trendBinSize), PerfJoinKey, CounterName 
| order by TimeGenerated asc 
| join kind=leftouter (latestContainersByController) on PerfJoinKey 
| summarize Value=sum(Value) by ControllerId, TimeGenerated, CounterName 
| project TimeGenerated, Value = iff(CounterName =~ 'cpuUsageNanoCores', Value/1000000, Value), ControllerId 
| summarize TrendList = makelist(pack("timestamp", TimeGenerated, "value", Value)) by ControllerId;

let latestLimit = filteredPerformance 
| where CounterName =~ metricLimitCounterName 
| extend PerfJoinKey = InstanceName 
| summarize arg_max(TimeGenerated, *) by PerfJoinKey 
| join kind=leftouter (latestContainersByController) on PerfJoinKey 
| summarize Value = sum(CounterValue) by ControllerId, CounterName 
| project ControllerId, LimitValue = iff(CounterName =~ 'cpuLimitNanoCores', Value/1000000, Value);

let latestTimeGeneratedByController = primaryInventory 
| summarize arg_max(TimeGenerated, *) by ControllerId 
| project ControllerId, LastTimeGenerated = TimeGenerated;

primaryInventory 
| distinct ControllerId, ControllerName, ControllerKind, Namespace 
| join kind=leftouter (podStatusRollup) on ControllerId 
| join kind=leftouter (metricByController) on ControllerId 
| join kind=leftouter (containerCountByController) on ControllerId 
| join kind=leftouter (restartCountsByController) on ControllerId 
| join kind=leftouter (oldestRestart) on ControllerId 
| join kind=leftouter (trendLineByController) on ControllerId 
| join kind=leftouter (latestLimit) on ControllerId 
| join kind=leftouter (latestTimeGeneratedByController) on ControllerId 
| project ControllerId, ControllerName, ControllerKind, PodStatusList, AggregationValue, ContainerCount = iif(isempty(ContainerCount), 0, ContainerCount), Restarts, ReadySinceNow, Node = '-', TrendList, LimitValue, LastTimeGenerated, Namespace 
| limit 250;

List of nodes by status

This chart requires the KubeNodeInventory, KubePodInventory, and Perf tables.

 let startDateTime = datetime('start time');
 let endDateTime = datetime('end time');
 let binSize = 15m;
 let limitMetricName = 'cpuCapacityNanoCores';
 let usedMetricName = 'cpuUsageNanoCores'; 
 
 let materializedNodeInventory = KubeNodeInventory 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| project ClusterName, ClusterId, Node = Computer, TimeGenerated, Status, NodeName = Computer, NodeId = strcat(ClusterId, '/', Computer), Labels 
| where ClusterId =~ 'clusterResourceID'; //update with resource ID

 let materializedPerf = Perf 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| where ObjectName == 'K8SNode' 
| extend NodeId = InstanceName;

 let materializedPodInventory = KubePodInventory 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| where ClusterId =~ 'clusterResourceID'; //update with resource ID

 let inventoryOfCluster = materializedNodeInventory 
| summarize arg_max(TimeGenerated, Status) by ClusterName, ClusterId, NodeName, NodeId;

 let labelsByNode = materializedNodeInventory 
| summarize arg_max(TimeGenerated, Labels) by ClusterName, ClusterId, NodeName, NodeId;

 let countainerCountByNode = materializedPodInventory 
| project ContainerName, NodeId = strcat(ClusterId, '/', Computer) 
| distinct NodeId, ContainerName 
| summarize ContainerCount = count() by NodeId;

 let latestUptime = materializedPerf 
| where CounterName == 'restartTimeEpoch' 
| summarize arg_max(TimeGenerated, CounterValue) by NodeId 
| extend UpTimeMs = datetime_diff('Millisecond', endDateTime, datetime_add('second', toint(CounterValue), make_datetime(1970,1,1))) 
| project NodeId, UpTimeMs;

 let latestLimitOfNodes = materializedPerf 
| where CounterName == limitMetricName 
| summarize CounterValue = max(CounterValue) by NodeId 
| project NodeId, LimitValue = CounterValue;

 let actualUsageAggregated = materializedPerf 
| where CounterName == usedMetricName 
| summarize Aggregation = percentile(CounterValue, 95) by NodeId //This line updates to the desired aggregation
| project NodeId, Aggregation;

 let aggregateTrendsOverTime = materializedPerf 
| where CounterName == usedMetricName 
| summarize TrendAggregation = percentile(CounterValue, 95) by NodeId, bin(TimeGenerated, binSize) //This line updates to the desired aggregation
| project NodeId, TrendAggregation, TrendDateTime = TimeGenerated;

 let unscheduledPods = materializedPodInventory 
| where isempty(Computer) 
| extend Node = Computer 
| where isempty(ContainerStatus) 
| where PodStatus == 'Pending' 
| order by TimeGenerated desc 
| take 1 
| project ClusterName, NodeName = 'unscheduled', LastReceivedDateTime = TimeGenerated, Status = 'unscheduled', ContainerCount = 0, UpTimeMs = '0', Aggregation = '0', LimitValue = '0', ClusterId;

 let scheduledPods = inventoryOfCluster 
| join kind=leftouter (aggregateTrendsOverTime) on NodeId 
| extend TrendPoint = pack("TrendTime", TrendDateTime, "TrendAggregation", TrendAggregation) 
| summarize make_list(TrendPoint) by NodeId, NodeName, Status 
| join kind=leftouter (labelsByNode) on NodeId 
| join kind=leftouter (countainerCountByNode) on NodeId 
| join kind=leftouter (latestUptime) on NodeId 
| join kind=leftouter (latestLimitOfNodes) on NodeId 
| join kind=leftouter (actualUsageAggregated) on NodeId 
| project ClusterName, NodeName, ClusterId, list_TrendPoint, LastReceivedDateTime = TimeGenerated, Status, ContainerCount, UpTimeMs, Aggregation, LimitValue, Labels 
| limit 250;

 union (scheduledPods), (unscheduledPods) 
| project ClusterName, NodeName, LastReceivedDateTime, Status, ContainerCount, UpTimeMs = UpTimeMs_long, Aggregation = Aggregation_real, LimitValue = LimitValue_real, list_TrendPoint, Labels, ClusterId
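
The `UpTimeMs` expression in the query above treats the `restartTimeEpoch` counter as seconds since 1970-01-01 and subtracts the resulting timestamp from the end of the query window. The Python sketch below follows the same steps; the function name and sample timestamps are hypothetical, and UTC is assumed throughout.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uptime_ms(restart_time_epoch: float, end_time: datetime) -> int:
    """Milliseconds between the last restart and the end of the query window."""
    # The counter value is truncated to whole seconds, as toint() does.
    restarted_at = EPOCH + timedelta(seconds=int(restart_time_epoch))
    return (end_time - restarted_at) // timedelta(milliseconds=1)


# Hypothetical example: a node restarted one hour before the window ends.
window_end = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
restart_epoch = (window_end - timedelta(hours=1) - EPOCH).total_seconds()
print(uptime_ms(restart_epoch, window_end))
```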

Prometheus metrics

The following examples require the configuration described in "Send Prometheus metrics to Log Analytics workspace with Container insights".

To view Prometheus metrics scraped by Azure Monitor and filtered by namespace, specify "prometheus". The following sample query shows Prometheus metrics from the default Kubernetes namespace.

InsightsMetrics 
| where Namespace contains "prometheus"
| extend tags=parse_json(Tags)
| summarize count() by Name

Prometheus data can also be queried directly by name.

InsightsMetrics 
| where Namespace contains "prometheus"
| where Name contains "some_prometheus_metric"

To find the ingestion volume of each metric in GB per day and judge whether it is high, use the following query.

InsightsMetrics
| where Namespace contains "prometheus"
| where TimeGenerated > ago(24h)
| summarize VolumeInGB = (sum(_BilledSize) / (1024 * 1024 * 1024)) by Name
| order by VolumeInGB desc
| render barchart

The output is a bar chart of the ingested volume in GB for each metric name.

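To estimate the size of each metric in GB for a month and judge whether the volume of data ingested into the workspace is high, use the following query. It extrapolates the last 24 hours of data to a 30-day month.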

InsightsMetrics
| where Namespace contains "prometheus"
| where TimeGenerated > ago(24h)
| summarize EstimatedGBPer30dayMonth = (sum(_BilledSize) / (1024 * 1024 * 1024)) * 30 by Name
| order by EstimatedGBPer30dayMonth desc
| render barchart

The output is a bar chart of the estimated monthly volume in GB for each metric name.
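
The arithmetic behind the two volume queries is a unit conversion followed by a simple extrapolation. The Python sketch below shows it on made-up numbers; the function names and the sample sizes are hypothetical, and the estimate assumes that the last 24 hours are representative of the whole month.

```python
from typing import Iterable

BYTES_PER_GB = 1024 * 1024 * 1024


def volume_in_gb(billed_sizes_bytes: Iterable[int]) -> float:
    """Sum billed record sizes (bytes) over one day and convert to GB."""
    return sum(billed_sizes_bytes) / BYTES_PER_GB


def estimated_gb_per_30_day_month(billed_sizes_bytes: Iterable[int]) -> float:
    """Extrapolate one day of ingestion to a 30-day month."""
    return volume_in_gb(billed_sizes_bytes) * 30


# Hypothetical billed sizes for one metric over the last 24 hours.
one_day = [BYTES_PER_GB // 2, BYTES_PER_GB // 2]
print(volume_in_gb(one_day))
print(estimated_gb_per_30_day_month(one_day))
```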

Configuration or scraping errors

To investigate configuration or scraping errors, the following example query returns the events from the KubeMonAgentEvents table whose level is not Info.

KubeMonAgentEvents | where Level != "Info"

The output lists the matching agent events, if there are any.

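Frequently asked questions

This section answers common questions.

Can I view the collected metrics in Grafana?

Container insights supports viewing the metrics stored in your Log Analytics workspace in Grafana dashboards. You can download a provided template from the Grafana dashboard repository to get started. You can also use it as a reference to learn how to query data from your monitored clusters and visualize it in custom Grafana dashboards.

Why are log lines larger than 16 KB split into multiple records in Log Analytics?

The agent uses the Docker JSON file logging driver to capture the stdout and stderr of containers. This logging driver splits log lines larger than 16 KB into multiple lines when it copies them from stdout or stderr to a file. To get log records of up to 64 KB, use multi-line logging.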
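
To make the splitting behavior concrete, the following Python sketch cuts a long line into 16 KB pieces. It is a simplified model of the effect described above, not the logging driver's actual implementation; the function name and the sample line are hypothetical.

```python
from typing import List

CHUNK_SIZE = 16 * 1024  # 16 KB, the split threshold described above


def split_log_line(line: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split one long log line into consecutive pieces of at most chunk_size."""
    return [line[i:i + chunk_size] for i in range(0, len(line), chunk_size)]


# A hypothetical 40 KB log line ends up as three separate records.
long_line = b"x" * (40 * 1024)
pieces = split_log_line(long_line)
print([len(piece) for piece in pieces])
```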

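Next steps

Container insights doesn't include a predefined set of alerts. To learn how to create recommended alerts for high CPU and memory utilization in support of your DevOps or operational processes and procedures, see "Create performance alerts with Container insights".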
