Azure Machine Learning monitoring data reference

This article contains all the monitoring reference information for this service.

See Monitor Machine Learning for details on the data you can collect for Azure Machine Learning and how to use it.

Metrics

This section lists all the automatically collected platform metrics for this service. These metrics are also part of the global list of all platform metrics supported in Azure Monitor.

For information on metric retention, see Azure Monitor Metrics overview.

The resource provider for these metrics is Microsoft.MachineLearningServices/workspaces.

The metric categories are Model, Quota, Resource, Run, and Traffic. Quota information is for Machine Learning compute only. Run provides information on training runs for the workspace.

Supported metrics for Microsoft.MachineLearningServices/workspaces

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.

  • All columns might not be present in every table.

Table headings

  • Category - The metrics group or classification.
  • Metric - The metric display name as it appears in the Azure portal.
  • Name in REST API - The metric name as referred to in the REST API.
  • Unit - Unit of measure.
  • Aggregation - The default aggregation type. Valid values: Average (Avg), Minimum (Min), Maximum (Max), Total (Sum), Count.
  • Dimensions - Dimensions available for the metric.
  • Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
  • DS Export - Whether the metric is exportable to Azure Monitor Logs via diagnostic settings. For information on exporting metrics, see Create diagnostic settings in Azure Monitor.
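
As an illustration of the Time Grains notation, the following Python sketch converts ISO 8601 duration strings such as PT1M or PT1H into timedelta values. The function name parse_time_grain is hypothetical, and the sketch assumes that only day, hour, minute, and second components appear in a time grain.

```python
import re
from datetime import timedelta

# Matches simple day/hour/minute/second durations such as PT1M, PT30M, PT1H, or P1D.
# Assumption: years, months, and weeks never appear in a time grain.
_GRAIN = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain string such as 'PT1M' into a timedelta."""
    match = _GRAIN.match(grain)
    if match is None:
        raise ValueError(f"unsupported time grain: {grain!r}")
    # Keep only the components that are present in the string.
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    if not parts:
        raise ValueError(f"time grain has no components: {grain!r}")
    return timedelta(**parts)


if __name__ == "__main__":
    for grain in ("PT1M", "PT30M", "PT1H"):
        print(grain, parse_time_grain(grain).total_seconds())
```
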
Category Metric Name in REST API Unit Aggregation Dimensions Time Grains DS Export
Quota Active Cores

Number of active cores
Active Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Active Nodes

Number of active nodes. These are the nodes that are actively running a job.
Active Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Run Cancel Requested Runs

Number of runs where cancel was requested for this workspace. Count is updated when a cancellation request has been received for a run.
Cancel Requested Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Cancelled Runs

Number of runs cancelled for this workspace. Count is updated when a run is successfully cancelled.
Cancelled Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Completed Runs

Number of runs completed successfully for this workspace. Count is updated when a run has completed and output has been collected.
Completed Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Resource CpuCapacityMillicores

Maximum capacity of a CPU node in millicores. Capacity is aggregated in one minute intervals.
CpuCapacityMillicores Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuMemoryCapacityMegabytes

Maximum memory capacity of a CPU node in megabytes. Capacity is aggregated in one minute intervals.
CpuMemoryCapacityMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuMemoryUtilizationMegabytes

Memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals.
CpuMemoryUtilizationMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuMemoryUtilizationPercentage

Memory utilization percentage of a CPU node. Utilization is aggregated in one minute intervals.
CpuMemoryUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuUtilization

Percentage of utilization on a CPU node. Utilization is reported at one minute intervals.
CpuUtilization Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, NodeId, ClusterName PT1M Yes
Resource CpuUtilizationMillicores

Utilization of a CPU node in millicores. Utilization is aggregated in one minute intervals.
CpuUtilizationMillicores Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource CpuUtilizationPercentage

Utilization percentage of a CPU node. Utilization is aggregated in one minute intervals.
CpuUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskAvailMegabytes

Available disk space in megabytes. Metrics are aggregated in one minute intervals.
DiskAvailMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskReadMegabytes

Data read from disk in megabytes. Metrics are aggregated in one minute intervals.
DiskReadMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskUsedMegabytes

Used disk space in megabytes. Metrics are aggregated in one minute intervals.
DiskUsedMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource DiskWriteMegabytes

Data written into disk in megabytes. Metrics are aggregated in one minute intervals.
DiskWriteMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Run Errors

Number of run errors in this workspace. Count is updated whenever a run encounters an error.
Errors Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Run Failed Runs

Number of runs failed for this workspace. Count is updated when a run fails.
Failed Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Finalizing Runs

Number of runs that entered the Finalizing state for this workspace. Count is updated when a run has completed but output collection is still in progress.
Finalizing Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Resource GpuCapacityMilliGPUs

Maximum capacity of a GPU device in milli-GPUs. Capacity is aggregated in one minute intervals.
GpuCapacityMilliGPUs Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuEnergyJoules

Interval energy in Joules on a GPU node. Energy is reported at one minute intervals.
GpuEnergyJoules Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, rootRunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuMemoryCapacityMegabytes

Maximum memory capacity of a GPU device in megabytes. Capacity is aggregated in one minute intervals.
GpuMemoryCapacityMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuMemoryUtilization

Percentage of memory utilization on a GPU node. Utilization is reported at one minute intervals.
GpuMemoryUtilization Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, NodeId, DeviceId, ClusterName PT1M Yes
Resource GpuMemoryUtilizationMegabytes

Memory utilization of a GPU device in megabytes. Utilization is aggregated in one minute intervals.
GpuMemoryUtilizationMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuMemoryUtilizationPercentage

Memory utilization percentage of a GPU device. Utilization is aggregated in one minute intervals.
GpuMemoryUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuUtilization

Percentage of utilization on a GPU node. Utilization is reported at one minute intervals.
GpuUtilization Count Average, Maximum, Minimum, Total (Sum) Scenario, runId, NodeId, DeviceId, ClusterName PT1M Yes
Resource GpuUtilizationMilliGPUs

Utilization of a GPU device in milli-GPUs. Utilization is aggregated in one minute intervals.
GpuUtilizationMilliGPUs Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource GpuUtilizationPercentage

Utilization percentage of a GPU device. Utilization is aggregated in one minute intervals.
GpuUtilizationPercentage Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, DeviceId, ComputeName PT1M Yes
Resource IBReceiveMegabytes

Network data received over InfiniBand in megabytes. Metrics are aggregated in one minute intervals.
IBReceiveMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Resource IBTransmitMegabytes

Network data sent over InfiniBand in megabytes. Metrics are aggregated in one minute intervals.
IBTransmitMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Quota Idle Cores

Number of idle cores
Idle Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Idle Nodes

Number of idle nodes. Idle nodes are the nodes that are not running any jobs but can accept a new job if one is available.
Idle Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Leaving Cores

Number of leaving cores
Leaving Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Leaving Nodes

Number of leaving nodes. Leaving nodes are the nodes that just finished processing a job and will go to the Idle state.
Leaving Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Model Model Deploy Failed

Number of model deployments that failed in this workspace
Model Deploy Failed Count Total (Sum), Average, Minimum, Maximum, Count Scenario, StatusCode PT1M Yes
Model Model Deploy Started

Number of model deployments started in this workspace
Model Deploy Started Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Model Model Deploy Succeeded

Number of model deployments that succeeded in this workspace
Model Deploy Succeeded Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Model Model Register Failed

Number of model registrations that failed in this workspace
Model Register Failed Count Total (Sum), Average, Minimum, Maximum, Count Scenario, StatusCode PT1M Yes
Model Model Register Succeeded

Number of model registrations that succeeded in this workspace
Model Register Succeeded Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
Resource NetworkInputMegabytes

Network data received in megabytes. Metrics are aggregated in one minute intervals.
NetworkInputMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Resource NetworkOutputMegabytes

Network data sent in megabytes. Metrics are aggregated in one minute intervals.
NetworkOutputMegabytes Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName, DeviceId PT1M Yes
Run Not Responding Runs

Number of runs not responding for this workspace. Count is updated when a run enters Not Responding state.
Not Responding Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Not Started Runs

Number of runs in Not Started state for this workspace. Count is updated when a request is received to create a run but run information has not yet been populated.
Not Started Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Quota Preempted Cores

Number of preempted cores
Preempted Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Preempted Nodes

Number of preempted nodes. These nodes are the low-priority nodes that are taken away from the available node pool.
Preempted Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Run Preparing Runs

Number of runs that are preparing for this workspace. Count is updated when a run enters Preparing state while the run environment is being prepared.
Preparing Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Provisioning Runs

Number of runs that are provisioning for this workspace. Count is updated when a run is waiting on compute target creation or provisioning.
Provisioning Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Queued Runs

Number of runs that are queued for this workspace. Count is updated when a run is queued in the compute target. This can occur when waiting for required compute nodes to be ready.
Queued Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Quota Quota Utilization Percentage

Percent of quota utilized
Quota Utilization Percentage Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName, VmFamilyName, VmPriority PT1M Yes
Run Started Runs

Number of runs running for this workspace. Count is updated when a run starts running on required resources.
Started Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Run Starting Runs

Number of runs started for this workspace. Count is updated after the request to create a run is received and run information, such as the Run Id, has been populated.
Starting Runs Count Total (Sum), Average, Minimum, Maximum, Count Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName PT1M Yes
Resource StorageAPIFailureCount

Azure Blob Storage API calls failure count.
StorageAPIFailureCount Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Resource StorageAPISuccessCount

Azure Blob Storage API calls success count.
StorageAPISuccessCount Count Average, Maximum, Minimum, Total (Sum) RunId, InstanceId, ComputeName PT1M Yes
Quota Total Cores

Number of total cores
Total Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Total Nodes

Number of total nodes. This total includes some of Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, and Leaving Nodes.
Total Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Unusable Cores

Number of unusable cores
Unusable Cores Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Quota Unusable Nodes

Number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes.
Unusable Nodes Count Average, Maximum, Minimum, Total (Sum) Scenario, ClusterName PT1M Yes
Run Warnings

Number of run warnings in this workspace. Count is updated whenever a run encounters a warning.
Warnings Count Total (Sum), Average, Minimum, Maximum, Count Scenario PT1M Yes
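
To show how the Name in REST API, Aggregation, and Time Grains columns fit together, the following Python sketch builds a request URL for the Azure Monitor metrics REST API. The helper name build_metrics_url, the placeholder resource ID, and the api-version value are assumptions for illustration. Verify the API version against the current Azure Monitor REST reference. The request also needs an Azure Resource Manager bearer token, which is not shown here.

```python
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"


def build_metrics_url(
    resource_id: str,
    metric_names: list,
    aggregation: str = "Average",
    interval: str = "PT1M",
    api_version: str = "2018-01-01",  # assumption: check the current API reference
) -> str:
    """Build a metrics query URL for one workspace resource."""
    query = urlencode(
        {
            "api-version": api_version,
            # Metric names such as "Active Cores" contain spaces, so they are
            # percent-encoded rather than written with "+".
            "metricnames": ",".join(metric_names),
            "aggregation": aggregation,
            "interval": interval,
        },
        quote_via=quote,
    )
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"


if __name__ == "__main__":
    # Hypothetical resource ID; replace the subscription, group, and workspace names.
    workspace_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg"
        "/providers/Microsoft.MachineLearningServices/workspaces/example-ws"
    )
    print(build_metrics_url(workspace_id, ["Active Cores", "Idle Cores"]))
```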

Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints resource type.

Category Metric Name in REST API Unit Aggregation Dimensions Time Grains DS Export
Traffic Connections Active

The total number of concurrent TCP connections active from clients.
ConnectionsActive Count Average <none> PT1M No
Traffic Data Collection Errors Per Minute

The number of data collection events dropped per minute.
DataCollectionErrorsPerMinute Count Minimum, Maximum, Average deployment, reason, type PT1M No
Traffic Data Collection Events Per Minute

The number of data collection events processed per minute.
DataCollectionEventsPerMinute Count Minimum, Maximum, Average deployment, type PT1M No
Traffic Network Bytes

The bytes per second served for the endpoint.
NetworkBytes BytesPerSecond Average <none> PT1M No
Traffic New Connections Per Second

The average number of new TCP connections per second established from clients.
NewConnectionsPerSecond CountPerSecond Average <none> PT1M No
Traffic Request Latency

The average total time taken to respond to a request, in milliseconds.
RequestLatency Milliseconds Average deployment PT1M Yes
Traffic Request Latency P50

The average P50 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P50 Milliseconds Average deployment PT1M Yes
Traffic Request Latency P90

The average P90 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P90 Milliseconds Average deployment PT1M Yes
Traffic Request Latency P95

The average P95 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P95 Milliseconds Average deployment PT1M Yes
Traffic Request Latency P99

The average P99 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P99 Milliseconds Average deployment PT1M Yes
Traffic Requests Per Minute

The number of requests sent to the online endpoint within a minute.
RequestsPerMinute Count Average deployment, statusCode, statusCodeClass, modelStatusCode PT1M No

Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.

Category Metric Name in REST API Unit Aggregation Dimensions Time Grains DS Export
Resource CPU Memory Utilization Percentage

Percentage of memory utilization on an instance. Utilization is reported at one minute intervals.
CpuMemoryUtilizationPercentage Percent Minimum, Maximum, Average instanceId PT1M Yes
Resource CPU Utilization Percentage

Percentage of CPU utilization on an instance. Utilization is reported at one minute intervals.
CpuUtilizationPercentage Percent Minimum, Maximum, Average instanceId PT1M Yes
Resource Data Collection Errors Per Minute

The number of data collection events dropped per minute.
DataCollectionErrorsPerMinute Count Minimum, Maximum, Average instanceId, reason, type PT1M No
Resource Data Collection Events Per Minute

The number of data collection events processed per minute.
DataCollectionEventsPerMinute Count Minimum, Maximum, Average instanceId, type PT1M No
Resource Deployment Capacity

The number of instances in the deployment.
DeploymentCapacity Count Minimum, Maximum, Average instanceId, State PT1M No
Resource Disk Utilization

Percentage of disk utilization on an instance. Utilization is reported at one minute intervals.
DiskUtilization Percent Minimum, Maximum, Average instanceId, disk PT1M Yes
Resource GPU Energy in Joules

Interval energy in Joules on a GPU node. Energy is reported at one minute intervals.
GpuEnergyJoules Count Minimum, Maximum, Average instanceId PT1M No
Resource GPU Memory Utilization Percentage

Percentage of GPU memory utilization on an instance. Utilization is reported at one minute intervals.
GpuMemoryUtilizationPercentage Percent Minimum, Maximum, Average instanceId PT1M Yes
Resource GPU Utilization Percentage

Percentage of GPU utilization on an instance. Utilization is reported at one minute intervals.
GpuUtilizationPercentage Percent Minimum, Maximum, Average instanceId PT1M Yes
Traffic Request Latency P50

The average P50 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P50 Milliseconds Average <none> PT1M Yes
Traffic Request Latency P90

The average P90 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P90 Milliseconds Average <none> PT1M Yes
Traffic Request Latency P95

The average P95 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P95 Milliseconds Average <none> PT1M Yes
Traffic Request Latency P99

The average P99 request latency aggregated by all request latency values collected over the selected time period
RequestLatency_P99 Milliseconds Average <none> PT1M Yes
Traffic Requests Per Minute

The number of requests sent to the online deployment within a minute.
RequestsPerMinute Count Average envoy_response_code PT1M No

Metric dimensions

For information about what metric dimensions are, see Multi-dimensional metrics.

This service has the following dimensions associated with its metrics.

Dimension Description
Cluster Name The name of the compute cluster resource. Available for all quota metrics.
Vm Family Name The name of the VM family used by the cluster. Available for quota utilization percentage.
Vm Priority The priority of the VM. Available for quota utilization percentage.
CreatedTime Only available for CpuUtilization and GpuUtilization.
DeviceId ID of the device (GPU). Only available for GpuUtilization.
NodeId ID of the node created where job is running. Only available for CpuUtilization and GpuUtilization.
RunId ID of the run/job. Only available for CpuUtilization and GpuUtilization.
ComputeType The compute type that the run used. Only available for Completed runs, Failed runs, and Started runs.
PipelineStepType The type of PipelineStep used in the run. Only available for Completed runs, Failed runs, and Started runs.
PublishedPipelineId The ID of the published pipeline used in the run. Only available for Completed runs, Failed runs, and Started runs.
RunType The type of run. Only available for Completed runs, Failed runs, and Started runs.

The valid values for the RunType dimension are:

Value Description
Experiment Non-pipeline runs.
PipelineRun A pipeline run, which is the parent of a StepRun.
StepRun A run for a pipeline step.
ReusedStepRun A run for a pipeline step that reuses a previous run.
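
As a sketch of how a dimension value might be used when querying metrics, the following Python function validates a RunType value against the list above and formats it as a filter clause of the form used by the Azure Monitor metrics $filter parameter. The function name run_type_filter is hypothetical, and the clause syntax is an assumption to confirm against the Azure Monitor REST reference.

```python
# Valid RunType values, taken from the table above.
RUN_TYPES = ("Experiment", "PipelineRun", "StepRun", "ReusedStepRun")


def run_type_filter(run_type: str) -> str:
    """Return a filter clause that selects a single RunType value."""
    if run_type not in RUN_TYPES:
        raise ValueError(
            f"unknown RunType {run_type!r}; expected one of {', '.join(RUN_TYPES)}"
        )
    return f"RunType eq '{run_type}'"


if __name__ == "__main__":
    print(run_type_filter("PipelineRun"))
```

Because RunType is only available for some run metrics, a clause like this applies only to those metrics.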

Resource logs

This section lists the types of resource logs you can collect for this service. The section pulls from the list of all resource log category types supported in Azure Monitor.

Supported resource logs for Microsoft.MachineLearningServices/registries

Category Category display name Log table Supports basic log plan Supports ingestion-time transformation Example queries Costs to export
RegistryAssetReadEvent Registry Asset Read Event No No Yes
RegistryAssetWriteEvent Registry Asset Write Event AmlRegistryWriteEventsLog

Azure ML Registry Write events log. Records write operations on registry data (data plane), including the user identity, asset name, and version for each access event.

No No Queries Yes

Supported resource logs for Microsoft.MachineLearningServices/workspaces

Category Category display name Log table Supports basic log plan Supports ingestion-time transformation Example queries Costs to export
AmlComputeClusterEvent AmlComputeClusterEvent AmlComputeClusterEvent

AmlCompute Cluster events

No Yes Queries No
AmlComputeClusterNodeEvent AmlComputeClusterNodeEvent No No Yes
AmlComputeCpuGpuUtilization AmlComputeCpuGpuUtilization AmlComputeCpuGpuUtilization

Azure Machine Learning services CPU and GPU utilization logs.

No Yes Queries No
AmlComputeJobEvent AmlComputeJobEvent AmlComputeJobEvent

AmlCompute Job events

No Yes Queries No
AmlRunStatusChangedEvent AmlRunStatusChangedEvent AmlRunStatusChangedEvent

Azure Machine Learning services run status event logs.

No Yes No
ComputeInstanceEvent ComputeInstanceEvent AmlComputeInstanceEvent

Events when an ML compute instance is accessed (read/write).

No Yes Yes
DataLabelChangeEvent DataLabelChangeEvent AmlDataLabelEvent

Events when data labels or their projects are accessed (read, created, or deleted).

No Yes Yes
DataLabelReadEvent DataLabelReadEvent AmlDataLabelEvent

Events when data labels or their projects are accessed (read, created, or deleted).

No Yes Yes
DataSetChangeEvent DataSetChangeEvent AmlDataSetEvent

Events when a registered or unregistered ML datastore is accessed (read, created, or deleted).

No Yes Queries Yes
DataSetReadEvent DataSetReadEvent AmlDataSetEvent

Events when a registered or unregistered ML datastore is accessed (read, created, or deleted).

No Yes Queries Yes
DataStoreChangeEvent DataStoreChangeEvent AmlDataStoreEvent

Events when an ML datastore is accessed (read, created, or deleted).

No Yes Yes
DataStoreReadEvent DataStoreReadEvent AmlDataStoreEvent

Events when an ML datastore is accessed (read, created, or deleted).

No Yes Yes
DeploymentEventACI DeploymentEventACI AmlDeploymentEvent

Events when a model deployment happens on ACI or AKS.

No Yes Yes
DeploymentEventAKS DeploymentEventAKS AmlDeploymentEvent

Events when a model deployment happens on ACI or AKS.

No Yes Yes
DeploymentReadEvent DeploymentReadEvent AmlDeploymentEvent

Events when a model deployment happens on ACI or AKS.

No Yes Yes
EnvironmentChangeEvent EnvironmentChangeEvent AmlEnvironmentEvent

Events when ML environments are accessed (read, created, or deleted).

No Yes Queries Yes
EnvironmentReadEvent EnvironmentReadEvent AmlEnvironmentEvent

Events when ML environments are accessed (read, created, or deleted).

No Yes Queries Yes
InferencingOperationACI InferencingOperationACI AmlInferencingEvent

Events for inference or related operations on the AKS or ACI compute type.

No Yes Yes
InferencingOperationAKS InferencingOperationAKS AmlInferencingEvent

Events for inference or related operations on the AKS or ACI compute type.

No Yes Yes
ModelsActionEvent ModelsActionEvent AmlModelsEvent

Events when an ML model is accessed (read, created, or deleted). Includes events when models and assets are packaged into ready-to-build packages.

No Yes Queries Yes
ModelsChangeEvent ModelsChangeEvent AmlModelsEvent

Events when an ML model is accessed (read, created, or deleted). Includes events when models and assets are packaged into ready-to-build packages.

No Yes Queries Yes
ModelsReadEvent ModelsReadEvent AmlModelsEvent

Events when an ML model is accessed (read, created, or deleted). Includes events when models and assets are packaged into ready-to-build packages.

No Yes Queries Yes
PipelineChangeEvent PipelineChangeEvent AmlPipelineEvent

Events when an ML pipeline draft, endpoint, or module is accessed (read, created, or deleted).

No Yes Yes
PipelineReadEvent PipelineReadEvent AmlPipelineEvent

Events when an ML pipeline draft, endpoint, or module is accessed (read, created, or deleted).

No Yes Yes
RunEvent RunEvent AmlRunEvent

Events when ML experiments are accessed (read, created, or deleted).

No Yes Yes
RunReadEvent RunReadEvent AmlRunEvent

Events when ML experiments are accessed (read, created, or deleted).

No Yes Yes

Supported resource logs for Microsoft.MachineLearningServices/workspaces/onlineEndpoints

Category Category display name Log table Supports basic log plan Supports ingestion-time transformation Example queries Costs to export
AmlOnlineEndpointConsoleLog AmlOnlineEndpointConsoleLog AmlOnlineEndpointConsoleLog

Azure ML online endpoints console logs. Provides console log output from user containers.

No Yes Queries Yes
AmlOnlineEndpointEventLog AmlOnlineEndpointEventLog AmlOnlineEndpointEventLog

Azure ML online endpoints event logs. It provides event logs regarding the inference-server container's life cycle.

No No Queries Yes
AmlOnlineEndpointTrafficLog AmlOnlineEndpointTrafficLog AmlOnlineEndpointTrafficLog

Traffic logs for Azure Machine Learning online endpoints. Use this table to check detailed information about requests to an online endpoint, such as the request duration or the reason a request failed.

No No Queries Yes

Azure Monitor Logs tables

This section lists the Azure Monitor Logs tables relevant to this service, which are available for query by Log Analytics using Kusto queries. The tables contain resource log data and possibly more depending on what is collected and routed to them.

Machine Learning

Microsoft.MachineLearningServices/workspaces

Microsoft.MachineLearningServices/registries

Activity log

The linked table lists the operations that can be recorded in the activity log for this service. These operations are a subset of all the possible resource provider operations in the activity log.

For more information on the schema of activity log entries, see Activity Log schema.

The following table lists some operations related to Machine Learning that may be created in the activity log. For a complete listing of Microsoft.MachineLearningServices operations, see Microsoft.MachineLearningServices resource provider operations.

Operation Description
Creates or updates a Machine Learning workspace A workspace was created or updated
CheckComputeNameAvailability Check if a compute name is already in use
Creates or updates the compute resources A compute resource was created or updated
Deletes the compute resources A compute resource was deleted
List secrets An operation listed secrets for a Machine Learning workspace

Log schemas

Azure Machine Learning uses the following schemas.

AmlComputeJobEvent table

Property Description
TimeGenerated Time when the log entry was generated
OperationName Name of the operation associated with the log event
Category Name of the log event
JobId ID of the Job submitted
ExperimentId ID of the Experiment
ExperimentName Name of the Experiment
CustomerSubscriptionId SubscriptionId where Experiment and Job were submitted
WorkspaceName Name of the machine learning workspace
ClusterName Name of the Cluster
ProvisioningState State of the Job submission
ResourceGroupName Name of the resource group
JobName Name of the Job
ClusterId ID of the cluster
EventType Type of the Job event. For example, JobSubmitted, JobRunning, JobFailed, JobSucceeded.
ExecutionState State of the job (the Run). For example, Queued, Running, Succeeded, Failed
ErrorDetails Details of job error
CreationApiVersion API version used to create the job
ClusterResourceGroupName Resource group name of the cluster
TFWorkerCount Count of TensorFlow (TF) workers
TFParameterServerCount Count of TensorFlow (TF) parameter servers
ToolType Type of tool used
RunInContainer Flag describing if job should be run inside a container
JobErrorMessage Detailed message of the job error
NodeId ID of the node where the job is running
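To show how the EventType column can be used, here's a minimal Python sketch that reduces a set of job events to the most recent event for each job and counts the results. The rows are hypothetical sample data shaped like AmlComputeJobEvent records, and the helper names are invented for this example. It assumes that TimeGenerated values are ISO 8601 strings in a single format, so they sort correctly as text.

```python
from collections import Counter

# Hypothetical rows using a few AmlComputeJobEvent columns.
rows = [
    {"TimeGenerated": "2024-05-01T10:00:00Z", "JobId": "job-1", "EventType": "JobSubmitted"},
    {"TimeGenerated": "2024-05-01T10:02:00Z", "JobId": "job-1", "EventType": "JobRunning"},
    {"TimeGenerated": "2024-05-01T10:30:00Z", "JobId": "job-1", "EventType": "JobSucceeded"},
    {"TimeGenerated": "2024-05-01T10:05:00Z", "JobId": "job-2", "EventType": "JobSubmitted"},
    {"TimeGenerated": "2024-05-01T10:06:00Z", "JobId": "job-2", "EventType": "JobRunning"},
    {"TimeGenerated": "2024-05-01T10:10:00Z", "JobId": "job-2", "EventType": "JobFailed"},
    {"TimeGenerated": "2024-05-01T10:20:00Z", "JobId": "job-3", "EventType": "JobSubmitted"},
]


def latest_event_per_job(events):
    """Map each JobId to its most recent event."""
    latest = {}
    # Sorting by timestamp means later events overwrite earlier ones.
    for event in sorted(events, key=lambda e: e["TimeGenerated"]):
        latest[event["JobId"]] = event
    return latest


def count_latest_event_types(events):
    """Count how many jobs currently sit at each EventType."""
    return Counter(e["EventType"] for e in latest_event_per_job(events).values())


print(count_latest_event_types(rows))
```

With real data, the same grouping is usually cheaper to do in the query itself. The Python version is only meant to make the relationship between JobId, TimeGenerated, and EventType concrete.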

AmlComputeClusterEvent table

Property Description
TimeGenerated Time when the log entry was generated
OperationName Name of the operation associated with the log event
Category Name of the log event
ProvisioningState Provisioning state of the cluster
ClusterName Name of the cluster
ClusterType Type of the cluster
CreatedBy User who created the cluster
CoreCount Count of the cores in the cluster
VmSize VM size of the cluster
VmPriority Priority of the nodes created inside the cluster (Dedicated or LowPriority)
ScalingType Type of cluster scaling (manual or auto)
InitialNodeCount Initial node count of the cluster
MinimumNodeCount Minimum node count of the cluster
MaximumNodeCount Maximum node count of the cluster
NodeDeallocationOption How the node should be deallocated
Publisher Publisher of the cluster type
Offer Offer with which the cluster is created
Sku SKU of the node or VM created inside the cluster
Version Version of the image used when the node or VM is created
SubnetId Subnet ID of the cluster
AllocationState Cluster allocation state
CurrentNodeCount Current node count of the cluster
TargetNodeCount Target node count of the cluster while scaling up/down
EventType Type of event during cluster creation.
NodeIdleTimeSecondsBeforeScaleDown Idle time in seconds before cluster is scaled down
PreemptedNodeCount Preempted node count of the cluster
IsResizeGrow Flag indicating that cluster is scaling up
VmFamilyName Name of the VM family of the nodes that can be created inside cluster
LeavingNodeCount Leaving node count of the cluster
UnusableNodeCount Unusable node count of the cluster
IdleNodeCount Idle node count of the cluster
RunningNodeCount Running node count of the cluster
PreparingNodeCount Preparing node count of the cluster
QuotaAllocated Allocated quota to the cluster
QuotaUtilized Utilized quota of the cluster
AllocationStateTransitionTime Transition time from one state to another
ClusterErrorCodes Error code received during cluster creation or scaling
CreationApiVersion API version used to create the cluster
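The node count columns can be combined into a quick capacity summary. The following Python sketch is one way to do that for a single cluster event. The sample record is hypothetical, and the derived fields (running_fraction, at_maximum, scaling_up) are names chosen for this example rather than columns in the table.

```python
# Hypothetical record using a few AmlComputeClusterEvent columns.
sample_event = {
    "ClusterName": "cpu-cluster",
    "CurrentNodeCount": 4,
    "RunningNodeCount": 3,
    "IdleNodeCount": 1,
    "MaximumNodeCount": 4,
    "TargetNodeCount": 4,
}


def summarize_cluster(event: dict) -> dict:
    """Derive simple capacity indicators from one cluster event."""
    current = event["CurrentNodeCount"]
    running = event["RunningNodeCount"]
    return {
        "cluster": event["ClusterName"],
        # Share of current nodes that are running jobs; 0.0 for an empty cluster.
        "running_fraction": running / current if current else 0.0,
        # True when the cluster can't add more nodes.
        "at_maximum": current >= event["MaximumNodeCount"],
        # True when the target is above the current node count.
        "scaling_up": event["TargetNodeCount"] > current,
    }


print(summarize_cluster(sample_event))
```

A cluster that stays at its maximum with a high running fraction might be a candidate for a larger MaximumNodeCount, although that decision depends on quota and cost constraints that this sketch doesn't model.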

AmlComputeInstanceEvent table

Property Description
Type Name of the log event, AmlComputeInstanceEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
CorrelationId A GUID used to group together a set of related events, when applicable.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlComputeInstanceName The name of the compute instance associated with the log entry.

AmlDataLabelEvent table

Property Description
Type Name of the log event, AmlDataLabelEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
CorrelationId A GUID used to group together a set of related events, when applicable.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlProjectId The unique identifier of the Azure Machine Learning project.
AmlProjectName The name of the Azure Machine Learning project.
AmlLabelNames The label class names which are created for the project.
AmlDataStoreName The name of the data store where the project's data is stored.

AmlDataSetEvent table

Property Description
Type Name of the log event, AmlDataSetEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId A GUID and unique ID of the Azure Machine Learning workspace.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlDatasetId The ID of the Azure Machine Learning Data Set.
AmlDatasetName The name of the Azure Machine Learning Data Set.

AmlDataStoreEvent table

Property Description
Type Name of the log event, AmlDataStoreEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId A GUID and unique ID of the Azure Machine Learning workspace.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlDatastoreName The name of the Azure Machine Learning Data Store.

AmlDeploymentEvent table

Property Description
Type Name of the log event, AmlDeploymentEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlServiceName The name of the Azure Machine Learning Service.

AmlInferencingEvent table

Property Description
Type Name of the log event, AmlInferencingEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlServiceName The name of the Azure Machine Learning Service.

AmlModelsEvent table

Property Description
Type Name of the log event, AmlModelsEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
ResultSignature The HTTP status code of the event. Typical values include 200, 201, 202 etc.
AmlModelName The name of the Azure Machine Learning Model.

AmlPipelineEvent table

Property Description
Type Name of the log event, AmlPipelineEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId A GUID and unique ID of the Azure Machine Learning workspace.
AmlWorkspaceName The name of the Azure Machine Learning workspace.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlModuleId A GUID and unique ID of the module.
AmlModelName The name of the Azure Machine Learning Model.
AmlPipelineId The ID of the Azure Machine Learning pipeline.
AmlParentPipelineId The ID of the parent Azure Machine Learning pipeline (in the case of cloning).
AmlPipelineDraftId The ID of the Azure Machine Learning pipeline draft.
AmlPipelineDraftName The name of the Azure Machine Learning pipeline draft.
AmlPipelineEndpointId The ID of the Azure Machine Learning pipeline endpoint.
AmlPipelineEndpointName The name of the Azure Machine Learning pipeline endpoint.

AmlRunEvent table

Property Description
Type Name of the log event, AmlRunEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
AmlWorkspaceId A GUID and unique ID of the Azure Machine Learning workspace.
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
RunId The unique ID of the run.

AmlEnvironmentEvent table

Property Description
Type Name of the log event, AmlEnvironmentEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlEnvironmentName The name of the Azure Machine Learning environment configuration.
AmlEnvironmentVersion The version of the Azure Machine Learning environment configuration.

AMLOnlineEndpointTrafficLog table (preview)

Property Description
Method The method requested by the client.
Path The path requested by the client.
SubscriptionId The machine learning subscription ID of the online endpoint.
AzureMLWorkspaceId The machine learning workspace ID of the online endpoint.
AzureMLWorkspaceName The machine learning workspace name of the online endpoint.
EndpointName The name of the online endpoint.
DeploymentName The name of the online deployment.
Protocol The protocol of the request.
ResponseCode The final response code returned to the client.
ResponseCodeReason The final response code reason returned to the client.
ModelStatusCode The response status code from the model.
ModelStatusReason The response status reason from the model.
RequestPayloadSize The total bytes received from the client.
ResponsePayloadSize The total bytes sent back to the client.
UserAgent The user-agent header of the request, including comments but truncated to a max of 70 characters.
XRequestId The request ID generated by Azure Machine Learning for internal tracing.
XMSClientRequestId The tracking ID generated by the client.
TotalDurationMs Duration in milliseconds from the request start time to the last response byte sent back to the client. If the client disconnected, it measures from the start time to client disconnect time.
RequestDurationMs Duration in milliseconds from the request start time to the last byte of the request received from the client.
ResponseDurationMs Duration in milliseconds from the request start time to the first response byte read from the model.
RequestThrottlingDelayMs Delay in milliseconds in request data transfer due to network throttling.
ResponseThrottlingDelayMs Delay in milliseconds in response data transfer due to network throttling.

For more information on this log, see Monitor online endpoints.
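Because TotalDurationMs, RequestDurationMs, and ResponseDurationMs are all described as measured from the request start time, their differences give a rough breakdown of where a request spent its time. The following Python sketch illustrates that arithmetic and a simple failure rate. The rows are hypothetical sample data, the helper names are invented for this example, and treating response codes of 400 and higher as failures is an assumption, not a rule from the service.

```python
# Hypothetical rows using a few AMLOnlineEndpointTrafficLog columns.
rows = [
    {"ResponseCode": 200, "TotalDurationMs": 120, "RequestDurationMs": 10, "ResponseDurationMs": 100},
    {"ResponseCode": 200, "TotalDurationMs": 300, "RequestDurationMs": 20, "ResponseDurationMs": 250},
    {"ResponseCode": 503, "TotalDurationMs": 50, "RequestDurationMs": 5, "ResponseDurationMs": 40},
]


def duration_breakdown(row: dict) -> dict:
    """Split a request's total duration into three phases, in milliseconds."""
    return {
        # Start of the request until its last byte is received.
        "upload_ms": row["RequestDurationMs"],
        # Last request byte received until the first response byte is read from the model.
        "model_wait_ms": row["ResponseDurationMs"] - row["RequestDurationMs"],
        # First response byte from the model until the last byte is sent to the client.
        "download_ms": row["TotalDurationMs"] - row["ResponseDurationMs"],
    }


def failure_rate(traffic_rows) -> float:
    """Fraction of requests with a response code of 400 or higher (assumed failure rule)."""
    if not traffic_rows:
        return 0.0
    failed = sum(1 for r in traffic_rows if int(r["ResponseCode"]) >= 400)
    return failed / len(traffic_rows)


print(duration_breakdown(rows[0]))
print(failure_rate(rows))
```

A large model_wait_ms relative to the other two phases suggests the time is going to the model itself rather than to data transfer. The throttling delay columns can help explain long upload or download phases.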

AMLOnlineEndpointConsoleLog table

Property Description
TimeGenerated The timestamp (UTC) of when the log was generated.
OperationName The operation associated with the log record.
InstanceId The ID of the instance that generated this log record.
DeploymentName The name of the deployment associated with the log record.
ContainerName The name of the container where the log was generated.
Message The content of the log.

For more information on this log, see Monitor online endpoints.
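Console log records are often easiest to triage by grouping matching messages by deployment and container. The following Python sketch shows one way to do that on rows that have already been retrieved. The rows, the container names, and the helper name are hypothetical and are used only for illustration.

```python
from collections import Counter

# Hypothetical rows using a few AMLOnlineEndpointConsoleLog columns.
rows = [
    {"DeploymentName": "blue", "ContainerName": "inference-server", "Message": "Model loaded"},
    {"DeploymentName": "blue", "ContainerName": "inference-server", "Message": "ERROR: failed to score request"},
    {"DeploymentName": "green", "ContainerName": "storage-initializer", "Message": "Download complete"},
    {"DeploymentName": "blue", "ContainerName": "inference-server", "Message": "error: timeout"},
]


def count_matching_messages(log_rows, keyword: str) -> Counter:
    """Count messages containing keyword, keyed by (DeploymentName, ContainerName)."""
    needle = keyword.lower()
    counts = Counter()
    for row in log_rows:
        # Case-insensitive substring match on the message text.
        if needle in row["Message"].lower():
            counts[(row["DeploymentName"], row["ContainerName"])] += 1
    return counts


print(count_matching_messages(rows, "error"))
```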

AMLOnlineEndpointEventLog table (preview)

Property Description
TimeGenerated The timestamp (UTC) of when the log was generated.
OperationName The operation associated with the log record.
InstanceId The ID of the instance that generated this log record.
DeploymentName The name of the deployment associated with the log record.
Name The name of the event.
Message The content of the event.

For more information on this log, see Monitor online endpoints.