Supported metrics for Microsoft.MachineLearningServices/workspaces
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.
Table headings
Metric - The metric display name as it appears in the Azure portal.
Name in REST API - The metric name as referred to in the REST API.
Unit - Unit of measure.
Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
Dimensions - Dimensions available for the metric.
Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
DS Export - Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings.
For information on exporting metrics, see Create diagnostic settings in Azure Monitor.
For information on metric retention, see Azure Monitor Metrics overview.
For a list of supported logs, see Supported log categories - Microsoft.MachineLearningServices/workspaces.
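The time grain values described above (PT1M, PT30M, PT1H) are ISO 8601 durations. As a minimal sketch, they can be converted to Python `timedelta` objects for scheduling or bucketing. The helper name `time_grain_to_timedelta` is hypothetical, and the sketch assumes only day, hour, minute, and second components appear.

```python
import re
from datetime import timedelta

# Matches ISO 8601 durations limited to days, hours, minutes, and seconds,
# for example PT1M, PT30M, PT1H, or P1D. Months and years are not handled.
_GRAIN_PATTERN = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def time_grain_to_timedelta(grain: str) -> timedelta:
    """Convert a time grain string such as 'PT1M' into a timedelta."""
    match = _GRAIN_PATTERN.match(grain)
    # Reject strings that do not match, and bare 'P' or 'PT' with no components.
    if match is None or not any(match.groups()):
        raise ValueError(f"Unsupported time grain: {grain!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for grain in ("PT1M", "PT30M", "PT1H"):
        print(grain, "->", time_grain_to_timedelta(grain))
```

Every metric in the table below lists PT1M as its time grain, so in practice the conversion yields a one-minute interval.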
Category | Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
---|---|---|---|---|---|---|---|
Quota | Active Cores: Number of active cores | Active Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Active Nodes: Number of active nodes. These are the nodes that are actively running a job. | Active Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Run | Cancel Requested Runs: Number of runs where cancel was requested for this workspace. Count is updated when a cancellation request has been received for a run. | Cancel Requested Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Cancelled Runs: Number of runs cancelled for this workspace. Count is updated when a run is successfully cancelled. | Cancelled Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Completed Runs: Number of runs completed successfully for this workspace. Count is updated when a run has completed and output has been collected. | Completed Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Resource | CpuCapacityMillicores: Maximum capacity of a CPU node in millicores. Capacity is aggregated in one minute intervals. | CpuCapacityMillicores | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuMemoryCapacityMegabytes: Maximum memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals. | CpuMemoryCapacityMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuMemoryUtilizationMegabytes: Memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals. | CpuMemoryUtilizationMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuMemoryUtilizationPercentage: Memory utilization percentage of a CPU node. Utilization is aggregated in one minute intervals. | CpuMemoryUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuUtilization: Percentage of utilization on a CPU node. Utilization is reported at one minute intervals. | CpuUtilization | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, NodeId, ClusterName | PT1M | Yes |
Resource | CpuUtilizationMillicores: Utilization of a CPU node in millicores. Utilization is aggregated in one minute intervals. | CpuUtilizationMillicores | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | CpuUtilizationPercentage: Utilization percentage of a CPU node. Utilization is aggregated in one minute intervals. | CpuUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskAvailMegabytes: Available disk space in megabytes. Metrics are aggregated in one minute intervals. | DiskAvailMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskReadMegabytes: Data read from disk in megabytes. Metrics are aggregated in one minute intervals. | DiskReadMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskUsedMegabytes: Used disk space in megabytes. Metrics are aggregated in one minute intervals. | DiskUsedMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | DiskWriteMegabytes: Data written to disk in megabytes. Metrics are aggregated in one minute intervals. | DiskWriteMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Run | Errors: Number of run errors in this workspace. Count is updated whenever a run encounters an error. | Errors | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Run | Failed Runs: Number of runs failed for this workspace. Count is updated when a run fails. | Failed Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Finalizing Runs: Number of runs that entered the Finalizing state for this workspace. Count is updated when a run has completed but output collection is still in progress. | Finalizing Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Resource | GpuCapacityMilliGPUs: Maximum capacity of a GPU device in milli-GPUs. Capacity is aggregated in one minute intervals. | GpuCapacityMilliGPUs | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuEnergyJoules: Interval energy in Joules on a GPU node. Energy is reported at one minute intervals. | GpuEnergyJoules | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, rootRunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuMemoryCapacityMegabytes: Maximum memory capacity of a GPU device in megabytes. Capacity is aggregated in one minute intervals. | GpuMemoryCapacityMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuMemoryUtilization: Percentage of memory utilization on a GPU node. Utilization is reported at one minute intervals. | GpuMemoryUtilization | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, NodeId, DeviceId, ClusterName | PT1M | Yes |
Resource | GpuMemoryUtilizationMegabytes: Memory utilization of a GPU device in megabytes. Utilization is aggregated in one minute intervals. | GpuMemoryUtilizationMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuMemoryUtilizationPercentage: Memory utilization percentage of a GPU device. Utilization is aggregated in one minute intervals. | GpuMemoryUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuUtilization: Percentage of utilization on a GPU node. Utilization is reported at one minute intervals. | GpuUtilization | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, runId, NodeId, DeviceId, ClusterName | PT1M | Yes |
Resource | GpuUtilizationMilliGPUs: Utilization of a GPU device in milli-GPUs. Utilization is aggregated in one minute intervals. | GpuUtilizationMilliGPUs | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | GpuUtilizationPercentage: Utilization percentage of a GPU device. Utilization is aggregated in one minute intervals. | GpuUtilizationPercentage | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, DeviceId, ComputeName | PT1M | Yes |
Resource | IBReceiveMegabytes: Network data received over InfiniBand in megabytes. Metrics are aggregated in one minute intervals. | IBReceiveMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Resource | IBTransmitMegabytes: Network data sent over InfiniBand in megabytes. Metrics are aggregated in one minute intervals. | IBTransmitMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Quota | Idle Cores: Number of idle cores | Idle Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Idle Nodes: Number of idle nodes. Idle nodes are the nodes that are not running any jobs but can accept a new job if available. | Idle Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Leaving Cores: Number of leaving cores | Leaving Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Leaving Nodes: Number of leaving nodes. Leaving nodes are the nodes that just finished processing a job and will go to the Idle state. | Leaving Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Model | Model Deploy Failed: Number of model deployments that failed in this workspace | Model Deploy Failed | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, StatusCode | PT1M | Yes |
Model | Model Deploy Started: Number of model deployments started in this workspace | Model Deploy Started | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Model | Model Deploy Succeeded: Number of model deployments that succeeded in this workspace | Model Deploy Succeeded | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Model | Model Register Failed: Number of model registrations that failed in this workspace | Model Register Failed | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, StatusCode | PT1M | Yes |
Model | Model Register Succeeded: Number of model registrations that succeeded in this workspace | Model Register Succeeded | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
Resource | NetworkInputMegabytes: Network data received in megabytes. Metrics are aggregated in one minute intervals. | NetworkInputMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Resource | NetworkOutputMegabytes: Network data sent in megabytes. Metrics are aggregated in one minute intervals. | NetworkOutputMegabytes | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName, DeviceId | PT1M | Yes |
Run | Not Responding Runs: Number of runs not responding for this workspace. Count is updated when a run enters the Not Responding state. | Not Responding Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Not Started Runs: Number of runs in the Not Started state for this workspace. Count is updated when a request is received to create a run but run information has not yet been populated. | Not Started Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Quota | Preempted Cores: Number of preempted cores | Preempted Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Preempted Nodes: Number of preempted nodes. These are the low-priority nodes that are taken away from the available node pool. | Preempted Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Run | Preparing Runs: Number of runs that are preparing for this workspace. Count is updated when a run enters the Preparing state while the run environment is being prepared. | Preparing Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Provisioning Runs: Number of runs that are provisioning for this workspace. Count is updated when a run is waiting on compute target creation or provisioning. | Provisioning Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Queued Runs: Number of runs that are queued for this workspace. Count is updated when a run is queued in the compute target. This can occur when waiting for required compute nodes to be ready. | Queued Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Quota | Quota Utilization Percentage: Percent of quota utilized | Quota Utilization Percentage | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName, VmFamilyName, VmPriority | PT1M | Yes |
Run | Started Runs: Number of runs running for this workspace. Count is updated when a run starts running on the required resources. | Started Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Run | Starting Runs: Number of runs started for this workspace. Count is updated after a request to create a run is received and run info, such as the Run Id, has been populated. | Starting Runs | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario, RunType, PublishedPipelineId, ComputeType, PipelineStepType, ExperimentName | PT1M | Yes |
Resource | StorageAPIFailureCount: Azure Blob Storage API calls failure count. | StorageAPIFailureCount | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Resource | StorageAPISuccessCount: Azure Blob Storage API calls success count. | StorageAPISuccessCount | Count | Average, Maximum, Minimum, Total (Sum) | RunId, InstanceId, ComputeName | PT1M | Yes |
Quota | Total Cores: Number of total cores | Total Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Total Nodes: Number of total nodes. This total includes some of Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, Leaving Nodes | Total Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Unusable Cores: Number of unusable cores | Unusable Cores | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Quota | Unusable Nodes: Number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes. | Unusable Nodes | Count | Average, Maximum, Minimum, Total (Sum) | Scenario, ClusterName | PT1M | Yes |
Run | Warnings: Number of run warnings in this workspace. Count is updated whenever a run encounters a warning. | Warnings | Count | Total (Sum), Average, Minimum, Maximum, Count | Scenario | PT1M | Yes |
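
The values in the Name in REST API, Aggregation, Dimensions, and Time Grains columns map onto query parameters when a metric is requested programmatically. The following is a minimal sketch that only builds a request URL and sends nothing. It assumes the Azure Monitor metrics endpoint under `Microsoft.Insights/metrics` with `api-version=2018-01-01`; confirm the current version against the REST reference before relying on it. The helper name `build_metrics_url` and the resource ID in the usage example are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: management endpoint and API version; verify before use.
MANAGEMENT_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-01-01"


def build_metrics_url(
    resource_id: str,
    metric_name: str,
    aggregation: str = "Average",
    interval: str = "PT1M",
    timespan: Optional[str] = None,
    dimension_filter: Optional[str] = None,
) -> str:
    """Build a metrics query URL for one metric of a workspace resource."""
    params = {
        "api-version": API_VERSION,
        "metricnames": metric_name,   # value from the "Name in REST API" column
        "aggregation": aggregation,   # one of the values in the Aggregation column
        "interval": interval,         # a value from the Time Grains column
    }
    if timespan:
        params["timespan"] = timespan          # ISO 8601 start/end pair
    if dimension_filter:
        params["$filter"] = dimension_filter   # for example: ClusterName eq '*'
    # quote_via=quote encodes spaces as %20, which matters for names
    # such as "Active Cores" that contain spaces.
    query = urlencode(params, quote_via=quote)
    return f"{MANAGEMENT_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"


if __name__ == "__main__":
    # Hypothetical resource ID, for illustration only.
    workspace_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg/providers"
        "/Microsoft.MachineLearningServices/workspaces/example-ws"
    )
    print(build_metrics_url(workspace_id, "Active Cores",
                            dimension_filter="ClusterName eq '*'"))
```

An actual request would also need an authorization header with a valid bearer token, which this sketch leaves out.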