Monitor performance of models deployed to production

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Learn to use Azure Machine Learning's model monitoring to continuously track the performance of machine learning models in production. Model monitoring provides you with a broad view of monitoring signals and alerts you to potential issues. When you monitor signals and performance metrics of models in production, you can critically evaluate the inherent risks associated with them and identify blind spots that could adversely affect your business.

In this article you, learn to perform the following tasks:

  • Set up out-of box and advanced monitoring for models that are deployed to Azure Machine Learning online endpoints
  • Monitor performance metrics for models in production
  • Monitor models that are deployed outside Azure Machine Learning or deployed to Azure Machine Learning batch endpoints
  • Set up model monitoring with custom signals and metrics
  • Interpret monitoring results
  • Integrate Azure Machine Learning model monitoring with Azure Event Grid

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.

  • For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to:

  • For monitoring a model that is deployed to an Azure Machine Learning batch endpoint or deployed outside of Azure Machine Learning, be sure to:

    • Have a means to collect production data and register it as an Azure Machine Learning data asset.
    • Update the registered data asset continuously for model monitoring.
    • (Recommended) Register the model in an Azure Machine Learning workspace, for lineage tracking.

Important

Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, and Standard_E64s_v3. You can select the VM instance type with the create_monitor.compute.instance_type property in your YAML configuration or from the dropdown in the Azure Machine Learning studio.

Set up out-of-box model monitoring

Suppose you deploy your model to production in an Azure Machine Learning online endpoint and enable data collection at deployment time. In this scenario, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data.

You can use the Azure CLI, the Python SDK, or the studio for an out-of-box setup of model monitoring. The out-of-box model monitoring configuration provides the following monitoring capabilities:

  • Azure Machine Learning automatically detects the production inference dataset associated with an Azure Machine Learning online deployment and uses the dataset for model monitoring.
  • The comparison reference dataset is set as the recent, past production inference dataset.
  • Monitoring setup automatically includes and tracks the built-in monitoring signals: data drift, prediction drift, and data quality. For each monitoring signal, Azure Machine Learning uses:
    • the recent, past production inference dataset as the comparison reference dataset.
    • smart defaults for metrics and thresholds.
  • A monitoring job is scheduled to run daily at 3:15am (for this example) to acquire monitoring signals and evaluate each metric result against its corresponding threshold. By default, when any threshold is exceeded, Azure Machine Learning sends an alert email to the user that set up the monitor.

Azure Machine Learning model monitoring uses az ml schedule to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition:

az ml schedule create -f ./out-of-box-monitoring.yaml

The following YAML contains the definition for the out-of-box model monitoring.

# out-of-box-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: credit_default_model_monitoring
display_name: Credit default model monitoring
description: Credit default model monitoring setup with minimal configurations

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:

  compute: # specify a spark compute for monitoring job
    instance_type: standard_e4s_v3
    runtime_version: "3.3"

  monitoring_target: 
    ml_task: classification # model task type: [classification, regression, question_answering]
    endpoint_deployment_id: azureml:credit-default:main # azureml endpoint deployment id

  alert_notification: # emails to get alerts
    emails:
      - abc@example.com
      - def@example.com

Set up advanced model monitoring

Azure Machine Learning provides many capabilities for continuous model monitoring. See Capabilities of model monitoring for a comprehensive list of these capabilities. In many cases, you need to set up model monitoring with advanced monitoring capabilities. In the following sections, you set up model monitoring with these capabilities:

  • Use of multiple monitoring signals for a broad view.
  • Use of historical model training data or validation data as the comparison reference dataset.
  • Monitoring of top N most important features and individual features.

Configure feature importance

Feature importance represents the relative importance of each input feature to a model's output. For example, temperature might be more important to a model's prediction compared to elevation. Enabling feature importance can give you visibility into which features you don't want drifting or having data quality issues in production.

To enable feature importance with any of your signals (such as data drift or data quality), you need to provide:

  • Your training dataset as the reference_data dataset.
  • The reference_data.data_column_names.target_column property, which is the name of your model's output/prediction column.

After enabling feature importance, you'll see a feature importance for each feature you're monitoring in the Azure Machine Learning model monitoring studio UI.

You can use Azure CLI, the Python SDK, or the studio for advanced setup of model monitoring.

Create advanced model monitoring setup with the following CLI command and YAML definition:

az ml schedule create -f ./advanced-model-monitoring.yaml

The following YAML contains the definition for advanced model monitoring.

# advanced-model-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
display_name: Fraud detection model monitoring
description: Fraud detection model monitoring with advanced configurations

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:

  compute: 
    instance_type: standard_e4s_v3
    runtime_version: "3.3"

  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:credit-default:main
  
  monitoring_signals:
    advanced_data_drift: # monitoring signal name, any user defined name works
      type: data_drift
      # reference_dataset is optional. By default referece_dataset is the production inference data associated with Azure Machine Learning online endpoint
      reference_data:
        input_data:
          path: azureml:credit-reference:1 # use training data as comparison reference dataset
          type: mltable
        data_context: training
        data_column_names:
          target_column: DEFAULT_NEXT_MONTH
      features: 
        top_n_feature_importance: 10 # monitor drift for top 10 features
      metric_thresholds:
        numerical:
          jensen_shannon_distance: 0.01
        categorical:
          pearsons_chi_squared_test: 0.02
    advanced_data_quality:
      type: data_quality
      # reference_dataset is optional. By default reference_dataset is the production inference data associated with Azure Machine Learning online endpoint
      reference_data:
        input_data:
          path: azureml:credit-reference:1
          type: mltable
        data_context: training
      features: # monitor data quality for 3 individual features only
        - SEX
        - EDUCATION
      metric_thresholds:
        numerical:
          null_value_rate: 0.05
        categorical:
          out_of_bounds_rate: 0.03

    feature_attribution_drift_signal:
      type: feature_attribution_drift
      # production_data: is not required input here
      # Please ensure Azure Machine Learning online endpoint is enabled to collected both model_inputs and model_outputs data
      # Azure Machine Learning model monitoring will automatically join both model_inputs and model_outputs data and used it for computation
      reference_data:
        input_data:
          path: azureml:credit-reference:1
          type: mltable
        data_context: training
        data_column_names:
          target_column: DEFAULT_NEXT_MONTH
      metric_thresholds:
        normalized_discounted_cumulative_gain: 0.9
  
  alert_notification:
    emails:
      - abc@example.com
      - def@example.com

Set up model performance monitoring

Azure Machine Learning model monitoring enables you to track the performance of your models in production by calculating their performance metrics. The following model performance metrics are currently supported:

For classification models:

  • Precision
  • Accuracy
  • Recall

For regression models:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)

More prerequisites for model performance monitoring

You must satisfy the following requirements for you to configure your model performance signal:

  • Have output data for the production model (the model's predictions) with a unique ID for each row. If you collect production data with the Azure Machine Learning data collector, a correlation_id is provided for each inference request for you. With the data collector, you also have the option to log your own unique ID from your application.

    Note

    For Azure Machine Learning model performance monitoring, we recommend that you log your unique ID in its own column, using the Azure Machine Learning data collector.

  • Have ground truth data (actuals) with a unique ID for each row. The unique ID for a given row should match the unique ID for the model outputs for that particular inference request. This unique ID is used to join your ground truth dataset with the model outputs.

    Without having ground truth data, you can't perform model performance monitoring. Since ground truth data is encountered at the application level, it's your responsibility to collect it as it becomes available. You should also maintain a data asset in Azure Machine Learning that contains this ground truth data.

  • (Optional) Have a pre-joined tabular dataset with model outputs and ground truth data already joined together.

Monitor model performance requirements when using data collector

If you use the Azure Machine Learning data collector to collect production inference data without supplying your own unique ID for each row as a separate column, a correlationid will be autogenerated for you and included in the logged JSON object. However, the data collector will batch rows that are sent within short time intervals of each other. Batched rows will fall within the same JSON object and will thus have the same correlationid.

In order to differentiate between the rows in the same JSON object, Azure Machine Learning model performance monitoring uses indexing to determine the order of the rows in the JSON object. For example, if three rows are batched together, and the correlationid is test, row one will have an ID of test_0, row two will have an ID of test_1, and row three will have an ID of test_2. To ensure that your ground truth dataset contains unique IDs that match to the collected production inference model outputs, ensure that you index each correlationid appropriately. If your logged JSON object only has one row, then the correlationid would be correlationid_0.

To avoid using this indexing, we recommend that you log your unique ID in its own column within the pandas DataFrame that you're logging with the Azure Machine Learning data collector. Then, in your model monitoring configuration, you specify the name of this column to join your model output data with your ground truth data. As long as the IDs for each row in both datasets are the same, Azure Machine Learning model monitoring can perform model performance monitoring.

Example workflow for monitoring model performance

To understand the concepts associated with model performance monitoring, consider this example workflow. Suppose you're deploying a model to predict whether credit card transactions are fraudulent or not, you can follow these steps to monitor the model's performance:

  1. Configure your deployment to use the data collector to collect the model's production inference data (input and output data). Let's say that the output data is stored in a column is_fraud.
  2. For each row of the collected inference data, log a unique ID. The unique ID can come from your application, or you can use the correlationid that Azure Machine Learning uniquely generates for each logged JSON object.
  3. Later, when the ground truth (or actual) is_fraud data becomes available, it also gets logged and mapped to the same unique ID that was logged with the model's outputs.
  4. This ground truth is_fraud data is also collected, maintained, and registered to Azure Machine Learning as a data asset.
  5. Create a model performance monitoring signal that joins the model's production inference and ground truth data assets, using the unique ID columns.
  6. Finally, compute the model performance metrics.

Once you've satisfied the prerequisites for model performance monitoring, you can set up model monitoring with the following CLI command and YAML definition:

az ml schedule create -f ./model-performance-monitoring.yaml

The following YAML contains the definition for model monitoring with production inference data that you've collected.

$schema:  http://azureml/sdk-2-0/Schedule.json
name: model_performance_monitoring
display_name: Credit card fraud model performance
description: Credit card fraud model performance

trigger:
  type: recurrence
  frequency: day
  interval: 7 
  schedule: 
    hours: 10
    minutes: 15
  
create_monitor:
  compute: 
    instance_type: standard_e8s_v3
    runtime_version: "3.3"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:loan-approval-endpoint:loan-approval-deployment

  monitoring_signals:
    fraud_detection_model_performance: 
      type: model_performance 
      production_data:
        data_column_names:
          prediction: is_fraud
          correlation_id: correlation_id
      reference_data:
        input_data:
          path: azureml:my_model_ground_truth_data:1
          type: mltable
        data_column_names:
          actual: is_fraud
          correlation_id: correlation_id
        data_context: actuals
      alert_enabled: true
      metric_thresholds: 
        tabular_classification:
          accuracy: 0.95
          precision: 0.8
  alert_notification: 
      emails: 
        - abc@example.com

Set up model monitoring by bringing in your production data to Azure Machine Learning

You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to:

  • Collect production inference data from models deployed in production.
  • Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data.
  • Provide a custom data preprocessing component and register it as an Azure Machine Learning component.

You must provide a custom data preprocessing component if your data isn't collected with the data collector. Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing.

Your custom preprocessing component must have these input and output signatures:

Input/Output Signature name Type Description Example value
input data_window_start literal, string data window start-time in ISO8601 format. 2023-05-01T04:31:57.012Z
input data_window_end literal, string data window end-time in ISO8601 format. 2023-05-01T04:31:57.012Z
input input_data uri_folder The collected production inference data, which is registered as an Azure Machine Learning data asset. azureml:myproduction_inference_data:1
output preprocessed_data mltable A tabular dataset, which matches a subset of the reference data schema.

For an example of a custom data preprocessing component, see custom_preprocessing in the azuremml-examples GitHub repo.

Once you've satisfied the previous requirements, you can set up model monitoring with the following CLI command and YAML definition:

az ml schedule create -f ./model-monitoring-with-collected-data.yaml

The following YAML contains the definition for model monitoring with production inference data that you've collected.

# model-monitoring-with-collected-data.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
display_name: Fraud detection model monitoring
description: Fraud detection model monitoring with your own production data

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:
  compute: 
    instance_type: standard_e4s_v3
    runtime_version: "3.3"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:fraud-detection-endpoint:fraud-detection-deployment
  
  monitoring_signals:

    advanced_data_drift: # monitoring signal name, any user defined name works
      type: data_drift
      # define production dataset with your collected data
      production_data:
        input_data:
          path: azureml:my_production_inference_data_model_inputs:1  # your collected data is registered as Azure Machine Learning asset
          type: uri_folder
        data_context: model_inputs
        pre_processing_component: azureml:production_data_preprocessing:1
      reference_data:
        input_data:
          path: azureml:my_model_training_data:1 # use training data as comparison baseline
          type: mltable
        data_context: training
        data_column_names:
          target_column: is_fraud
      features: 
        top_n_feature_importance: 20 # monitor drift for top 20 features
      metric_thresholds:
        numberical:
          jensen_shannon_distance: 0.01
        categorical:
          pearsons_chi_squared_test: 0.02

    advanced_prediction_drift: # monitoring signal name, any user defined name works
      type: prediction_drift
      # define production dataset with your collected data
      production_data:
        input_data:
          path: azureml:my_production_inference_data_model_outputs:1  # your collected data is registered as Azure Machine Learning asset
          type: uri_folder
        data_context: model_outputs
        pre_processing_component: azureml:production_data_preprocessing:1
      reference_data:
        input_data:
          path: azureml:my_model_validation_data:1 # use training data as comparison reference dataset
          type: mltable
        data_context: validation
      metric_thresholds:
        categorical:
          pearsons_chi_squared_test: 0.02
  
  alert_notification:
    emails:
      - abc@example.com
      - def@example.com

Set up model monitoring with custom signals and metrics

With Azure Machine Learning model monitoring, you can define a custom signal and implement any metric of your choice to monitor your model. You can register this custom signal as an Azure Machine Learning component. When your Azure Machine Learning model monitoring job runs on the specified schedule, it computes the metric(s) you've defined within your custom signal, just as it does for the prebuilt signals (data drift, prediction drift, and data quality).

To set up a custom signal to use for model monitoring, you must first define the custom signal and register it as an Azure Machine Learning component. The Azure Machine Learning component must have these input and output signatures:

Component input signature

The component input DataFrame should contain the following items:

  • An mltable with the processed data from the preprocessing component
  • Any number of literals, each representing an implemented metric as part of the custom signal component. For example, if you've implemented the metric, std_deviation, then you'll need an input for std_deviation_threshold. Generally, there should be one input per metric with the name <metric_name>_threshold.
Signature name Type Description Example value
production_data mltable A tabular dataset that matches a subset of the reference data schema.
std_deviation_threshold literal, string Respective threshold for the implemented metric. 2

Component output signature

The component output port should have the following signature.

Signature name Type Description
signal_metrics mltable The mltable that contains the computed metrics. The schema is defined in the next section signal_metrics schema.

signal_metrics schema

The component output DataFrame should contain four columns: group, metric_name, metric_value, and threshold_value.

Signature name Type Description Example value
group literal, string Top-level logical grouping to be applied to this custom metric. TRANSACTIONAMOUNT
metric_name literal, string The name of the custom metric. std_deviation
metric_value numerical The value of the custom metric. 44,896.082
threshold_value numerical The threshold for the custom metric. 2

The following table shows an example output from a custom signal component that computes the std_deviation metric:

group metric_value metric_name threshold_value
TRANSACTIONAMOUNT 44,896.082 std_deviation 2
LOCALHOUR 3.983 std_deviation 2
TRANSACTIONAMOUNTUSD 54,004.902 std_deviation 2
DIGITALITEMCOUNT 7.238 std_deviation 2
PHYSICALITEMCOUNT 5.509 std_deviation 2

To see an example custom signal component definition and metric computation code, see custom_signal in the azureml-examples repo.

Once you've satisfied the requirements for using custom signals and metrics, you can set up model monitoring with the following CLI command and YAML definition:

az ml schedule create -f ./custom-monitoring.yaml

The following YAML contains the definition for model monitoring with a custom signal. Some things to notice about the code:

  • It assumes that you've already created and registered your component with the custom signal definition in Azure Machine Learning.
  • The component_id of the registered custom signal component is azureml:my_custom_signal:1.0.0.
  • If you've collected your data with the data collector, you can omit the pre_processing_component property. If you wish to use a preprocessing component to preprocess production data not collected by the data collector, you can specify it.
# custom-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: my-custom-signal
trigger:
  type: recurrence
  frequency: day # can be minute, hour, day, week, month
  interval: 7 # #every day
create_monitor:
  compute:
    instance_type: "standard_e4s_v3"
    runtime_version: "3.3"
  monitoring_signals:
    customSignal:
      type: custom
      component_id: azureml:my_custom_signal:1.0.0
      input_data:
        production_data:
          input_data:
            type: uri_folder
            path: azureml:my_production_data:1
          data_context: test
          data_window:
            lookback_window_size: P30D
            lookback_window_offset: P7D
          pre_processing_component: azureml:custom_preprocessor:1.0.0
      metric_thresholds:
        - metric_name: std_deviation
          threshold: 2
  alert_notification:
    emails:
      - abc@example.com

Interpret monitoring results

After you've configured your model monitor and the first run has completed, you can navigate back to the Monitoring tab in Azure Machine Learning studio to view the results.

  • From the main Monitoring view, select the name of your model monitor to see the Monitor overview page. This page shows the corresponding model, endpoint, and deployment, along with details regarding the signals you configured. The next image shows a monitoring dashboard that includes data drift and data quality signals. Depending on the monitoring signals you configured, your dashboard might look different.

    Screenshot showing a monitoring dashboard.

  • Look in the Notifications section of the dashboard to see, for each signal, which features breached the configured threshold for their respective metrics:

  • Select the data_drift to go to the data drift details page. On the details page, you can see the data drift metric value for each numerical and categorical feature that you included in your monitoring configuration. When your monitor has more than one run, you'll see a trendline for each feature.

    Screenshot showing the details page of the data drift signal.

  • To view an individual feature in detail, select the name of the feature to view the production distribution compared to the reference distribution. This view also allows you to track drift over time for that specific feature.

    Screenshot showing the data drift details for an individual feature.

  • Return to the monitoring dashboard and select data_quality to view the data quality signal page. On this page, you can see the null value rates, out-of-bounds rates, and data type error rates for each feature you're monitoring.

    Screenshot showing the details page of the data quality signal.

Model monitoring is a continuous process. With Azure Machine Learning model monitoring, you can configure multiple monitoring signals to obtain a broad view into the performance of your models in production.

Integrate Azure Machine Learning model monitoring with Azure Event Grid

You can use events generated by Azure Machine Learning model monitoring to set up event-driven applications, processes, or CI/CD workflows with Azure Event Grid. You can consume events through various event handlers, such as Azure Event Hubs, Azure functions, and logic apps. Based on the drift detected by your monitors, you can take action programmatically, such as by setting up a machine learning pipeline to re-train a model and re-deploy it.

To get started with integrating Azure Machine Learning model monitoring with Event Grid:

  1. Follow the steps in see Set up in Azure portal. Give your Event Subscription a name, such as MonitoringEvent, and select only the Run status changed box under Event Types.

    Warning

    Be sure to select Run status changed for the event type. Don't select Dataset drift detected, as it applies to data drift v1, rather than Azure Machine Learning model monitoring.

  2. Follow the steps in Filter & subscribe to events to set up event filtering for your scenario. Navigate to the Filters tab and add the following Key, Operator, and Value under Advanced Filters:

    • Key: data.RunTags.azureml_modelmonitor_threshold_breached
    • Value: has failed due to one or more features violating metric thresholds
    • Operator: String contains

    With this filter, events are generated when the run status changes (from Completed to Failed, or from Failed to Completed) for any monitor within your Azure Machine Learning workspace.

  3. To filter at the monitoring level, use the following Key, Operator, and Value under Advanced Filters:

    • Key: data.RunTags.azureml_modelmonitor_threshold_breached
    • Value: your_monitor_name_signal_name
    • Operator: String contains

    Ensure that your_monitor_name_signal_name is the name of a signal in the specific monitor you want to filter events for. For example, credit_card_fraud_monitor_data_drift. For this filter to work, this string must match the name of your monitoring signal. You should name your signal with both the monitor name and the signal name for this case.

  4. When you've completed your Event Subscription configuration, select the desired endpoint to serve as your event handler, such as Azure Event Hubs.

  5. After events have been captured, you can view them from the endpoint page:

    Screenshot showing events viewed from the endpoint page.

You can also view events in the Azure Monitor Metrics tab:

Screenshot showing events viewed from the Azure monitor metrics tab.