Monitor performance of models deployed to production

Grein
08/28/2024

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Learn to use Azure Machine Learning's model monitoring to continuously track the performance of machine learning models in production. Model monitoring provides you with a broad view of monitoring signals and alerts you to potential issues. When you monitor signals and performance metrics of models in production, you can critically evaluate the inherent risks associated with them and identify blind spots that could adversely affect your business.

In this article you, learn to perform the following tasks:

Set up out-of box and advanced monitoring for models that are deployed to Azure Machine Learning online endpoints
Monitor performance metrics for models in production
Monitor models that are deployed outside Azure Machine Learning or deployed to Azure Machine Learning batch endpoints
Set up model monitoring with custom signals and metrics
Interpret monitoring results
Integrate Azure Machine Learning model monitoring with Azure Event Grid

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2).

Important

The CLI examples in this article assume that you are using the Bash (or compatible) shell. For example, from a Linux system or Windows Subsystem for Linux.
An Azure Machine Learning workspace. If you don't have one, use the steps in the Install, set up, and use the CLI (v2) to create one.

Before following the steps in this article, make sure you have the following prerequisites:

An Azure Machine Learning workspace. If you don't have one, use the steps in the Quickstart: Create workspace resources article to create one.
To install the Python SDK v2, use the following command:
```
pip install azure-ai-ml azure-identity
```
To update an existing installation of the SDK to the latest version, use the following command:
```
pip install --upgrade azure-ai-ml azure-identity
```
For more information, see Install the Python SDK v2 for Azure Machine Learning.

Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.
For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to:
- Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. If you don't have a model deployed to an Azure Machine Learning online endpoint, see Deploy and score a machine learning model by using an online endpoint.
- Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see Collect production data from models deployed to a real-time endpoint.
For monitoring a model that is deployed to an Azure Machine Learning batch endpoint or deployed outside of Azure Machine Learning, be sure to:
- Have a means to collect production data and register it as an Azure Machine Learning data asset.
- Update the registered data asset continuously for model monitoring.
- (Recommended) Register the model in an Azure Machine Learning workspace, for lineage tracking.

Important

Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, and Standard_E64s_v3. You can select the VM instance type with the create_monitor.compute.instance_type property in your YAML configuration or from the dropdown in the Azure Machine Learning studio.

Set up out-of-box model monitoring

Suppose you deploy your model to production in an Azure Machine Learning online endpoint and enable data collection at deployment time. In this scenario, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data.

You can use the Azure CLI, the Python SDK, or the studio for an out-of-box setup of model monitoring. The out-of-box model monitoring configuration provides the following monitoring capabilities:

Azure Machine Learning automatically detects the production inference dataset associated with an Azure Machine Learning online deployment and uses the dataset for model monitoring.
The comparison reference dataset is set as the recent, past production inference dataset.
Monitoring setup automatically includes and tracks the built-in monitoring signals: data drift, prediction drift, and data quality. For each monitoring signal, Azure Machine Learning uses:
- the recent, past production inference dataset as the comparison reference dataset.
- smart defaults for metrics and thresholds.
A monitoring job is scheduled to run daily at 3:15am (for this example) to acquire monitoring signals and evaluate each metric result against its corresponding threshold. By default, when any threshold is exceeded, Azure Machine Learning sends an alert email to the user that set up the monitor.

Azure Machine Learning model monitoring uses az ml schedule to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition:

az ml schedule create -f ./out-of-box-monitoring.yaml

The following YAML contains the definition for the out-of-box model monitoring.

# out-of-box-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: credit_default_model_monitoring
display_name: Credit default model monitoring
description: Credit default model monitoring setup with minimal configurations

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:

  compute: # specify a spark compute for monitoring job
    instance_type: standard_e4s_v3
    runtime_version: "3.3"

  monitoring_target: 
    ml_task: classification # model task type: [classification, regression, question_answering]
    endpoint_deployment_id: azureml:credit-default:main # azureml endpoint deployment id

  alert_notification: # emails to get alerts
    emails:
      - abc@example.com
      - def@example.com

You can use the following code to set up the out-of-box model monitoring:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    AlertNotification,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute
)

# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="subscription_id",
    resource_group_name="resource_group_name",
    workspace_name="workspace_name",
)

# create the compute
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# specify your online endpoint deployment
monitoring_target = MonitoringTarget(
    ml_task="classification",
    endpoint_deployment_id="azureml:credit-default:main"
)


# create alert notification object
alert_notification = AlertNotification(
    emails=['abc@example.com', 'def@example.com']
)

# create the monitor definition
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
    alert_notification=alert_notification
)

# specify the schedule frequency
recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=3, minutes=15)
)

# create the monitor
model_monitor = MonitorSchedule(
    name="credit_default_monitor_basic",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()

Set up advanced model monitoring

Azure Machine Learning provides many capabilities for continuous model monitoring. See Capabilities of model monitoring for a comprehensive list of these capabilities. In many cases, you need to set up model monitoring with advanced monitoring capabilities. In the following sections, you set up model monitoring with these capabilities:

Use of multiple monitoring signals for a broad view.
Use of historical model training data or validation data as the comparison reference dataset.
Monitoring of top N most important features and individual features.

Configure feature importance

Feature importance represents the relative importance of each input feature to a model's output. For example, temperature might be more important to a model's prediction compared to elevation. Enabling feature importance can give you visibility into which features you don't want drifting or having data quality issues in production.

To enable feature importance with any of your signals (such as data drift or data quality), you need to provide:

Your training dataset as the reference_data dataset.
The reference_data.data_column_names.target_column property, which is the name of your model's output/prediction column.

After enabling feature importance, you'll see a feature importance for each feature you're monitoring in the Azure Machine Learning model monitoring studio UI.

You can enable/disable alerts for each signal by setting alert_enabled property while using SDK or CLI.

You can use Azure CLI, the Python SDK, or the studio for advanced setup of model monitoring.

Create advanced model monitoring setup with the following CLI command and YAML definition:

az ml schedule create -f ./advanced-model-monitoring.yaml

The following YAML contains the definition for advanced model monitoring.

# advanced-model-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
display_name: Fraud detection model monitoring
description: Fraud detection model monitoring with advanced configurations

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:

  compute: 
    instance_type: standard_e4s_v3
    runtime_version: "3.3"

  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:credit-default:main
  
  monitoring_signals:
    advanced_data_drift: # monitoring signal name, any user defined name works
      type: data_drift
      # reference_dataset is optional. By default referece_dataset is the production inference data associated with Azure Machine Learning online endpoint
      reference_data:
        input_data:
          path: azureml:credit-reference:1 # use training data as comparison reference dataset
          type: mltable
        data_context: training
        data_column_names:
          target_column: DEFAULT_NEXT_MONTH
      features: 
        top_n_feature_importance: 10 # monitor drift for top 10 features
      alert_enabled: true
      metric_thresholds:
        numerical:
          jensen_shannon_distance: 0.01
        categorical:
          pearsons_chi_squared_test: 0.02
    advanced_data_quality:
      type: data_quality
      # reference_dataset is optional. By default reference_dataset is the production inference data associated with Azure Machine Learning online endpoint
      reference_data:
        input_data:
          path: azureml:credit-reference:1
          type: mltable
        data_context: training
      features: # monitor data quality for 3 individual features only
        - SEX
        - EDUCATION
      alert_enabled: true
      metric_thresholds:
        numerical:
          null_value_rate: 0.05
        categorical:
          out_of_bounds_rate: 0.03

    feature_attribution_drift_signal:
      type: feature_attribution_drift
      # production_data: is not required input here
      # Please ensure Azure Machine Learning online endpoint is enabled to collected both model_inputs and model_outputs data
      # Azure Machine Learning model monitoring will automatically join both model_inputs and model_outputs data and used it for computation
      reference_data:
        input_data:
          path: azureml:credit-reference:1
          type: mltable
        data_context: training
        data_column_names:
          target_column: DEFAULT_NEXT_MONTH
      alert_enabled: true
      metric_thresholds:
        normalized_discounted_cumulative_gain: 0.9
  
  alert_notification:
    emails:
      - abc@example.com
      - def@example.com

Use the following code for advanced model monitoring setup:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import (
    MonitorDatasetContext,
)
from azure.ai.ml.entities import (
    AlertNotification,
    BaselineDataRange,
    DataDriftSignal,
    DataQualitySignal,
    PredictionDriftSignal,
    DataDriftMetricThreshold,
    DataQualityMetricThreshold,
    FeatureAttributionDriftMetricThreshold,
    FeatureAttributionDriftSignal,
    PredictionDriftMetricThreshold,
    NumericalDriftMetrics,
    CategoricalDriftMetrics,
    DataQualityMetricsNumerical,
    DataQualityMetricsCategorical,
    MonitorFeatureFilter,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute,
    ReferenceData,
    ProductionData
)

# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="subscription_id",
    resource_group_name="resource_group_name",
    workspace_name="workspace_name",
)

# create your compute
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# specify the online deployment (if you have one)
monitoring_target = MonitoringTarget(
    ml_task="classification",
    endpoint_deployment_id="azureml:credit-default:main"
)

# specify a lookback window size and offset, or omit this to use the defaults, which are specified in the documentation
data_window = BaselineDataRange(lookback_window_size="P1D", lookback_window_offset="P0D")

production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path="azureml:credit-default-main-model_inputs:1"
    ),
    data_window=data_window,
    data_context=MonitorDatasetContext.MODEL_INPUTS,
)

# training data to be used as reference dataset
reference_data_training = ReferenceData(
    input_data=Input(
        type="mltable",
        path="azureml:credit-default-reference:1"
    ),
    data_column_names={
        "target_column":"DEFAULT_NEXT_MONTH"
    },
    data_context=MonitorDatasetContext.TRAINING,
)

# create an advanced data drift signal
features = MonitorFeatureFilter(top_n_feature_importance=10)

metric_thresholds = DataDriftMetricThreshold(
    numerical=NumericalDriftMetrics(
        jensen_shannon_distance=0.01
    ),
    categorical=CategoricalDriftMetrics(
        pearsons_chi_squared_test=0.02
    )
)

advanced_data_drift = DataDriftSignal(
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# create an advanced prediction drift signal
metric_thresholds = PredictionDriftMetricThreshold(
    categorical=CategoricalDriftMetrics(
        jensen_shannon_distance=0.01
    )
)

advanced_prediction_drift = PredictionDriftSignal(
    reference_data=reference_data_training,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# create an advanced data quality signal
features = ['SEX', 'EDUCATION', 'AGE']

metric_thresholds = DataQualityMetricThreshold(
    numerical=DataQualityMetricsNumerical(
        null_value_rate=0.01
    ),
    categorical=DataQualityMetricsCategorical(
        out_of_bounds_rate=0.02
    )
)

advanced_data_quality = DataQualitySignal(
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# create feature attribution drift signal
metric_thresholds = FeatureAttributionDriftMetricThreshold(normalized_discounted_cumulative_gain=0.9)

feature_attribution_drift = FeatureAttributionDriftSignal(
    reference_data=reference_data_training,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# put all monitoring signals in a dictionary
monitoring_signals = {
    'data_drift_advanced':advanced_data_drift,
    'data_quality_advanced':advanced_data_quality,
    'feature_attribution_drift':feature_attribution_drift,
}

# create alert notification object
alert_notification = AlertNotification(
    emails=['abc@example.com', 'def@example.com']
)

# create the monitor definition
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

# specify the frequency on which to run your monitor
recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=3, minutes=15)
)

# create your monitor
model_monitor = MonitorSchedule(
    name="credit_default_monitor_advanced",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()

To set up advanced monitoring:

Complete the entires on the Basic settings page as described earlier in the Set up out-of-box model monitoring section.
Select Next to open the Configure data asset page of the Advanced settings section.
Add a dataset to be used as the reference dataset. We recommend that you use the model training data as the comparison reference dataset for data drift and data quality. Also, use the model validation data as the comparison reference dataset for prediction drift.
Select Next to go to the Select monitoring signals page. On this page, you see some monitoring signals already added (if you selected an Azure Machine Learning online deployment earlier). The signals (data drift, prediction drift, and data quality) use recent, past production data as the comparison reference dataset and use smart defaults for metrics and thresholds.
Select Edit next to the data drift signal.
Configure the data drift in the Edit signal window as follows:
1. In step 1, for the production data asset, select your model inputs dataset. Also, make the following selection:
  - Select the desired lookback window size.
2. In step 2, for the reference data asset, select your training dataset. Also, make the following selection:
  - Select the target (output) column.
3. In step 3, select to monitor drift for the top N most important features, or monitor drift for a specific set of features.
4. In step 4, select your preferred metric and thresholds to use for numerical features.
5. In step 5, select your preferred metric and thresholds to use for categorical features.
Select Save to return to the Select monitoring signals page.
Select Add to open the Edit Signal window.
Select Feature attribution drift (preview) to configure the feature attribution drift signal as follows:
1. In step 1, select the production data asset that has your model inputs
  - Also, select the desired lookback window size.
2. In step 2, select the production data asset that has your model outputs.
  - Also, select the common column between these data assets to join them on. If the data was collected with the data collector, the common column is correlationid.
3. (Optional) If you used the data collector to collect data that has your model inputs and outputs already joined, select the joined dataset as your production data asset (in step 1)
  - Also, Remove step 2 in the configuration panel.
4. In step 3, select your training dataset to use as the reference dataset.
  - Also, select the target (output) column for your training dataset.
5. In step 4, select your preferred metric and threshold.
Select Save to return to the Select monitoring signals page.
When you're finished with your monitoring signals configuration, select Next to go to the Notifications page.
On the Notifications page, enable alert notifications for each signal and select Next.
Review your settings on the Review monitoring settings page.
Select Create to create your advanced model monitor.

Set up model performance monitoring

Azure Machine Learning model monitoring enables you to track the performance of your models in production by calculating their performance metrics. The following model performance metrics are currently supported:

For classification models:

Precision
Accuracy
Recall

For regression models:

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)

More prerequisites for model performance monitoring

You must satisfy the following requirements for you to configure your model performance signal:

Have output data for the production model (the model's predictions) with a unique ID for each row. If you collect production data with the Azure Machine Learning data collector, a correlation_id is provided for each inference request for you. With the data collector, you also have the option to log your own unique ID from your application.

Note

For Azure Machine Learning model performance monitoring, we recommend that you log your unique ID in its own column, using the Azure Machine Learning data collector.
Have ground truth data (actuals) with a unique ID for each row. The unique ID for a given row should match the unique ID for the model outputs for that particular inference request. This unique ID is used to join your ground truth dataset with the model outputs.

Without having ground truth data, you can't perform model performance monitoring. Since ground truth data is encountered at the application level, it's your responsibility to collect it as it becomes available. You should also maintain a data asset in Azure Machine Learning that contains this ground truth data.
(Optional) Have a pre-joined tabular dataset with model outputs and ground truth data already joined together.

Monitor model performance requirements when using data collector

If you use the Azure Machine Learning data collector to collect production inference data without supplying your own unique ID for each row as a separate column, a correlationid will be autogenerated for you and included in the logged JSON object. However, the data collector will batch rows that are sent within short time intervals of each other. Batched rows will fall within the same JSON object and will thus have the same correlationid.

In order to differentiate between the rows in the same JSON object, Azure Machine Learning model performance monitoring uses indexing to determine the order of the rows in the JSON object. For example, if three rows are batched together, and the correlationid is test, row one will have an ID of test_0, row two will have an ID of test_1, and row three will have an ID of test_2. To ensure that your ground truth dataset contains unique IDs that match to the collected production inference model outputs, ensure that you index each correlationid appropriately. If your logged JSON object only has one row, then the correlationid would be correlationid_0.

To avoid using this indexing, we recommend that you log your unique ID in its own column within the pandas DataFrame that you're logging with the Azure Machine Learning data collector. Then, in your model monitoring configuration, you specify the name of this column to join your model output data with your ground truth data. As long as the IDs for each row in both datasets are the same, Azure Machine Learning model monitoring can perform model performance monitoring.

Example workflow for monitoring model performance

To understand the concepts associated with model performance monitoring, consider this example workflow. Suppose you're deploying a model to predict whether credit card transactions are fraudulent or not, you can follow these steps to monitor the model's performance:

Configure your deployment to use the data collector to collect the model's production inference data (input and output data). Let's say that the output data is stored in a column is_fraud.
For each row of the collected inference data, log a unique ID. The unique ID can come from your application, or you can use the correlationid that Azure Machine Learning uniquely generates for each logged JSON object.
Later, when the ground truth (or actual) is_fraud data becomes available, it also gets logged and mapped to the same unique ID that was logged with the model's outputs.
This ground truth is_fraud data is also collected, maintained, and registered to Azure Machine Learning as a data asset.
Create a model performance monitoring signal that joins the model's production inference and ground truth data assets, using the unique ID columns.
Finally, compute the model performance metrics.

Once you've satisfied the prerequisites for model performance monitoring, you can set up model monitoring with the following CLI command and YAML definition:

az ml schedule create -f ./model-performance-monitoring.yaml

The following YAML contains the definition for model monitoring with production inference data that you've collected.

$schema:  http://azureml/sdk-2-0/Schedule.json
name: model_performance_monitoring
display_name: Credit card fraud model performance
description: Credit card fraud model performance

trigger:
  type: recurrence
  frequency: day
  interval: 7 
  schedule: 
    hours: 10
    minutes: 15
  
create_monitor:
  compute: 
    instance_type: standard_e8s_v3
    runtime_version: "3.3"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:loan-approval-endpoint:loan-approval-deployment

  monitoring_signals:
    fraud_detection_model_performance: 
      type: model_performance 
      production_data:
        data_column_names:
          prediction: is_fraud
          correlation_id: correlation_id
      reference_data:
        input_data:
          path: azureml:my_model_ground_truth_data:1
          type: mltable
        data_column_names:
          actual: is_fraud
          correlation_id: correlation_id
        data_context: actuals
      alert_enabled: true
      metric_thresholds: 
        tabular_classification:
          accuracy: 0.95
          precision: 0.8
  alert_notification: 
      emails: 
        - abc@example.com

Once you've satisfied the prerequisites for model performance monitoring, you can set up model monitoring with the following Python code:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import (
    MonitorDatasetContext,
)
from azure.ai.ml.entities import (
    AlertNotification,
    BaselineDataRange,
    ModelPerformanceMetricThreshold,
    ModelPerformanceSignal,
    ModelPerformanceClassificationThresholds,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute,
    ReferenceData,
    ProductionData
)

# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="subscription_id",
    resource_group_name="resource_group_name",
    workspace_name="workspace_name",
)

# create your compute
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# reference your azureml endpoint and deployment
monitoring_target = MonitoringTarget(
    ml_task="classification",
)

# MDC-generated production data with data column names 
production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path="azureml:credit-default-main-model_outputs:1"
    ),
    data_column_names={
        "target_column": "DEFAULT_NEXT_MONTH",
        "join_column": "correlationid"
    },
    data_window=BaselineDataRange(
        lookback_window_offset="P0D",
        lookback_window_size="P10D",
    )
)

# ground truth reference data 
reference_data_ground_truth = ReferenceData(
    input_data=Input(
        type="mltable",
        path="azureml:credit-ground-truth:1"
    ),
    data_column_names={
        "target_column": "ground_truth",
        "join_column": "correlationid"
    },
    data_context=MonitorDatasetContext.GROUND_TRUTH_DATA,
)

# create the model performance signal
metric_thresholds = ModelPerformanceMetricThreshold(
    classification=ModelPerformanceClassificationThresholds(
        accuracy=0.50,
        precision=0.50,
        recall=0.50
    ),
)

model_performance = ModelPerformanceSignal(
    production_data=production_data,
    reference_data=reference_data_ground_truth,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# put all monitoring signals in a dictionary
monitoring_signals = {
    'model_performance':model_performance,
}

# create alert notification object
alert_notification = AlertNotification(
    emails=['abc@example.com', 'def@example.com']
)

# Finally monitor definition
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=3, minutes=15)
)

model_monitor = MonitorSchedule(
    name="credit_default_model_performance",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()

To set up model performance monitoring:

Complete the entries on the Basic settings page as described earlier in the Set up out-of-box model monitoring section.
Select Next to open the Configure data asset page of the Advanced settings section.
Select + Add to add a dataset for use as the ground truth dataset.

Ensure that your model outputs dataset is also included in the list of added datasets. The ground truth dataset you add should have a unique ID column. The values in the unique ID column for both the ground truth dataset and the model outputs dataset must match in order for both datasets to be joined together prior to metric computation.
Select Next to go to the Select monitoring signals page. On this page, you will see some monitoring signals already added (if you selected an Azure Machine Learning online deployment earlier).
Delete the existing monitoring signals on the page, since you're only interested in creating a model performance monitoring signal.
Select Add to open the Edit Signal window.
Select Model performance (preview) to configure the model performance signal as follows:
1. In step 1, for the production data asset, select your model outputs dataset. Also, make the following selections:
  - Select the appropriate target column (for example, is_fraud).
  - Select the desired lookback window size and lookback window offset.
2. In step 2, for the reference data asset, select the ground truth data asset that you added earlier. Also, make the following selections:
  - Select the appropriate target column.
  - Select the column on which to perform the join with the model outputs dataset. The column used for the join should be the column that is common between the two datasets and which has a unique ID for each row in the dataset (for example, correlationid).
3. In step 3, select your desired performance metrics and specify their respective thresholds.
Select Save to return to the Select monitoring signals page.
Select Next to go to the Notifications page.
On the Notifications page, enable alert notification for the model performance signal and select Next.
Review your settings on the Review monitoring settings page.
Select Create to create your model performance monitor.

Set up model monitoring by bringing in your production data to Azure Machine Learning

You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to:

Collect production inference data from models deployed in production.
Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data.
Provide a custom data preprocessing component and register it as an Azure Machine Learning component.

You must provide a custom data preprocessing component if your data isn't collected with the data collector. Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing.

Your custom preprocessing component must have these input and output signatures:

Input/Output	Signature name	Type	Description	Example value
input	`data_window_start`	literal, string	data window start-time in ISO8601 format.	2023-05-01T04:31:57.012Z
input	`data_window_end`	literal, string	data window end-time in ISO8601 format.	2023-05-01T04:31:57.012Z
input	`input_data`	uri_folder	The collected production inference data, which is registered as an Azure Machine Learning data asset.	azureml:myproduction_inference_data:1
output	`preprocessed_data`	mltable	A tabular dataset, which matches a subset of the reference data schema.

For an example of a custom data preprocessing component, see custom_preprocessing in the azuremml-examples GitHub repo.

Once you've satisfied the previous requirements, you can set up model monitoring with the following CLI command and YAML definition:

az ml schedule create -f ./model-monitoring-with-collected-data.yaml

The following YAML contains the definition for model monitoring with production inference data that you've collected.

# model-monitoring-with-collected-data.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
display_name: Fraud detection model monitoring
description: Fraud detection model monitoring with your own production data

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:
  compute: 
    instance_type: standard_e4s_v3
    runtime_version: "3.3"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:fraud-detection-endpoint:fraud-detection-deployment
  
  monitoring_signals:

    advanced_data_drift: # monitoring signal name, any user defined name works
      type: data_drift
      # define production dataset with your collected data
      production_data:
        input_data:
          path: azureml:my_production_inference_data_model_inputs:1  # your collected data is registered as Azure Machine Learning asset
          type: uri_folder
        data_context: model_inputs
        pre_processing_component: azureml:production_data_preprocessing:1
      reference_data:
        input_data:
          path: azureml:my_model_training_data:1 # use training data as comparison baseline
          type: mltable
        data_context: training
        data_column_names:
          target_column: is_fraud
      features: 
        top_n_feature_importance: 20 # monitor drift for top 20 features
      alert_enabled: true
      metric_thresholds:
        numberical:
          jensen_shannon_distance: 0.01
        categorical:
          pearsons_chi_squared_test: 0.02

    advanced_prediction_drift: # monitoring signal name, any user defined name works
      type: prediction_drift
      # define production dataset with your collected data
      production_data:
        input_data:
          path: azureml:my_production_inference_data_model_outputs:1  # your collected data is registered as Azure Machine Learning asset
          type: uri_folder
        data_context: model_outputs
        pre_processing_component: azureml:production_data_preprocessing:1
      reference_data:
        input_data:
          path: azureml:my_model_validation_data:1 # use training data as comparison reference dataset
          type: mltable
        data_context: validation
      alert_enabled: true
      metric_thresholds:
        categorical:
          pearsons_chi_squared_test: 0.02
  
  alert_notification:
    emails:
      - abc@example.com
      - def@example.com

Once you've satisfied the previous requirements, you can set up model monitoring with the following Python code:

from azure.identity import InteractiveBrowserCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import (
    MonitorFeatureType,
    MonitorMetricName,
    MonitorDatasetContext
)
from azure.ai.ml.entities import (
    AlertNotification,
    DataDriftSignal,
    DataQualitySignal,
    DataDriftMetricThreshold,
    DataQualityMetricThreshold,
    NumericalDriftMetrics,
    CategoricalDriftMetrics,
    DataQualityMetricsNumerical,
    DataQualityMetricsCategorical,
    MonitorFeatureFilter,
    MonitorInputData,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute,
    ReferenceData,
    ProductionData
)

# get a handle to the workspace
ml_client = MLClient(
   InteractiveBrowserCredential(),
   subscription_id,
   resource_group,
   workspace
)

spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.2"
)

#define target dataset (production dataset)
production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path="azureml:my_model_production_data:1"
    ),
    data_context=MonitorDatasetContext.MODEL_INPUTS,
    pre_processing_component="azureml:production_data_preprocessing:1"
)


# training data to be used as reference dataset
reference_data_training = ReferenceData(
    input_data=Input(
        type="mltable",
        path="azureml:my_model_training_data:1"
    ),
    data_context=MonitorDatasetContext.TRAINING
)

# create an advanced data drift signal
features = MonitorFeatureFilter(top_n_feature_importance=20)
metric_thresholds = DataDriftMetricThreshold(
    numerical=NumericalDriftMetrics(
        jensen_shannon_distance=0.01
    ),
    categorical=CategoricalDriftMetrics(
        pearsons_chi_squared_test=0.02
    )
)

advanced_data_drift = DataDriftSignal(
    production_data=production_data,
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)


# create an advanced data quality signal
features = ['feature_A', 'feature_B', 'feature_C']
metric_thresholds = DataQualityMetricThreshold(
    numerical=DataQualityMetricsNumerical(
        null_value_rate=0.01
    ),
    categorical=DataQualityMetricsCategorical(
        out_of_bounds_rate=0.02
    )
)

advanced_data_quality = DataQualitySignal(
    production_data=production_data,
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# put all monitoring signals in a dictionary
monitoring_signals = {
    'data_drift_advanced': advanced_data_drift,
    'data_quality_advanced': advanced_data_quality
}

# create alert notification object
alert_notification = AlertNotification(
    emails=['abc@example.com', 'def@example.com']
)

# Finally monitor definition
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=3, minutes=15)
)

model_monitor = MonitorSchedule(
    name="fraud_detection_model_monitoring_advanced",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()

Set up model monitoring with custom signals and metrics

With Azure Machine Learning model monitoring, you can define a custom signal and implement any metric of your choice to monitor your model. You can register this custom signal as an Azure Machine Learning component. When your Azure Machine Learning model monitoring job runs on the specified schedule, it computes the metric(s) you've defined within your custom signal, just as it does for the prebuilt signals (data drift, prediction drift, and data quality).

To set up a custom signal to use for model monitoring, you must first define the custom signal and register it as an Azure Machine Learning component. The Azure Machine Learning component must have these input and output signatures:

Component input signature

The component input DataFrame should contain the following items:

An mltable with the processed data from the preprocessing component
Any number of literals, each representing an implemented metric as part of the custom signal component. For example, if you've implemented the metric, std_deviation, then you'll need an input for std_deviation_threshold. Generally, there should be one input per metric with the name <metric_name>_threshold.

Signature name	Type	Description	Example value
production_data	mltable	A tabular dataset that matches a subset of the reference data schema.
std_deviation_threshold	literal, string	Respective threshold for the implemented metric.	2

Component output signature

The component output port should have the following signature.

Signature name	Type	Description
signal_metrics	mltable	The mltable that contains the computed metrics. The schema is defined in the next section signal_metrics schema.

signal_metrics schema

The component output DataFrame should contain four columns: group, metric_name, metric_value, and threshold_value.

Signature name	Type	Description	Example value
group	literal, string	Top-level logical grouping to be applied to this custom metric.	TRANSACTIONAMOUNT
metric_name	literal, string	The name of the custom metric.	std_deviation
metric_value	numerical	The value of the custom metric.	44,896.082
threshold_value	numerical	The threshold for the custom metric.	2

The following table shows an example output from a custom signal component that computes the std_deviation metric:

group	metric_value	metric_name	threshold_value
TRANSACTIONAMOUNT	44,896.082	std_deviation	2
LOCALHOUR	3.983	std_deviation	2
TRANSACTIONAMOUNTUSD	54,004.902	std_deviation	2
DIGITALITEMCOUNT	7.238	std_deviation	2
PHYSICALITEMCOUNT	5.509	std_deviation	2

To see an example custom signal component definition and metric computation code, see custom_signal in the azureml-examples repo.

Once you've satisfied the requirements for using custom signals and metrics, you can set up model monitoring with the following CLI command and YAML definition:

az ml schedule create -f ./custom-monitoring.yaml

The following YAML contains the definition for model monitoring with a custom signal. Some things to notice about the code:

It assumes that you've already created and registered your component with the custom signal definition in Azure Machine Learning.
The component_id of the registered custom signal component is azureml:my_custom_signal:1.0.0.
If you've collected your data with the data collector, you can omit the pre_processing_component property. If you wish to use a preprocessing component to preprocess production data not collected by the data collector, you can specify it.

# custom-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: my-custom-signal
trigger:
  type: recurrence
  frequency: day # can be minute, hour, day, week, month
  interval: 7 # #every day
create_monitor:
  compute:
    instance_type: "standard_e4s_v3"
    runtime_version: "3.3"
  monitoring_signals:
    customSignal:
      type: custom
      component_id: azureml:my_custom_signal:1.0.0
      input_data:
        production_data:
          input_data:
            type: uri_folder
            path: azureml:my_production_data:1
          data_context: test
          data_window:
            lookback_window_size: P30D
            lookback_window_offset: P7D
          pre_processing_component: azureml:custom_preprocessor:1.0.0
      metric_thresholds:
        - metric_name: std_deviation
          threshold: 2
  alert_notification:
    emails:
      - abc@example.com

Interpret monitoring results

After you've configured your model monitor and the first run has completed, you can navigate back to the Monitoring tab in Azure Machine Learning studio to view the results.

From the main Monitoring view, select the name of your model monitor to see the Monitor overview page. This page shows the corresponding model, endpoint, and deployment, along with details regarding the signals you configured. The next image shows a monitoring dashboard that includes data drift and data quality signals. Depending on the monitoring signals you configured, your dashboard might look different.
Look in the Notifications section of the dashboard to see, for each signal, which features breached the configured threshold for their respective metrics:
Select the data_drift to go to the data drift details page. On the details page, you can see the data drift metric value for each numerical and categorical feature that you included in your monitoring configuration. When your monitor has more than one run, you'll see a trendline for each feature.
To view an individual feature in detail, select the name of the feature to view the production distribution compared to the reference distribution. This view also allows you to track drift over time for that specific feature.
Return to the monitoring dashboard and select data_quality to view the data quality signal page. On this page, you can see the null value rates, out-of-bounds rates, and data type error rates for each feature you're monitoring.

Model monitoring is a continuous process. With Azure Machine Learning model monitoring, you can configure multiple monitoring signals to obtain a broad view into the performance of your models in production.

Integrate Azure Machine Learning model monitoring with Azure Event Grid

You can use events generated by Azure Machine Learning model monitoring to set up event-driven applications, processes, or CI/CD workflows with Azure Event Grid. You can consume events through various event handlers, such as Azure Event Hubs, Azure functions, and logic apps. Based on the drift detected by your monitors, you can take action programmatically, such as by setting up a machine learning pipeline to re-train a model and re-deploy it.

To get started with integrating Azure Machine Learning model monitoring with Event Grid:

Follow the steps in see Set up in Azure portal. Give your Event Subscription a name, such as MonitoringEvent, and select only the Run status changed box under Event Types.

Warning

Be sure to select Run status changed for the event type. Don't select Dataset drift detected, as it applies to data drift v1, rather than Azure Machine Learning model monitoring.
Follow the steps in Filter & subscribe to events to set up event filtering for your scenario. Navigate to the Filters tab and add the following Key, Operator, and Value under Advanced Filters:
- Key: data.RunTags.azureml_modelmonitor_threshold_breached
- Value: has failed due to one or more features violating metric thresholds
- Operator: String contains
With this filter, events are generated when the run status changes (from Completed to Failed, or from Failed to Completed) for any monitor within your Azure Machine Learning workspace.
To filter at the monitoring level, use the following Key, Operator, and Value under Advanced Filters:
- Key: data.RunTags.azureml_modelmonitor_threshold_breached
- Value: your_monitor_name_signal_name
- Operator: String contains
Ensure that your_monitor_name_signal_name is the name of a signal in the specific monitor you want to filter events for. For example, credit_card_fraud_monitor_data_drift. For this filter to work, this string must match the name of your monitoring signal. You should name your signal with both the monitor name and the signal name for this case.
When you've completed your Event Subscription configuration, select the desired endpoint to serve as your event handler, such as Azure Event Hubs.
After events have been captured, you can view them from the endpoint page:

You can also view events in the Azure Monitor Metrics tab:

Deila með

Monitor performance of models deployed to production

Prerequisites

Set up out-of-box model monitoring

Set up advanced model monitoring

Configure feature importance

Set up model performance monitoring

More prerequisites for model performance monitoring

Monitor model performance requirements when using data collector

Example workflow for monitoring model performance

Set up model monitoring by bringing in your production data to Azure Machine Learning

Set up model monitoring with custom signals and metrics

Component input signature

Component output signature

signal_metrics schema

Interpret monitoring results

Integrate Azure Machine Learning model monitoring with Azure Event Grid

Athugasemdir

Fleiri tilföng

Deila með

Monitor performance of models deployed to production

Prerequisites

Set up out-of-box model monitoring

Set up advanced model monitoring

Configure feature importance

Set up model performance monitoring

More prerequisites for model performance monitoring

Monitor model performance requirements when using data collector

Example workflow for monitoring model performance

Set up model monitoring by bringing in your production data to Azure Machine Learning

Set up model monitoring with custom signals and metrics

Component input signature

Component output signature

signal_metrics schema

Interpret monitoring results

Integrate Azure Machine Learning model monitoring with Azure Event Grid

Related content

Athugasemdir

Fleiri tilföng