Data collection from models in production

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, you'll learn about data collection from models that are deployed to Azure Machine Learning online endpoints.

Azure Machine Learning Data collector provides real-time logging of input and output data from models that are deployed to managed online endpoints or Kubernetes online endpoints. Azure Machine Learning stores the logged inference data in Azure blob storage. This data can then be seamlessly used for model monitoring, debugging, or auditing, thereby, providing observability into the performance of your deployed models.

Data collector provides:

  • Logging of inference data to a central location (Azure Blob Storage)
  • Support for managed online endpoints and Kubernetes online endpoints
  • Definition at the deployment level, allowing maximum changes to its configuration
  • Support for both payload and custom logging

Logging modes

Data collector provides two logging modes: payload logging and custom logging. Payload logging allows you to collect the HTTP request and response payload data from your deployed models. With custom logging, Azure Machine Learning provides you with a Python SDK for logging pandas DataFrames directly from your scoring script. Using the custom logging Python SDK, you can log model input and output data, in addition to data before, during, and after any data transformations (or preprocessing).

Data collector configuration

Data collector can be configured at the deployment level, and the configuration is specified at deployment time. You can configure the Azure Blob storage destination that will receive the collected data. You can also configure the sampling rate (ranging from 0 – 100%) of the data to collect.

Limitations

Data collector has the following limitations:

  • Data collector only supports logging for online (or real-time) Azure Machine Learning endpoints (Managed or Kubernetes).
  • The Data collector Python SDK only supports logging tabular data via pandas DataFrames.

Next steps