Monitor the performance of models deployed to production

Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

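You can use model monitoring in Azure Machine Learning to continuously track the performance of machine learning models in production. Model monitoring gives you a broad view of monitoring signals and alerts you to potential issues. When you monitor the signals and performance metrics of models in production, you can critically evaluate the inherent risks of your models. You can also identify blind spots that might adversely affect your business.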

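In this article, you learn how to perform the following tasks:

Set up out-of-box and advanced monitoring for models that are deployed to Azure Machine Learning online endpoints
Monitor performance metrics for models in production
Monitor models that are deployed outside Azure Machine Learning or deployed to Azure Machine Learning batch endpoints
Set up custom signals and metrics to use in model monitoring
Interpret monitoring results
Integrate Azure Machine Learning model monitoring with Azure Event Grid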

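Prerequisites

The Azure CLI and the ml extension to the Azure CLI, installed and configured. For more information, see Install and set up the CLI (v2).
A Bash shell or a compatible shell, for example, a shell on a Linux system or Windows Subsystem for Linux. The Azure CLI examples in this article assume that you use this type of shell.
An Azure Machine Learning workspace. For instructions to create a workspace, see Set up.

An Azure Machine Learning workspace. For steps to create a workspace, see Create the workspace.
The Azure Machine Learning SDK for Python v2. To install the SDK, use the following command: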
```
pip install azure-ai-ml azure-identity
```
To update an existing installation of the SDK to the latest version, use the following command:
```
pip install --upgrade azure-ai-ml azure-identity
```
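For more information, see Azure Machine Learning Package client library for Python.

A user account that has at least one of the following Azure role-based access control (Azure RBAC) roles:
- An Owner role for the Azure Machine Learning workspace
- A Contributor role for the Azure Machine Learning workspace
- A custom role that has Microsoft.MachineLearningServices/workspaces/onlineEndpoints/* permissions
For more information, see Manage access to Azure Machine Learning workspaces.
For monitoring an Azure Machine Learning managed online endpoint or Kubernetes online endpoint:
- A model that's deployed to an Azure Machine Learning online endpoint. Managed online endpoints and Kubernetes online endpoints are supported. For instructions for deploying a model to an Azure Machine Learning online endpoint, see Deploy and score a machine learning model by using an online endpoint.
- Data collection enabled for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see Collect production data from models deployed for real-time inferencing.
For monitoring a model that's deployed to an Azure Machine Learning batch endpoint or deployed outside Azure Machine Learning:
- A means to collect production data and register it as an Azure Machine Learning data asset
- A means to update the registered data asset continuously for model monitoring
- (Recommended) Registration of the model in an Azure Machine Learning workspace, for lineage tracking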

Configure a serverless Spark compute pool

Model monitoring jobs are scheduled to run on serverless Spark compute pools. The following Azure Virtual Machines instance types are supported:

Standard_E4s_v3
Standard_E8s_v3
Standard_E16s_v3
Standard_E32s_v3
Standard_E64s_v3

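To specify a virtual machine instance type when you follow the procedures in this article, take the following steps:

When you use the Azure CLI to create a monitor, you use a YAML configuration file. In that file, set the create_monitor.compute.instance_type value to the type that you want to use.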
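
As a small illustration, the following sketch checks an instance type against the supported list before you write it into a monitoring definition. The helper name `normalize_instance_type` is hypothetical and isn't part of the Azure CLI or SDK. The case-insensitive comparison is an assumption based on the fact that the list uses mixed case while the samples in this article use lowercase.

```python
# Supported instance types, copied from the list in this article.
SUPPORTED_INSTANCE_TYPES = (
    "Standard_E4s_v3",
    "Standard_E8s_v3",
    "Standard_E16s_v3",
    "Standard_E32s_v3",
    "Standard_E64s_v3",
)


def normalize_instance_type(name: str) -> str:
    """Return the lowercase form used in the sample definitions, or raise ValueError.

    Hypothetical helper: the comparison ignores case because the article
    lists "Standard_E4s_v3" while its samples write "standard_e4s_v3".
    """
    lowered = name.strip().lower()
    allowed = {instance_type.lower() for instance_type in SUPPORTED_INSTANCE_TYPES}
    if lowered not in allowed:
        raise ValueError(f"Unsupported instance type for model monitoring: {name!r}")
    return lowered


print(normalize_instance_type("Standard_E8s_v3"))
```

A check like this fails early on your machine instead of after the schedule is submitted.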

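Set up out-of-box model monitoring

Consider a scenario in which you deploy your model to production in an Azure Machine Learning online endpoint and enable data collection at deployment time. In this case, Azure Machine Learning collects production inference data and automatically stores it in Azure Blob Storage. You can use Azure Machine Learning model monitoring to continuously monitor this production inference data.

You can use the Azure CLI, the Python SDK, or the studio for an out-of-box setup of model monitoring. The out-of-box model monitoring configuration provides the following monitoring capabilities: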

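Azure Machine Learning automatically detects the production inference data asset that's associated with an Azure Machine Learning online deployment and uses the data asset for model monitoring.
The comparison reference data asset is set as the recent, past production inference data asset.
Monitoring setup automatically includes and tracks the following built-in monitoring signals: data drift, prediction drift, and data quality. For each monitoring signal, Azure Machine Learning uses:
- The recent, past production inference data asset as the comparison reference data asset.
- Smart default values for metrics and thresholds.
A monitoring job is configured to run on a regular schedule. That job acquires monitoring signals and evaluates each metric result against its corresponding threshold. By default, when any threshold is exceeded, Azure Machine Learning sends an alert email to the user who set up the monitor.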
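
The threshold evaluation that a monitoring job performs can be pictured with a minimal sketch. It isn't the service's implementation. The function name `find_breaches` and the sample values are hypothetical, and the sketch assumes that a larger metric value is worse, which fits drift distances but not metrics such as accuracy.

```python
from typing import Dict, List


def find_breaches(results: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose value is above their threshold.

    Assumption: a larger value is worse, as with drift distances. Metrics
    without a configured threshold are skipped.
    """
    return sorted(
        name
        for name, value in results.items()
        if name in thresholds and value > thresholds[name]
    )


# Illustrative values only; they are not output from a real monitoring run.
results = {"jensen_shannon_distance": 0.04, "null_value_rate": 0.01}
thresholds = {"jensen_shannon_distance": 0.01, "null_value_rate": 0.05}
breaches = find_breaches(results, thresholds)
if breaches:
    print("Alert: thresholds exceeded for", ", ".join(breaches))
```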

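To set up out-of-box model monitoring, take the following steps.

In the Azure CLI, you use az ml schedule to schedule a monitoring job.

Create a monitoring definition in a YAML file. For a sample out-of-box definition, see the following YAML code, which is also available in the azureml-examples repository.

Before you use this definition, adjust the values to fit your environment. For endpoint_deployment_id, use a value in the format azureml:<endpoint-name>:<deployment-name>.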

# out-of-box-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: credit_default_model_monitoring
display_name: Credit default model monitoring
description: Credit default model monitoring setup with minimal configurations

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:

  compute: # specify a spark compute for monitoring job
    instance_type: standard_e4s_v3
    runtime_version: "3.4"

  monitoring_target: 
    ml_task: classification # model task type: [classification, regression, question_answering]
    endpoint_deployment_id: azureml:credit-default:main # azureml endpoint deployment id

  alert_notification: # emails to get alerts
    emails:
      - abc@example.com
      - def@example.com

Run the following command to create the monitoring schedule:

az ml schedule create -f ./out-of-box-monitoring.yaml

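Use code similar to the following sample. Replace the following placeholders with appropriate values.

Placeholder	Description	Example
<subscription-ID>	The ID of your subscription	aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e
<resource-group-name>	The name of the resource group that contains your workspace	my-resource-group
<workspace-name>	The name of your workspace	my-workspace
<endpoint-name>	The name of the endpoint to monitor	credit-default
<deployment-name>	The name of the deployment to monitor	main
<email-address-1> and <email-address-2>	Email addresses to use for notifications	`abc@example.com`
<frequency-unit>	The monitoring frequency unit	day
<interval>	The interval between jobs, expressed in the frequency unit	1
<start-hour>	The hour to start monitoring, on a 24-hour clock	3
<start-minutes>	The minutes after the specified hour to start monitoring	15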

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    AlertNotification,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute
)

# Get a handle to the workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-ID>",
    resource_group_name="<resource-group-name>",
    workspace_name="<workspace-name>",
)

# Specify the serverless Spark compute to use for monitoring jobs.
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# Specify your online endpoint deployment.
monitoring_target = MonitoringTarget(
    ml_task="classification",
    endpoint_deployment_id="azureml:<endpoint-name>:<deployment-name>"
)

# Create an alert notification object.
alert_notification = AlertNotification(
    emails=['<email-address-1>', '<email-address-2>']
)

# Create the monitor definition.
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
    alert_notification=alert_notification
)

# Specify the schedule frequency.
recurrence_trigger = RecurrenceTrigger(
    frequency="<frequency-unit>",
    interval=<interval>,
    schedule=RecurrencePattern(hours=<start-hour>, minutes=<start-minutes>)
)

# Create the monitoring schedule.
model_monitor = MonitorSchedule(
    name="credit_default_monitor_basic",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

# Schedule the monitoring job.
poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()

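Set up advanced model monitoring

Azure Machine Learning provides many capabilities for continuous model monitoring. For a comprehensive list of these capabilities, see Capabilities of model monitoring. In many cases, you need to set up model monitoring that supports advanced monitoring tasks. The following section provides a few examples of advanced monitoring:

The use of multiple monitoring signals for a broad view
The use of historical model training data or validation data as the comparison reference data asset
Monitoring of the N most important features and of individual features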

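Configure feature importance

Feature importance represents the relative importance of each input feature to a model's output. For example, temperature might be more important to a model's prediction than elevation. When you turn on feature importance, you get visibility into which features you don't want drifting or having data quality issues in production.

To turn on feature importance with any of your signals, such as data drift or data quality, you need to provide:

Your training data asset as the reference_data data asset.
The reference_data.data_column_names.target_column property, which is the name of your model's output column, or prediction column.

After you turn on feature importance, Azure Machine Learning studio shows a feature importance for each feature that you monitor.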
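
To make the idea of monitoring only the most important features concrete, the following sketch ranks features by an importance score and keeps the top N. The function `top_n_features`, the scores, and the feature name LIMIT_BAL are made up for illustration. How the service computes importance isn't shown here.

```python
from typing import Dict, List


def top_n_features(importance: Dict[str, float], n: int) -> List[str]:
    """Return the n feature names with the highest importance scores."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Made-up scores for illustration; real scores come from the monitoring job.
scores = {"AGE": 0.30, "EDUCATION": 0.15, "SEX": 0.05, "LIMIT_BAL": 0.50}
print(top_n_features(scores, 2))  # ['LIMIT_BAL', 'AGE']
```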

When you use the Python SDK or the Azure CLI, you can turn alerts on or off for each signal by setting the alert_enabled property.

You can use the Azure CLI, the Python SDK, or the studio to set up advanced model monitoring.

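Create a monitoring definition in a YAML file. For a sample advanced definition, see the following YAML code, which is also available in the azureml-examples repository.

Before you use this definition, adjust the following settings and any others to meet the needs of your environment:

For endpoint_deployment_id, use a value in the format azureml:<endpoint-name>:<deployment-name>.
For path in reference input data sections, use a value in the format azureml:<reference-data-asset-name>:<version>.
For target_column, use the name of the output column that contains values that the model predicts, such as DEFAULT_NEXT_MONTH.
For features, list the features, such as SEX, EDUCATION, and AGE, that you want to use in an advanced data quality signal.
Under emails, list the email addresses that you want to use for notifications.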

# advanced-model-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
display_name: Fraud detection model monitoring
description: Fraud detection model monitoring with advanced configurations

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:

  compute: 
    instance_type: standard_e4s_v3
    runtime_version: "3.4"

  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:credit-default:main
  
  monitoring_signals:
    advanced_data_drift: # monitoring signal name, any user defined name works
      type: data_drift
      # reference_dataset is optional. By default reference_dataset is the production inference data associated with Azure Machine Learning online endpoint
      reference_data:
        input_data:
          path: azureml:credit-reference:1 # use training data as comparison reference dataset
          type: mltable
        data_context: training
        data_column_names:
          target_column: DEFAULT_NEXT_MONTH
      features: 
        top_n_feature_importance: 10 # monitor drift for top 10 features
      alert_enabled: true
      metric_thresholds:
        numerical:
          jensen_shannon_distance: 0.01
        categorical:
          pearsons_chi_squared_test: 0.02
    advanced_data_quality:
      type: data_quality
      # reference_dataset is optional. By default reference_dataset is the production inference data associated with Azure Machine Learning online endpoint
      reference_data:
        input_data:
          path: azureml:credit-reference:1
          type: mltable
        data_context: training
      features: # monitor data quality for 3 individual features only
        - SEX
        - EDUCATION
        - AGE
      alert_enabled: true
      metric_thresholds:
        numerical:
          null_value_rate: 0.05
        categorical:
          out_of_bounds_rate: 0.03

    feature_attribution_drift_signal:
      type: feature_attribution_drift
      # production_data: is not required input here
      # Ensure that the Azure Machine Learning online endpoint is enabled to collect both model_inputs and model_outputs data
      # Azure Machine Learning model monitoring automatically joins the model_inputs and model_outputs data and uses the result for computation
      reference_data:
        input_data:
          path: azureml:credit-reference:1
          type: mltable
        data_context: training
        data_column_names:
          target_column: DEFAULT_NEXT_MONTH
      alert_enabled: true
      metric_thresholds:
        normalized_discounted_cumulative_gain: 0.9
  
  alert_notification:
    emails:
      - abc@example.com
      - def@example.com

Run the following command to create the monitoring schedule:

az ml schedule create -f ./advanced-model-monitoring.yaml
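
The data drift signal in the preceding definition sets a threshold on jensen_shannon_distance. As a rough illustration of what such a metric measures, the following sketch computes the Jensen-Shannon distance between two discrete distributions. The use of base-2 logarithms and of pre-binned probabilities are assumptions of this sketch. The service's internal computation might differ.

```python
import math
from typing import Sequence


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance (square root of the divergence), base-2 logs.

    p and q are probability distributions over the same bins; each must sum to 1.
    With base-2 logarithms the result lies between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


# Illustrative bin frequencies for one feature; not real monitoring data.
reference = [0.7, 0.2, 0.1]
production = [0.5, 0.3, 0.2]
distance = jensen_shannon_distance(reference, production)
print("drift detected" if distance > 0.01 else "no drift", round(distance, 3))
```

Identical distributions give a distance of 0, and distributions with no overlap give 1, so a threshold such as 0.01 flags even small shifts.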

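To set up advanced model monitoring, use code similar to the following sample. Replace the following placeholders with appropriate values.

Placeholder	Description	Example
<subscription-ID>	The ID of your subscription	aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e
<resource-group-name>	The name of the resource group that contains your workspace	my-resource-group
<workspace-name>	The name of your workspace	my-workspace
<endpoint-name>	The name of the endpoint to monitor	credit-default
<deployment-name>	The name of the deployment to monitor	main
<production-data-asset-name>	The name of the data asset that contains production data	credit-default-main-model_inputs
<reference-data-asset-name>	The name of the data asset that contains reference data	credit-default-reference
<target-column>	The name of the output column that contains values that the model predicts	DEFAULT_NEXT_MONTH
<feature-1>, <feature-2>, and <feature-3>	The features that you want to use in an advanced data quality signal	AGE
<email-address-1> and <email-address-2>	Email addresses to use for notifications	`abc@example.com`
<frequency-unit>	The monitoring frequency unit	day
<interval>	The interval between jobs, expressed in the frequency unit	1
<start-hour>	The hour to start monitoring, on a 24-hour clock	3
<start-minutes>	The minutes after the specified hour to start monitoring	15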

from azure.identity import DefaultAzureCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import (
    MonitorDatasetContext,
)
from azure.ai.ml.entities import (
    AlertNotification,
    BaselineDataRange,
    DataDriftSignal,
    DataQualitySignal,
    PredictionDriftSignal,
    DataDriftMetricThreshold,
    DataQualityMetricThreshold,
    FeatureAttributionDriftMetricThreshold,
    FeatureAttributionDriftSignal,
    PredictionDriftMetricThreshold,
    NumericalDriftMetrics,
    CategoricalDriftMetrics,
    DataQualityMetricsNumerical,
    DataQualityMetricsCategorical,
    MonitorFeatureFilter,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute,
    ReferenceData,
    ProductionData
)

# Get a handle to the workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-ID>",
    resource_group_name="<resource-group-name>",
    workspace_name="<workspace-name>",
)

# Specify the serverless Spark compute to use for monitoring jobs.
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# Specify the online deployment if you have one.
monitoring_target = MonitoringTarget(
    ml_task="classification",
    endpoint_deployment_id="azureml:<endpoint-name>:<deployment-name>"
)

# Specify a look-back window size and offset to use. Omit this line to use the default values, which are listed in the documentation.
data_window = BaselineDataRange(lookback_window_size="P1D", lookback_window_offset="P0D")

# Set up the production data.
production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path="azureml:<production-data-asset-name>:1"
    ),
    data_window=data_window,
    data_context=MonitorDatasetContext.MODEL_INPUTS,
)

# Set up the training data to use as a reference data asset.
reference_data_training = ReferenceData(
    input_data=Input(
        type="mltable",
        path="azureml:<reference-data-asset-name>:1"
    ),
    data_column_names={
        "target_column":"<target-column>"
    },
    data_context=MonitorDatasetContext.TRAINING,
)

# Create an advanced data drift signal.
features = MonitorFeatureFilter(top_n_feature_importance=10)

metric_thresholds = DataDriftMetricThreshold(
    numerical=NumericalDriftMetrics(
        jensen_shannon_distance=0.01
    ),
    categorical=CategoricalDriftMetrics(
        pearsons_chi_squared_test=0.02
    )
)

advanced_data_drift = DataDriftSignal(
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Create an advanced prediction drift signal.
metric_thresholds = PredictionDriftMetricThreshold(
    categorical=CategoricalDriftMetrics(
        jensen_shannon_distance=0.01
    )
)

advanced_prediction_drift = PredictionDriftSignal(
    reference_data=reference_data_training,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Create an advanced data quality signal.
features = ['<feature-1>', '<feature-2>', '<feature-3>']

metric_thresholds = DataQualityMetricThreshold(
    numerical=DataQualityMetricsNumerical(
        null_value_rate=0.01
    ),
    categorical=DataQualityMetricsCategorical(
        out_of_bounds_rate=0.02
    )
)

advanced_data_quality = DataQualitySignal(
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Create a feature attribution drift signal.
metric_thresholds = FeatureAttributionDriftMetricThreshold(normalized_discounted_cumulative_gain=0.9)

feature_attribution_drift = FeatureAttributionDriftSignal(
    reference_data=reference_data_training,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Put all monitoring signals in a dictionary.
monitoring_signals = {
    'data_drift_advanced':advanced_data_drift,
    'data_quality_advanced':advanced_data_quality,
    'feature_attribution_drift':feature_attribution_drift,
}

# Create an alert notification object.
alert_notification = AlertNotification(
    emails=['<email-address-1>', '<email-address-2>']
)

# Create the monitor definition.
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

# Specify the schedule frequency.
recurrence_trigger = RecurrenceTrigger(
    frequency="<frequency-unit>",
    interval=<interval>,
    schedule=RecurrencePattern(hours=<start-hour>, minutes=<start-minutes>)
)

# Create the monitoring schedule.
model_monitor = MonitorSchedule(
    name="credit_default_monitor_advanced",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

# Schedule the monitoring job.
poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()
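
The preceding sample passes lookback_window_size="P1D" and lookback_window_offset="P0D" to BaselineDataRange. The following sketch shows one way to read those values. It assumes that the data window ends at the monitoring run time minus the offset and extends back by the window size, so check that reading against the product documentation. The helpers `parse_days` and `data_window` are hypothetical and handle only whole-day durations.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple


def parse_days(duration: str) -> timedelta:
    """Parse a whole-day ISO 8601 duration such as "P1D" or "P10D".

    Only the PnD form used in this article's samples is handled here.
    """
    match = re.fullmatch(r"P(\d+)D", duration)
    if match is None:
        raise ValueError(f"Unsupported duration: {duration!r}")
    return timedelta(days=int(match.group(1)))


def data_window(run_time: datetime, size: str, offset: str) -> Tuple[datetime, datetime]:
    """Return (start, end) of the data window under the stated assumption."""
    end = run_time - parse_days(offset)
    start = end - parse_days(size)
    return start, end


# Example run time chosen for illustration: 3:15 AM UTC on May 2, 2023.
run_time = datetime(2023, 5, 2, 3, 15, tzinfo=timezone.utc)
start, end = data_window(run_time, size="P1D", offset="P0D")
print(start.isoformat(), "to", end.isoformat())
```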

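Set up model performance monitoring

When you use Azure Machine Learning model monitoring, you can track the performance of your models in production by calculating their performance metrics. The following model performance metrics are currently supported:

For classification models:
- Precision
- Accuracy
- Recall
For regression models:
- Mean absolute error (MAE)
- Mean squared error (MSE)
- Root mean squared error (RMSE)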
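
For reference, the following sketch computes these metrics from paired actual and predicted values by using their standard definitions. The function names are hypothetical. The sketch assumes binary labels where 1 is the positive class and non-empty inputs. It isn't the code that the monitoring job runs.

```python
import math
from typing import Dict, Sequence


def classification_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        # Report 0.0 when the denominator is zero, a convention chosen for this sketch.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, and RMSE for paired numeric values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / len(errors),
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


print(classification_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
print(regression_metrics([3.0, 5.0], [2.0, 7.0]))
```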

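Prerequisites for model performance monitoring

Output data for the production model (the model's predictions) with a unique ID for each row. If you use the Azure Machine Learning data collector to collect production data, a correlation ID is provided for each inference request. The data collector also offers the option of logging your own unique ID from your application.

Note

For Azure Machine Learning model performance monitoring, we recommend that you use the Azure Machine Learning data collector to log your unique ID in its own column.
Ground truth data (actuals) with a unique ID for each row. The unique ID for a given row should match the unique ID in the model output data for that particular inference request. This unique ID is used to join your ground truth data asset with the model output data.

If you don't have ground truth data, you can't perform model performance monitoring. Ground truth data is encountered at the application level, so it's your responsibility to collect it as it becomes available. You should also maintain a data asset in Azure Machine Learning that contains this ground truth data.
(Optional) A prejoined tabular data asset with model output data and ground truth data already joined together.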

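Requirements for model performance monitoring when you use the data collector

Azure Machine Learning generates a correlation ID for you when you meet the following criteria:

You use the Azure Machine Learning data collector to collect production inference data.
You don't supply your own unique ID for each row as a separate column.

The generated correlation ID is included in the logged JSON object. However, the data collector batches rows that are sent within a short time interval of each other. Batched rows fall within the same JSON object. Within each object, all rows have the same correlation ID.

To differentiate between the rows in a JSON object, Azure Machine Learning model performance monitoring uses indexing to determine the order of the rows within the object. For example, if a batch contains three rows and the correlation ID is test, the first row has an ID of test_0, the second row has an ID of test_1, and the third row has an ID of test_2. To match the unique IDs in your ground truth data asset with the IDs of your collected production inference model output data, apply an index to each correlation ID appropriately. If your logged JSON object has only one row, use correlationid_0 as the correlationid value.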
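
The indexing scheme described in the preceding paragraph can be sketched as follows. The helper name `indexed_row_ids` is hypothetical. You would apply something like it to your ground truth rows so that their IDs line up with the indexed IDs of the collected output data.

```python
from typing import List


def indexed_row_ids(correlation_id: str, row_count: int) -> List[str]:
    """Return per-row IDs for a batch logged under one correlation ID.

    Rows are numbered from 0 in their order within the JSON object,
    giving "<correlationid>_<index>".
    """
    return [f"{correlation_id}_{index}" for index in range(row_count)]


print(indexed_row_ids("test", 3))  # ['test_0', 'test_1', 'test_2']
```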

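To avoid using this indexing, we recommend that you log your unique ID in its own column. Put that column within the pandas data frame that the Azure Machine Learning data collector logs. In your model monitoring configuration, you can then specify the name of this column to join your model output data with your ground truth data. As long as the IDs for each row in both data assets are the same, Azure Machine Learning model monitoring can perform model performance monitoring.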

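Example workflow for monitoring model performance

To understand the concepts that are associated with model performance monitoring, consider the following example workflow. It applies to a scenario in which you deploy a model to predict whether credit card transactions are fraudulent:

Configure your deployment to use the data collector to collect the model's production inference data (input and output data). Store the output data in a column called is_fraud.
For each row of the collected inference data, log a unique ID. The unique ID can come from your application, or you can use the correlationid value that Azure Machine Learning uniquely generates for each logged JSON object.
When the ground truth (or actual) is_fraud data is available, log each row and map it to the same unique ID that was logged for the corresponding row in the model's output data.
Register a data asset in Azure Machine Learning, and use it to collect and maintain the ground truth is_fraud data.
Create a model performance monitoring signal that joins the model's production inference and ground truth data assets by using the unique ID columns.
Compute the model performance metrics.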
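
The join in the preceding workflow can be sketched in plain Python as an inner join on the unique ID, followed by a metric computation. This is only an illustration under assumptions: the function `join_on_id`, the column name correlation_id, and the sample rows are hypothetical, and the service performs the real join on Spark.

```python
from typing import Dict, Iterable, List, Tuple


def join_on_id(
    model_outputs: Iterable[dict],
    ground_truth: Iterable[dict],
    id_column: str = "correlation_id",
    value_column: str = "is_fraud",
) -> List[Tuple[str, int, int]]:
    """Inner-join two row collections on id_column.

    Returns (id, predicted, actual) tuples for IDs present in both inputs,
    in the order of model_outputs. Rows without a match are dropped.
    """
    actual_by_id: Dict[str, int] = {row[id_column]: row[value_column] for row in ground_truth}
    return [
        (row[id_column], row[value_column], actual_by_id[row[id_column]])
        for row in model_outputs
        if row[id_column] in actual_by_id
    ]


# Made-up rows: the third prediction has no ground truth yet.
outputs = [
    {"correlation_id": "test_0", "is_fraud": 1},
    {"correlation_id": "test_1", "is_fraud": 0},
    {"correlation_id": "test_2", "is_fraud": 1},
]
truth = [
    {"correlation_id": "test_0", "is_fraud": 1},
    {"correlation_id": "test_1", "is_fraud": 1},
]
joined = join_on_id(outputs, truth)
accuracy = sum(1 for _, predicted, actual in joined if predicted == actual) / len(joined)
print(joined, accuracy)
```

Rows whose ground truth hasn't arrived are left out of the join, so the metric reflects only the predictions that can be verified.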

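After you satisfy the prerequisites for model performance monitoring, take the following steps to set up model monitoring:

Create a monitoring definition in a YAML file. The following sample specification defines model monitoring with production inference data. Before you use this definition, adjust the following settings and any others to meet the needs of your environment:

For endpoint_deployment_id, use a value in the format azureml:<endpoint-name>:<deployment-name>.
For each path value in an input data section, use a value in the format azureml:<data-asset-name>:<version>.
For the prediction value, use the name of the output column that contains values that the model predicts.
For the actual value, use the name of the ground truth column that contains the actual values that the model tries to predict.
For the correlation_id values, use the names of the columns that are used to join the output data and the ground truth data.
Under emails, list the email addresses that you want to use for notifications.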

# model-performance-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: model_performance_monitoring
display_name: Credit card fraud model performance
description: Credit card fraud model performance

trigger:
  type: recurrence
  frequency: day
  interval: 7 
  schedule: 
    hours: 10
    minutes: 15

create_monitor:
  compute: 
    instance_type: standard_e8s_v3
    runtime_version: "3.3"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:loan-approval-endpoint:loan-approval-deployment

  monitoring_signals:
    fraud_detection_model_performance: 
      type: model_performance 
      production_data:
        input_data:
          path: azureml:credit-default-main-model_outputs:1
          type: mltable
        data_column_names:
          prediction: is_fraud
          correlation_id: correlation_id
      reference_data:
        input_data:
          path: azureml:my_model_ground_truth_data:1
          type: mltable
        data_column_names:
          actual: is_fraud
          correlation_id: correlation_id
        data_context: ground_truth
      alert_enabled: true
      metric_thresholds: 
        tabular_classification:
          accuracy: 0.95
          precision: 0.8
  alert_notification: 
      emails: 
        - abc@example.com

Run the following command to create the monitoring schedule:

az ml schedule create -f ./model-performance-monitoring.yaml

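After you satisfy the prerequisites for model performance monitoring, use the following Python code to set up model monitoring. First replace the following placeholders with appropriate values.

Placeholder	Description	Example
<subscription-ID>	The ID of your subscription	aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e
<resource-group-name>	The name of the resource group that contains your workspace	my-resource-group
<workspace-name>	The name of your workspace	my-workspace
<production-data-asset-name>	The name of the data asset that contains production data	credit-default-main-model_inputs
<production-target-column>	The name of the production column that contains values that the model predicts	DEFAULT_NEXT_MONTH
<production-join-column>	The name of the production column to use to join the production and ground truth data	correlationid
<ground-truth-data-asset-name>	The name of the data asset that contains ground truth data	credit-ground-truth
<ground-truth-target-column>	The name of the ground truth column that contains the actual values that the model tries to predict	ground_truth
<ground-truth-join-column>	The name of the ground truth column to use to join the production and ground truth data	correlationid
<email-address-1> and <email-address-2>	Email addresses to use for notifications	`abc@example.com`
<frequency-unit>	The monitoring frequency unit	day
<interval>	The interval between jobs, expressed in the frequency unit	1
<start-hour>	The hour to start monitoring, on a 24-hour clock	3
<start-minutes>	The minutes after the specified hour to start monitoring	15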

from azure.identity import DefaultAzureCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import (
    MonitorDatasetContext,
)
from azure.ai.ml.entities import (
    AlertNotification,
    BaselineDataRange,
    ModelPerformanceMetricThreshold,
    ModelPerformanceSignal,
    ModelPerformanceClassificationThresholds,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute,
    ReferenceData,
    ProductionData
)

# Get a handle to the workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-ID>",
    resource_group_name="<resource-group-name>",
    workspace_name="<workspace-name>",
)

# Specify the serverless Spark compute to use for monitoring jobs.
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# Specify the type of the model task.
monitoring_target = MonitoringTarget(
    ml_task="classification",
)

# Specify production data that the model data collector generates. 
production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path="azureml:<production-data-asset-name>:1"
    ),
    data_column_names={
        "target_column": "<production-target-column>",
        "join_column": "<production-join-column>"
    },
    data_window=BaselineDataRange(
        lookback_window_offset="P0D",
        lookback_window_size="P10D",
    )
)

# Specify the ground truth reference data.
reference_data_ground_truth = ReferenceData(
    input_data=Input(
        type="mltable",
        path="azureml:<ground-truth-data-asset-name>:1"
    ),
    data_column_names={
        "target_column": "<ground-truth-target-column>",
        "join_column": "<ground-truth-join-column>"
    },
    data_context=MonitorDatasetContext.GROUND_TRUTH_DATA,
)

# Create the model performance signal.
metric_thresholds = ModelPerformanceMetricThreshold(
    classification=ModelPerformanceClassificationThresholds(
        accuracy=0.50,
        precision=0.50,
        recall=0.50
    ),
)

model_performance = ModelPerformanceSignal(
    production_data=production_data,
    reference_data=reference_data_ground_truth,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Put all monitoring signals in a dictionary.
monitoring_signals = {
    'model_performance':model_performance,
}

# Create an alert notification object.
alert_notification = AlertNotification(
    emails=['<email-address-1>', '<email-address-2>']
)

# Set up the monitor definition.
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

# Specify the schedule frequency.
recurrence_trigger = RecurrenceTrigger(
    frequency="<frequency-unit>",
    interval=<interval>,
    schedule=RecurrencePattern(hours=<start-hour>, minutes=<start-minutes>)
)

# Create the monitoring schedule.
model_monitor = MonitorSchedule(
    name="credit_default_model_performance",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

# Schedule the monitoring job.
poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()

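To set up model performance monitoring, take the steps in the following sections.

Configure basic settings

In Azure Machine Learning studio, go to your workspace.
Under Manage, select Monitoring, and then select Add.
On the Basic settings page, enter information as described earlier in Set up out-of-box model monitoring.

Add data assets

On the Basic settings page, select Next to open the Configure data asset page of the Advanced settings section.
Select Add, and then add the data asset that you want to use as the ground truth data asset. The ground truth data asset must contain a unique ID column. The values in the unique ID columns of the ground truth data asset and the model output data asset must match. The two data assets can then be joined before metric computation occurs.
If the model output data asset doesn't appear in the list of added data assets, select Add, and then add it.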

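Add a performance monitoring signal

On the Configure data asset page, select Next. The Select monitoring signals page opens. If you use an Azure Machine Learning online deployment, you see a list of monitoring signals.
Delete the monitoring signals that appear on the page. The focus of this section is creating a model performance monitoring signal.
Select Add.
In the Edit Signal window, select Model performance (preview), and then take the following steps to configure the model performance signal:
1. In step 1:
  1. For the production data asset, select your model output data asset.
  2. Select the appropriate target column, for example, DEFAULT_NEXT_MONTH.
  3. Select the lookback window size and offset that you want to use.
2. In step 2:
  1. For the reference data asset, select the ground truth data asset.
  2. Select the target column, for example, ground_truth.
  3. Select the column to use for the join with the model output data asset, for example, correlationid. Both data assets must contain that column, and it must hold a unique ID for each row.
3. In step 3, select the performance metrics that you want to use and specify their thresholds.
Select Save. On the Select monitoring signals page, the model performance signal appears.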

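Finish the configuration

On the Select monitoring signals page, select Next.
On the Notifications page, turn on notifications for the model performance signal, and then select Next.
On the Review monitoring settings page, review your settings.
Select Create to create your model performance monitor.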

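Set up model monitoring of production data

You can also monitor models that you deploy to Azure Machine Learning batch endpoints or that you deploy outside Azure Machine Learning. If you don't have a deployment but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to:

Collect production inference data from models deployed in production.
Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data.
Provide a custom data preprocessing component and register it as an Azure Machine Learning component if you don't use the data collector to collect data. Without this custom data preprocessing component, the Azure Machine Learning model monitoring system can't process your data into a tabular form that supports time windowing.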

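Your custom preprocessing component must have the following input and output signatures:

Input or output	Signature name	Type	Description	Example value
Input	`data_window_start`	literal, string	The data window start time in ISO8601 format	2023-05-01T04:31:57.012Z
Input	`data_window_end`	literal, string	The data window end time in ISO8601 format	2023-05-01T04:31:57.012Z
Input	`input_data`	uri_folder	The collected production inference data, which is registered as an Azure Machine Learning data asset	azureml:myproduction_inference_data:1
Output	`preprocessed_data`	mltable	A tabular data asset that matches a subset of the reference data schema

For an example of a custom data preprocessing component, see custom_preprocessing in the azureml-examples GitHub repository.

For instructions for registering an Azure Machine Learning component, see Register component in your workspace.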
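
The time-windowing part of such a component can be illustrated with a small sketch. It isn't the sample component from the repository, which runs on Spark and writes an mltable. The function names, the timestamp column name, and the half-open window [start, end) are assumptions made for this example.

```python
from datetime import datetime
from typing import Iterable, List


def parse_iso8601(value: str) -> datetime:
    """Parse timestamps like 2023-05-01T04:31:57.012Z into aware datetimes."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11 and later,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rows_in_window(
    rows: Iterable[dict],
    data_window_start: str,
    data_window_end: str,
    time_column: str = "timestamp",  # hypothetical column name
) -> List[dict]:
    """Keep rows whose timestamp falls in [data_window_start, data_window_end)."""
    start = parse_iso8601(data_window_start)
    end = parse_iso8601(data_window_end)
    return [row for row in rows if start <= parse_iso8601(row[time_column]) < end]


# Made-up rows for illustration.
rows = [
    {"timestamp": "2023-05-01T03:00:00.000Z", "AGE": 41},
    {"timestamp": "2023-05-01T05:00:00.000Z", "AGE": 29},
]
kept = rows_in_window(rows, "2023-05-01T04:31:57.012Z", "2023-05-02T04:31:57.012Z")
print(kept)
```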

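After you register your production data and preprocessing component, you can set up model monitoring.

Create a monitoring definition YAML file that's similar to the following one. Before you use this definition, adjust the following settings and any others to meet the needs of your environment:

For endpoint_deployment_id, use a value in the format azureml:<endpoint-name>:<deployment-name>.
For pre_processing_component, use a value in the format azureml:<component-name>:<component-version>. Specify the exact version, such as 1.0.0, not 1.
For each path, use a value in the format azureml:<data-asset-name>:<version>.
For the target_column value, use the name of the output column that contains values that the model predicts.
Under emails, list the email addresses that you want to use for notifications.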

# model-monitoring-with-collected-data.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
display_name: Fraud detection model monitoring
description: Fraud detection model monitoring with your own production data

trigger:
  # perform model monitoring activity daily at 3:15am
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 # #every day
  schedule: 
    hours: 3 # at 3am
    minutes: 15 # at 15 mins after 3am

create_monitor:
  compute: 
    instance_type: standard_e4s_v3
    runtime_version: "3.4"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:fraud-detection-endpoint:fraud-detection-deployment
  
  monitoring_signals:

    advanced_data_drift: # monitoring signal name, any user defined name works
      type: data_drift
      # define production dataset with your collected data
      production_data:
        input_data:
          path: azureml:my_production_inference_data_model_inputs:1  # your collected data is registered as Azure Machine Learning asset
          type: uri_folder
        data_context: model_inputs
        pre_processing_component: azureml:production_data_preprocessing:1.0.0
      reference_data:
        input_data:
          path: azureml:my_model_training_data:1 # use training data as comparison baseline
          type: mltable
        data_context: training
        data_column_names:
          target_column: is_fraud
      features: 
        top_n_feature_importance: 20 # monitor drift for top 20 features
      alert_enabled: true
      metric_thresholds:
        numerical:
          jensen_shannon_distance: 0.01
        categorical:
          pearsons_chi_squared_test: 0.02

    advanced_prediction_drift: # monitoring signal name, any user defined name works
      type: prediction_drift
      # define production dataset with your collected data
      production_data:
        input_data:
          path: azureml:my_production_inference_data_model_outputs:1  # your collected data is registered as Azure Machine Learning asset
          type: uri_folder
        data_context: model_outputs
        pre_processing_component: azureml:production_data_preprocessing:1.0.0
      reference_data:
        input_data:
          path: azureml:my_model_validation_data:1 # use validation data as comparison reference dataset
          type: mltable
        data_context: validation
      alert_enabled: true
      metric_thresholds:
        categorical:
          pearsons_chi_squared_test: 0.02
  
  alert_notification:
    emails:
      - abc@example.com
      - def@example.com

Run the following command to create the monitoring schedule:

az ml schedule create -f ./model-monitoring-with-collected-data.yaml

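Use a script similar to the following Python code to set up model monitoring. First replace the following placeholders with appropriate values.

Placeholder	Description	Example
<subscription-ID>	The ID of your subscription	aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e
<resource-group-name>	The name of the resource group that contains your workspace	my-resource-group
<workspace-name>	The name of your workspace	my-workspace
<production-data-asset-name>	The name of the data asset that contains production data	my_model_production_data
<preprocessing-component-name>	The name of the preprocessing component	production_data_preprocessing
<training-data-asset-name>	The name of the training data asset to use as the reference data asset	my_model_training_data
<email-address-1> and <email-address-2>	Email addresses to use for notifications	`abc@example.com`
<frequency-unit>	The monitoring frequency unit	day
<interval>	The interval between jobs, expressed in the frequency unit	1
<start-hour>	The hour to start monitoring, on a 24-hour clock	3
<start-minutes>	The minutes after the specified hour to start monitoring	15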

from azure.identity import InteractiveBrowserCredential
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import (
    MonitorFeatureType,
    MonitorMetricName,
    MonitorDatasetContext
)
from azure.ai.ml.entities import (
    AlertNotification,
    DataDriftSignal,
    DataQualitySignal,
    DataDriftMetricThreshold,
    DataQualityMetricThreshold,
    NumericalDriftMetrics,
    CategoricalDriftMetrics,
    DataQualityMetricsNumerical,
    DataQualityMetricsCategorical,
    MonitorFeatureFilter,
    MonitorInputData,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    ServerlessSparkCompute,
    ReferenceData,
    ProductionData
)

# Get a handle to the workspace.
subscription_id = "<subscription-ID>"
resource_group = "<resource-group-name>"
workspace = "<workspace-name>"
ml_client = MLClient(
   InteractiveBrowserCredential(),
   subscription_id,
   resource_group,
   workspace
)

# Specify the serverless Spark compute that runs the monitoring job.
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3"
)

# Specify the target data asset (the production data asset).
production_data = ProductionData(
    input_data=Input(
        type="uri_folder",
        path="azureml:<production-data-asset-name>:1"
    ),
    data_context=MonitorDatasetContext.MODEL_INPUTS,
    pre_processing_component="azureml:<preprocessing-component-name>:1.0.0"
)

# Specify the training data to use as a reference data asset.
reference_data_training = ReferenceData(
    input_data=Input(
        type="mltable",
        path="azureml:<training-data-asset-name>:1"
    ),
    data_context=MonitorDatasetContext.TRAINING
)

# Create an advanced data drift signal.
features = MonitorFeatureFilter(top_n_feature_importance=20)
metric_thresholds = DataDriftMetricThreshold(
    numerical=NumericalDriftMetrics(
        jensen_shannon_distance=0.01
    ),
    categorical=CategoricalDriftMetrics(
        pearsons_chi_squared_test=0.02
    )
)

advanced_data_drift = DataDriftSignal(
    production_data=production_data,
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Create an advanced data quality signal.
features = ['feature_A', 'feature_B', 'feature_C']
metric_thresholds = DataQualityMetricThreshold(
    numerical=DataQualityMetricsNumerical(
        null_value_rate=0.01
    ),
    categorical=DataQualityMetricsCategorical(
        out_of_bounds_rate=0.02
    )
)

advanced_data_quality = DataQualitySignal(
    production_data=production_data,
    reference_data=reference_data_training,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=True
)

# Put all monitoring signals in a dictionary.
monitoring_signals = {
    'data_drift_advanced': advanced_data_drift,
    'data_quality_advanced': advanced_data_quality
}

# Create an alert notification object.
alert_notification = AlertNotification(
    emails=['<email-address-1>', '<email-address-2>']
)

# Set up the monitor definition.
monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

# Specify the schedule frequency.
recurrence_trigger = RecurrenceTrigger(
    frequency="<frequency-unit>",
    interval=<interval>,
    schedule=RecurrencePattern(hours=<start-hour>, minutes=<start-minutes>)
)

# Create the monitoring schedule.
model_monitor = MonitorSchedule(
    name="fraud_detection_model_monitoring_advanced",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

# Schedule the monitoring job.
poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()
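
Before substituting the schedule placeholders in the preceding script, it can help to check that the values are in range. The following is a minimal sketch that uses only the Python standard library. The `ScheduleSettings` class and `validate_schedule` function are hypothetical helpers, not part of the azure-ai-ml SDK, and the list of frequency units is taken from the YAML comments in this article.

```python
from dataclasses import dataclass

# Frequency units listed in this article's YAML comments.
ALLOWED_FREQUENCIES = {"minute", "hour", "day", "week", "month"}


@dataclass(frozen=True)
class ScheduleSettings:
    """Hypothetical container for the schedule placeholder values."""
    frequency: str
    interval: int
    start_hour: int
    start_minutes: int


def validate_schedule(settings: ScheduleSettings) -> list[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if settings.frequency not in ALLOWED_FREQUENCIES:
        problems.append(f"unknown frequency unit: {settings.frequency!r}")
    if settings.interval < 1:
        problems.append("interval must be a positive integer")
    if not 0 <= settings.start_hour <= 23:
        problems.append("start hour must be between 0 and 23")
    if not 0 <= settings.start_minutes <= 59:
        problems.append("start minutes must be between 0 and 59")
    return problems


# Example values from the placeholder table: run every day at 03:15.
print(validate_schedule(ScheduleSettings("day", 1, 3, 15)))  # prints []
```

If the returned list is empty, the values can be passed to `RecurrenceTrigger` and `RecurrencePattern` as shown in the script.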

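Set up model monitoring by using custom signals and metrics

When you use Azure Machine Learning model monitoring, you can define a custom signal and implement any metric of your choice to monitor your model. You can register the custom signal as an Azure Machine Learning component. When the model monitoring job runs on its specified schedule, it computes the metrics that are defined within the custom signal, just as it does for the prebuilt data drift, prediction drift, and data quality signals.

To set up a custom signal to use for model monitoring, you must first define the custom signal and register it as an Azure Machine Learning component. The component must have the following input and output signatures.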

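Component input signature

The component input data frame should contain the following items:

An mltable structure that contains the processed data from the preprocessing component.
Any number of literals, each representing a metric that's implemented as part of the custom signal component. For example, if you implement the std_deviation metric, you need an input for std_deviation_threshold. Generally, there should be one input per metric, with the name <metric-name>_threshold.

Signature name	Type	Description	Example value
`production_data`	mltable	A tabular data asset that matches a subset of the reference data schema
`std_deviation_threshold`	literal, string	The respective threshold for the implemented metric	2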

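Component output signature

The component output port should have the following signature:

Signature name	Type	Description
`signal_metrics`	mltable	The mltable structure that contains the computed metrics. For information about the schema of this signature, see the next section, signal_metrics schema.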

signal_metrics schema

The component output data frame should contain four columns: group, metric_name, metric_value, and threshold_value.

Signature name	Type	Description	Example value
`group`	literal, string	The top-level logical grouping to apply to the custom metric	TRANSACTIONAMOUNT
`metric_name`	literal, string	The name of the custom metric	std_deviation
`metric_value`	numerical	The value of the custom metric	44,896.082
`threshold_value`	numerical	The threshold for the custom metric	2

The following table shows example output from a custom signal component that computes the std_deviation metric:

group	metric_value	metric_name	threshold_value
TRANSACTIONAMOUNT	44,896.082	std_deviation	2
LOCALHOUR	3.983	std_deviation	2
TRANSACTIONAMOUNTUSD	54,004.902	std_deviation	2
DIGITALITEMCOUNT	7.238	std_deviation	2
PHYSICALITEMCOUNT	5.509	std_deviation	2
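
To make the schema concrete, the following sketch builds rows with the four required columns for a std_deviation metric. It's a simplified illustration in plain Python, not the Spark-based component from the azureml-examples repository. The `build_signal_metrics` function and the sample values are hypothetical, and the sketch assumes that the sample standard deviation is the intended statistic.

```python
import statistics


def build_signal_metrics(columns, std_deviation_threshold):
    """Build rows that follow the signal_metrics schema for a std_deviation metric.

    columns: dict that maps a column name (used as the group) to its numeric values.
    std_deviation_threshold: the threshold literal, which can arrive as a string.
    """
    rows = []
    for group, values in columns.items():
        rows.append({
            "group": group,
            "metric_name": "std_deviation",
            # Sample standard deviation; a real component might choose differently.
            "metric_value": statistics.stdev(values),
            "threshold_value": float(std_deviation_threshold),
        })
    return rows


# Made-up sample values, for illustration only.
sample = {
    "LOCALHOUR": [2, 4, 4, 4, 5, 5, 7, 9],
    "PHYSICALITEMCOUNT": [1, 1, 1, 1],
}
for row in build_signal_metrics(sample, std_deviation_threshold="2"):
    print(row)
```

Each row corresponds to one line of the example output table: one group, one metric name, the computed value, and the threshold that was passed in as a component input.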

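For an example of a custom signal component definition and metric computation code, see custom_signal in the azureml-examples repository.

For instructions on registering an Azure Machine Learning component, see Register component in your workspace.

After you create and register your custom signal component in Azure Machine Learning, take the following steps to set up model monitoring:

Create a monitoring definition in a YAML file that's similar to the following one. Before you use this definition, adjust the following settings and any others to meet the needs of your environment: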

For component_id, use a value in the format azureml:<custom-signal-name>:1.0.0.
For path in the input data section, use a value in the format azureml:<production-data-asset-name>:<version>.
For pre_processing_component:
- If you use the data collector to collect your data, you can omit the pre_processing_component property.
- If you don't use the data collector and want to use a component to preprocess your production data, use a value in the format azureml:<custom-preprocessor-name>:<custom-preprocessor-version>.
Under emails, list the email addresses that you want to use for notifications.

# custom-monitoring.yaml
$schema:  http://azureml/sdk-2-0/Schedule.json
name: my-custom-signal
trigger:
  type: recurrence
  frequency: day # Possible frequency values include "minute," "hour," "day," "week," and "month."
  interval: 7 # Monitoring runs every day when you use the value 1.
create_monitor:
  compute:
    instance_type: "standard_e4s_v3"
    runtime_version: "3.3"
  monitoring_signals:
    customSignal:
      type: custom
      component_id: azureml:my_custom_signal:1.0.0
      input_data:
        production_data:
          input_data:
            type: uri_folder
            path: azureml:my_production_data:1
          data_context: test
          data_window:
            lookback_window_size: P30D
            lookback_window_offset: P7D
          pre_processing_component: azureml:custom_preprocessor:1.0.0
      metric_thresholds:
        - metric_name: std_deviation
          threshold: 2
  alert_notification:
    emails:
      - abc@example.com
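
The component_id, path, and pre_processing_component values in the preceding definition all follow the azureml:<name>:<version> pattern. As a rough sketch, a short check like the following one can catch a malformed reference before you submit the YAML file. The `parse_asset_reference` helper is hypothetical and isn't part of the Azure CLI or SDK. The regular expression is an assumption about what counts as a valid name and version.

```python
import re
from typing import NamedTuple

# Matches azureml:<name>:<version>, the format used for component_id and path.
ASSET_REFERENCE = re.compile(r"^azureml:(?P<name>[^:\s]+):(?P<version>[^:\s]+)$")


class AssetReference(NamedTuple):
    name: str
    version: str


def parse_asset_reference(value: str) -> AssetReference:
    """Split an azureml:<name>:<version> string, or raise ValueError."""
    match = ASSET_REFERENCE.match(value)
    if match is None:
        raise ValueError(f"expected azureml:<name>:<version>, got {value!r}")
    return AssetReference(match["name"], match["version"])


# Values taken from the YAML definition in this section.
print(parse_asset_reference("azureml:my_custom_signal:1.0.0"))
print(parse_asset_reference("azureml:my_production_data:1"))
```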

Run the following command to create the model monitoring schedule:
```
az ml schedule create -f ./custom-monitoring.yaml
```

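Interpret monitoring results

After you configure your model monitor and the first run finishes, you can view the results in Azure Machine Learning studio.

In the studio, under Manage, select Monitoring. On the Monitoring page, select the name of your model monitor to see its overview page. This page shows the monitored model, endpoint, and deployment. It also provides detailed information about the configured signals. The following image shows a monitoring overview page that includes data drift and data quality signals.
Look in the Notifications section of the overview page. In this section, you can see, for each signal, the features that breach the configured threshold for their respective metrics.
In the Signals section, select data_drift to see detailed information about the data drift signal. On the details page, you can see the data drift metric value for each numerical and categorical feature that your monitoring configuration includes. If your monitor has more than one run, you see a trend line for each feature.
On the details page, select the name of an individual feature. A detailed view opens that shows the production distribution compared to the reference distribution. You can also use this view to track drift over time for the feature.
Go back to the monitoring overview page. In the Signals section, select data_quality to view detailed information about this signal. On this page, you can see the null value rate, out-of-bounds rate, and data type error rate for each feature that you monitor.

Model monitoring is a continuous process. When you use Azure Machine Learning model monitoring, you can configure multiple monitoring signals to get a broad view of the performance of your models in production.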
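
The idea behind the Notifications section, listing the features that breach a threshold for each signal, can be sketched in a few lines. This example is only an illustration of the logic, not how the studio computes its view. The row layout and the `summarize_breaches` function are hypothetical, and the sketch assumes that a breach means the metric value is greater than the threshold, which might not hold for every metric.

```python
from collections import defaultdict


def summarize_breaches(metric_rows):
    """Group features whose metric value exceeds its threshold, by signal name.

    Assumption: a breach means metric_value > threshold_value. Some metrics
    might use the opposite direction, so adjust the comparison as needed.
    """
    breaches = defaultdict(list)
    for row in metric_rows:
        if row["metric_value"] > row["threshold_value"]:
            breaches[row["signal"]].append(row["feature"])
    return dict(breaches)


# Made-up metric rows, for illustration only.
rows = [
    {"signal": "data_drift", "feature": "feature_A", "metric_value": 0.03, "threshold_value": 0.01},
    {"signal": "data_drift", "feature": "feature_B", "metric_value": 0.005, "threshold_value": 0.01},
    {"signal": "data_quality", "feature": "feature_C", "metric_value": 0.04, "threshold_value": 0.02},
]
print(summarize_breaches(rows))
# prints {'data_drift': ['feature_A'], 'data_quality': ['feature_C']}
```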

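Integrate Azure Machine Learning model monitoring with Event Grid

When you use Event Grid, you can configure events that Azure Machine Learning model monitoring generates to trigger applications, processes, and CI/CD workflows. You can consume events through various event handlers, such as Azure Event Hubs, Azure Functions, and Azure Logic Apps. When your monitors detect drift, you can take action programmatically, for example by running a machine learning pipeline to retrain a model and redeploy it.

To integrate Azure Machine Learning model monitoring with Event Grid, take the steps in the following sections.

Create a system topic

If you don't have an Event Grid system topic to use for monitoring, create one. For instructions, see Create, view, and manage Event Grid system topics in the Azure portal.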

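Create an event subscription

In the Azure portal, go to your Azure Machine Learning workspace.
Select Events, and then select Event Subscription.
Next to Name, enter a name for your event subscription, such as MonitoringEvent.
Under Event Types, select only Run status changed.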

Warning

Select only Run status changed for the event type. Don't select Dataset drift detected, which applies to data drift v1, not Azure Machine Learning model monitoring.
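Select the Filters tab. Under Advanced Filters, select Add new filter, and then enter the following values:
- Under Key, enter data.RunTags.azureml_modelmonitor_threshold_breached.
- Under Operator, select String contains.
- Under Value, enter has failed due to one or more features violating metric thresholds.
When you use this filter, events are generated when the run status of any monitor in your Azure Machine Learning workspace changes. The run status can change from completed to failed or from failed to completed.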

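To filter at the monitoring level, select Add new filter again, and then enter the following values:
- Under Key, enter data.RunTags.azureml_modelmonitor_threshold_breached.
- Under Operator, select String contains.
- Under Value, enter the name of the monitor signal that you want to filter events for, such as credit_card_fraud_monitor_data_drift. The name that you enter must match the name of your monitoring signal. Any signal that you use in filtering should have a name in the format <monitor-name>_<signal-description>, which includes the monitor name and a description of the signal.
Select the Basics tab. Configure the endpoint that you want to serve as your event handler, such as Event Hubs.
Select Create to create the event subscription.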
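
An event handler can apply the same check that the advanced filters describe. The following sketch mimics two String contains filters on the data.RunTags.azureml_modelmonitor_threshold_breached key against an event payload that's represented as a Python dictionary. The `matches_monitor_filter` function, the sample event, and the exact tag text are assumptions for illustration. The sketch also assumes that every filter must match and that the comparison ignores case.

```python
TAG_KEY = "azureml_modelmonitor_threshold_breached"


def matches_monitor_filter(event: dict, required_substrings: list[str]) -> bool:
    """Mimic 'String contains' filters on data.RunTags.<TAG_KEY>.

    Assumption: all filters must match, and matching ignores case.
    """
    tag_value = event.get("data", {}).get("RunTags", {}).get(TAG_KEY)
    if not isinstance(tag_value, str):
        return False
    haystack = tag_value.casefold()
    return all(text.casefold() in haystack for text in required_substrings)


# Hypothetical event payload; real events carry more fields.
sample_event = {
    "eventType": "Microsoft.MachineLearningServices.RunStatusChanged",
    "data": {
        "RunTags": {
            TAG_KEY: (
                "credit_card_fraud_monitor_data_drift has failed due to one or "
                "more features violating metric thresholds"
            )
        }
    },
}

filters = [
    "has failed due to one or more features violating metric thresholds",
    "credit_card_fraud_monitor_data_drift",
]
print(matches_monitor_filter(sample_event, filters))  # prints True
```

A handler that receives unfiltered events could use a check like this one to decide whether to start a retraining pipeline.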

View events

After events are captured, you can view them on the event handler endpoint page.

You can also view events on the Azure Monitor Metrics tab.


Last updated on 2025-05-02
