Migrate logging from SDK v1 to SDK v2

Article
01/16/2024

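Azure Machine Learning uses MLflow Tracking for metric logging and artifact storage for your experiments, whether you create them through the Azure Machine Learning Python SDK, the Azure Machine Learning CLI, or Azure Machine Learning studio. We recommend using MLflow to track experiments.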

If you're migrating from SDK v1 to SDK v2, use the information in this section to understand the MLflow equivalents of the SDK v1 logging APIs.

Why MLflow?

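MLflow, with over 13 million monthly downloads, has become the standard platform for end-to-end MLOps. It enables teams of all sizes to track, share, package, and deploy any model for batch or real-time inference. Azure Machine Learning integrates with MLflow, which makes your training code portable and easy to integrate with other platforms, because the code doesn't contain any instructions specific to Azure Machine Learning.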

Prepare for migrating to MLflow

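To use MLflow tracking, you need to install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow, azureml-mlflow. All Azure Machine Learning environments already include these packages, but you need to add them yourself if you create your own environment.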

pip install mlflow azureml-mlflow

Connect to your workspace

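Azure Machine Learning lets you perform tracking in training jobs that run on your workspace or that run remotely (tracking experiments that run outside Azure Machine Learning). For remote tracking, you need to indicate the workspace that you want MLflow to connect to.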

Azure Machine Learning compute

You're already connected to your workspace when running on Azure Machine Learning compute.

Remote compute

Configure tracking URI

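Get the tracking URI for your workspace by using one of the following methods:
- Azure CLI
- Python
- Studio
- Manually
APPLIES TO: Azure CLI ml extension v2 (current)
1. Log in and configure your workspace: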
```
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location> 
```
2. Get the tracking URI by using the az ml workspace command:
```
az ml workspace show --query mlflow_tracking_uri
```
APPLIES TO: Python SDK azure-ai-ml v2 (current)

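You can get the Azure Machine Learning MLflow tracking URI by using the Azure Machine Learning SDK v2 for Python. Make sure the azure-ai-ml library is installed in the compute you're using. The following sample gets the unique MLflow tracking URI associated with your workspace.
1. Log in to your workspace by using MLClient. The easiest way to do that is with the workspace configuration file: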
```
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
```
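  Tip
  
  You can download the workspace configuration file as follows:
  
  1. Go to Azure Machine Learning studio.
  
  2. Select the upper-right corner of the page -> Download config file.
  
  3. Save the file config.json in the same directory where you're working.
2. Alternatively, you can use the subscription ID, resource group name, and workspace name to get it: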
```
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

#Enter details of your AzureML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace_name = '<WORKSPACE_NAME>'

ml_client = MLClient(credential=DefaultAzureCredential(),
                        subscription_id=subscription_id, 
                        resource_group_name=resource_group,
                        workspace_name=workspace_name)
```
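  Important
  
  DefaultAzureCredential tries to pull the credentials from the available context. If you want to specify credentials in a different way, for instance interactively through the web browser, you can use InteractiveBrowserCredential or any other method available in the azure.identity package.
3. Get the Azure Machine Learning tracking URI: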
```
mlflow_tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
```
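Use the Azure Machine Learning portal to get the tracking URI:
1. Open the Azure Machine Learning studio portal and log in with your credentials.
2. In the upper-right corner, select the name of your workspace to show the Directory + Subscription + Workspace blade.
3. Select View all properties in Azure Portal.
4. In the Essentials section, find the property MLflow tracking URI.
You can construct the Azure Machine Learning tracking URI from the subscription ID, the region where the resource is deployed, the resource group name, and the workspace name. The following code sample shows how:

Warning

If you're working in a private link-enabled workspace, the MLflow endpoint also uses a private link to communicate with Azure Machine Learning. As a result, the tracking URI looks different from the one constructed here. In those cases, you need to get the tracking URI by using the Azure Machine Learning SDK or CLI v2.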
```
region = "<LOCATION>"
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace_name = '<AML_WORKSPACE_NAME>'

mlflow_tracking_uri = f"azureml://{region}.api.azureml.ms/mlflow/v1.0/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
```
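As a minimal sketch, the manual construction above can be wrapped in a small helper that checks the four values before assembling the URI. The function name `build_tracking_uri`, the empty-value check, and the sample values are illustrative assumptions and not part of any Azure SDK; the URI template is the one shown above.

```python
def build_tracking_uri(region: str, subscription_id: str,
                       resource_group: str, workspace_name: str) -> str:
    """Assemble the tracking URI from its four parts (hypothetical helper)."""
    parts = {
        "region": region,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    # Fail early on missing values instead of producing a malformed URI.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing values: {', '.join(missing)}")
    return (
        f"azureml://{region}.api.azureml.ms/mlflow/v1.0"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )


# Placeholder values for illustration only.
uri = build_tracking_uri("eastus", "0000-1111", "my-rg", "my-ws")
print(uri)
```

As the warning above notes, a URI built this way may not apply to private link-enabled workspaces.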
Configure the tracking URI:
- Using the MLflow SDK
- Using environment variables
After you get the tracking URI, use the set_tracking_uri() method to point MLflow to it.
```
import mlflow

mlflow.set_tracking_uri(mlflow_tracking_uri)
```
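You can set the MLflow environment variable MLFLOW_TRACKING_URI in your compute so that any interaction with MLflow in that compute points to Azure Machine Learning by default.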
```
export MLFLOW_TRACKING_URI=$(az ml workspace show --query mlflow_tracking_uri | sed 's/"//g')
```
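Tip

When you work in shared environments, such as an Azure Databricks cluster or an Azure Synapse Analytics cluster, it's useful to set the environment variable MLFLOW_TRACKING_URI at the cluster level. That way, the MLflow tracking URI points to Azure Machine Learning for all sessions running in the cluster, and you don't have to set it in each session.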
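A minimal sketch of the same idea from Python, for cases where the variable can't be set at the cluster level: set MLFLOW_TRACKING_URI in the process environment before MLflow is used, without overriding a value that is already configured. The helper name `ensure_tracking_uri` is hypothetical, and the URI is a placeholder.

```python
import os


def ensure_tracking_uri(default_uri: str) -> str:
    """Set MLFLOW_TRACKING_URI only if it isn't already defined (hypothetical helper)."""
    # setdefault keeps an existing value, such as one set at the cluster level.
    return os.environ.setdefault("MLFLOW_TRACKING_URI", default_uri)


# Placeholder URI for illustration; use the value returned for your workspace.
placeholder = (
    "azureml://<LOCATION>.api.azureml.ms/mlflow/v1.0"
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<WORKSPACE_NAME>"
)
active_uri = ensure_tracking_uri(placeholder)
print(active_uri)
```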

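Configure authentication

After you configure tracking, you also need to configure how authentication to the associated workspace happens. By default, the Azure Machine Learning plug-in for MLflow performs interactive authentication by opening the default browser to prompt for credentials. For more ways to configure authentication for MLflow in Azure Machine Learning workspaces, see Configure MLflow for Azure Machine Learning: Configure authentication.

For interactive jobs where a user is connected to the session, you can rely on interactive authentication, so no further action is required.

Warning

Interactive browser authentication blocks code execution while it prompts for credentials. It isn't suitable for authentication in unattended environments such as training jobs. We recommend configuring another authentication mode.

For scenarios that require unattended execution, you need to configure a service principal to communicate with Azure Machine Learning.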

Using the MLflow SDK

import os

os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"

Using environment variables

export AZURE_TENANT_ID="<AZURE_TENANT_ID>"
export AZURE_CLIENT_ID="<AZURE_CLIENT_ID>"
export AZURE_CLIENT_SECRET="<AZURE_CLIENT_SECRET>"

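Tip

When you work in shared environments, we recommend configuring these environment variables at the compute level. As a best practice, manage them as secrets in an instance of Azure Key Vault whenever possible. For instance, in Azure Databricks you can use secrets in environment variables in the cluster configuration as follows: AZURE_CLIENT_SECRET={{secrets/<scope-name>/<secret-name>}}. For how to do this in Azure Databricks, see Reference a secret in an environment variable, or refer to the similar documentation for your platform.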
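In an unattended job, a missing variable typically surfaces only when authentication fails. A minimal sketch, assuming you want to check the three service principal variables shown above before any tracking call is made; the helper name `missing_service_principal_vars` is hypothetical.

```python
import os

REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"AZURE_TENANT_ID": "<AZURE_TENANT_ID>", "AZURE_CLIENT_ID": ""}
print(missing_service_principal_vars(example_env))
```

Calling the function without arguments checks the real process environment; a job could raise an error when the returned list isn't empty.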

Experiments and runs

SDK v1

from azureml.core import Experiment

# create an Azure Machine Learning experiment and start a run
experiment = Experiment(ws, "create-experiment-sdk-v1")
azureml_run = experiment.start_logging()

SDK v2 with MLflow

import mlflow

# Set the MLflow experiment and start a run
mlflow.set_experiment("logging-with-mlflow")
mlflow_run = mlflow.start_run()

Logging API comparison

Log an integer or float metric

SDK v1

azureml_run.log("sample_int_metric", 1)

SDK v2 with MLflow

mlflow.log_metric("sample_int_metric", 1)

Log a boolean metric

SDK v1

azureml_run.log("sample_boolean_metric", True)

SDK v2 with MLflow

mlflow.log_metric("sample_boolean_metric", 1)
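MLflow metrics are numeric, which is why the boolean is logged as 1 in the example above. A minimal sketch of a conversion helper, assuming you want True and False mapped to 1.0 and 0.0 and non-numeric values rejected; `to_metric_value` is a hypothetical name.

```python
from numbers import Real


def to_metric_value(value) -> float:
    """Convert a bool, int, or float into a float suitable for a metric."""
    # bool is a subclass of int, so True becomes 1.0 and False becomes 0.0.
    if isinstance(value, Real):
        return float(value)
    raise TypeError(f"Cannot log {type(value).__name__} as a metric value")


print(to_metric_value(True), to_metric_value(False), to_metric_value(0.5))
```

The result can then be passed to mlflow.log_metric, for example `mlflow.log_metric("sample_boolean_metric", to_metric_value(True))`.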

Log a string metric

SDK v1

azureml_run.log("sample_string_metric", "a_metric")

SDK v2 with MLflow

mlflow.log_text("sample_string_text", "string.txt")

The string is logged as an artifact, not as a metric. In Azure Machine Learning studio, the value is displayed in the Outputs + logs tab.

Log an image to a PNG or JPEG file

SDK v1

azureml_run.log_image("sample_image", path="Azure.png")

SDK v2 with MLflow

mlflow.log_artifact("Azure.png")

The image is logged as an artifact and appears in the Images tab in Azure Machine Learning studio.

Log a matplotlib.pyplot

SDK v1

import matplotlib.pyplot as plt

plt.plot([1, 2, 3])
azureml_run.log_image("sample_pyplot", plot=plt)

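SDK v2 with MLflow

import matplotlib.pyplot as plt

# Draw on an explicit figure so that the logged figure contains the plot
fig, ax = plt.subplots()
ax.plot([0, 1], [2, 3])
mlflow.log_figure(fig, "sample_pyplot.png")

The image is logged as an artifact and appears in the Images tab in Azure Machine Learning studio.

Log a list of metrics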

SDK v1

list_to_log = [1, 2, 3, 2, 1, 2, 3, 2, 1]
azureml_run.log_list('sample_list', list_to_log)

SDK v2 with MLflow

import time

from mlflow.entities import Metric
from mlflow.tracking import MlflowClient

list_to_log = [1, 2, 3, 2, 1, 2, 3, 2, 1]

# Build one Metric per value; all values share the same key
metrics = [Metric(key="sample_list", value=val, timestamp=int(time.time() * 1000), step=0) for val in list_to_log]
MlflowClient().log_batch(mlflow_run.info.run_id, metrics=metrics)

Metrics appear in the Metrics tab in Azure Machine Learning studio.
Text values are not supported.
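The example above logs every value at step 0. If you'd rather keep the position in the list as the step, which is an assumption about how you want the values ordered, a minimal sketch is to derive the step with enumerate. The helper `list_to_metric_fields` is hypothetical; it only prepares keyword arguments for mlflow.entities.Metric and doesn't call MLflow itself.

```python
import time


def list_to_metric_fields(key, values):
    """Build one dict of Metric keyword arguments per list element."""
    timestamp = int(time.time() * 1000)  # milliseconds since the epoch
    return [
        {"key": key, "value": float(value), "timestamp": timestamp, "step": step}
        for step, value in enumerate(values)
    ]


fields = list_to_metric_fields("sample_list", [1, 2, 3, 2, 1])
print([(f["step"], f["value"]) for f in fields])
```

Each dictionary can be expanded into a metric with `Metric(**f)`, and the resulting list passed to MlflowClient().log_batch in the same way as in the example above.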

Log a row of metrics

SDK v1

azureml_run.log_row("sample_table", col1=5, col2=10)

SDK v2 with MLflow

metrics = {"sample_table.col1": 5, "sample_table.col2": 10}
mlflow.log_metrics(metrics)

Metrics do not render as a table in Azure Machine Learning studio.
Text values are not supported.
Each column is logged as a separate metric, prefixed with the table name.
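To keep call sites close to the SDK v1 log_row signature, the prefixing can be done by a small helper. This is a minimal sketch; `row_to_metrics` is a hypothetical name, and the `table.column` naming follows the example above.

```python
def row_to_metrics(table_name, **columns):
    """Prefix each column with the table name, mirroring log_row's arguments."""
    return {f"{table_name}.{column}": value for column, value in columns.items()}


metrics = row_to_metrics("sample_table", col1=5, col2=10)
print(metrics)
```

The resulting dictionary can be passed directly to mlflow.log_metrics.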

Log a table

SDK v1

table = {
    "col1": [1, 2, 3],
    "col2": [4, 5, 6]
}
azureml_run.log_table("table", table)

SDK v2 with MLflow

import json

table = {
    "col1": [1, 2, 3],
    "col2": [4, 5, 6]
}

# Add a metric for each column prefixed by metric name. Similar to log_row
row1 = {"table.col1": 5, "table.col2": 10}
# To be done for each row in the table
mlflow.log_metrics(row1)

# Using mlflow.log_artifact
with open("table.json", "w") as f:
    json.dump(table, f)
mlflow.log_artifact("table.json")

Logs metrics for each column.
Metrics do not render as a table in Azure Machine Learning studio.
Text values are not supported.
With mlflow.log_artifact, the table is logged as an artifact, not as a metric.
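The "to be done for each row" step can be sketched as a helper that turns a column-oriented table, like the one passed to log_table in SDK v1, into one metrics dictionary per row. The name `table_to_rows` and the equal-length check are illustrative assumptions.

```python
def table_to_rows(table_name, table):
    """Turn a column-oriented table into one prefixed metrics dict per row."""
    lengths = {len(values) for values in table.values()}
    if len(lengths) > 1:
        raise ValueError("All columns must have the same number of rows")
    columns = list(table)
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    return [
        {f"{table_name}.{column}": value for column, value in zip(columns, row)}
        for row in zip(*(table[column] for column in columns))
    ]


table = {"col1": [1, 2, 3], "col2": [4, 5, 6]}
rows = table_to_rows("table", table)
print(rows)
```

Each row can then be logged with mlflow.log_metrics, for example by passing the row index as the step argument.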

Log an accuracy table

SDK v1

ACCURACY_TABLE = '{"schema_type": "accuracy_table", "schema_version": "v1", "data": {"probability_tables": ' +\
        '[[[114311, 385689, 0, 0], [0, 0, 385689, 114311]], [[67998, 432002, 0, 0], [0, 0, ' + \
        '432002, 67998]]], "percentile_tables": [[[114311, 385689, 0, 0], [1, 0, 385689, ' + \
        '114310]], [[67998, 432002, 0, 0], [1, 0, 432002, 67997]]], "class_labels": ["0", "1"], ' + \
        '"probability_thresholds": [0.52], "percentile_thresholds": [0.09]}}'

azureml_run.log_accuracy_table('v1_accuracy_table', ACCURACY_TABLE)

SDK v2 with MLflow

ACCURACY_TABLE = '{"schema_type": "accuracy_table", "schema_version": "v1", "data": {"probability_tables": ' +\
        '[[[114311, 385689, 0, 0], [0, 0, 385689, 114311]], [[67998, 432002, 0, 0], [0, 0, ' + \
        '432002, 67998]]], "percentile_tables": [[[114311, 385689, 0, 0], [1, 0, 385689, ' + \
        '114310]], [[67998, 432002, 0, 0], [1, 0, 432002, 67997]]], "class_labels": ["0", "1"], ' + \
        '"probability_thresholds": [0.52], "percentile_thresholds": [0.09]}}'

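import json

# Parse the JSON string so that the artifact contains a JSON object
mlflow.log_dict(json.loads(ACCURACY_TABLE), 'mlflow_accuracy_table.json')

Metrics do not render as an accuracy table in Azure Machine Learning studio.
Logged as an artifact, not as a metric.
The mlflow.log_dict method is experimental.

Log a confusion matrix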

SDK v1

CONF_MATRIX = '{"schema_type": "confusion_matrix", "schema_version": "v1", "data": {"class_labels": ' + \
    '["0", "1", "2", "3"], "matrix": [[3, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]}}'

azureml_run.log_confusion_matrix('v1_confusion_matrix', json.loads(CONF_MATRIX))

SDK v2 with MLflow

CONF_MATRIX = '{"schema_type": "confusion_matrix", "schema_version": "v1", "data": {"class_labels": ' + \
    '["0", "1", "2", "3"], "matrix": [[3, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]}}'

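import json

# Parse the JSON string so that the artifact contains a JSON object
mlflow.log_dict(json.loads(CONF_MATRIX), 'mlflow_confusion_matrix.json')

Metrics do not render as a confusion matrix in Azure Machine Learning studio.
Logged as an artifact, not as a metric.
The mlflow.log_dict method is experimental.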
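Rather than writing the JSON string by hand, the dictionary can be built from the labels and the matrix. A minimal sketch, assuming the confusion_matrix v1 layout used in the example above; the helper name `confusion_matrix_payload` and the shape check are hypothetical.

```python
import json


def confusion_matrix_payload(class_labels, matrix):
    """Build a dictionary in the confusion_matrix v1 layout used above."""
    size = len(class_labels)
    # The matrix must be square, with one row and one column per label.
    if len(matrix) != size or any(len(row) != size for row in matrix):
        raise ValueError("matrix must be square and match class_labels")
    return {
        "schema_type": "confusion_matrix",
        "schema_version": "v1",
        "data": {"class_labels": list(class_labels), "matrix": matrix},
    }


payload = confusion_matrix_payload(
    ["0", "1", "2", "3"],
    [[3, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]],
)
print(json.dumps(payload))
```

The resulting dictionary can be passed to mlflow.log_dict together with an artifact file name such as mlflow_confusion_matrix.json.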

Log predictions

SDK v1

PREDICTIONS = '{"schema_type": "predictions", "schema_version": "v1", "data": {"bin_averages": [0.25,' + \
    ' 0.75], "bin_errors": [0.013, 0.042], "bin_counts": [56, 34], "bin_edges": [0.0, 0.5, 1.0]}}'

azureml_run.log_predictions('test_predictions', json.loads(PREDICTIONS))

SDK v2 with MLflow

PREDICTIONS = '{"schema_type": "predictions", "schema_version": "v1", "data": {"bin_averages": [0.25,' + \
    ' 0.75], "bin_errors": [0.013, 0.042], "bin_counts": [56, 34], "bin_edges": [0.0, 0.5, 1.0]}}'

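import json

# Parse the JSON string so that the artifact contains a JSON object
mlflow.log_dict(json.loads(PREDICTIONS), 'mlflow_predictions.json')

The predictions do not render as a chart in Azure Machine Learning studio.
Logged as an artifact, not as a metric.
The mlflow.log_dict method is experimental.

Log residuals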

SDK v1

RESIDUALS = '{"schema_type": "residuals", "schema_version": "v1", "data": {"bin_edges": [100, 200, 300], ' + \
'"bin_counts": [0.88, 20, 30, 50.99]}}'

azureml_run.log_residuals('test_residuals', json.loads(RESIDUALS))

SDK v2 with MLflow

RESIDUALS = '{"schema_type": "residuals", "schema_version": "v1", "data": {"bin_edges": [100, 200, 300], ' + \
'"bin_counts": [0.88, 20, 30, 50.99]}}'

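import json

# Parse the JSON string so that the artifact contains a JSON object
mlflow.log_dict(json.loads(RESIDUALS), 'mlflow_residuals.json')

The residuals do not render as a chart in Azure Machine Learning studio.
Logged as an artifact, not as a metric.
The mlflow.log_dict method is experimental.

View run info and data

You can access run information through the data and info properties of the MLflow run (mlflow.entities.Run) object.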

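Tip

You can query experiment and run tracking information in Azure Machine Learning by using MLflow, which provides a comprehensive search API to query and search experiments and runs, and to compare results quickly. For more information about these MLflow capabilities, see Query & compare experiments and runs with MLflow.

The following example shows how to retrieve a finished run: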

from mlflow.tracking import MlflowClient

# Use MLflow to retrieve the run that was just completed
client = MlflowClient()
finished_mlflow_run = client.get_run("<RUN_ID>")

The following example shows how to view the metrics, tags, and params:

metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params

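Note

The metrics dictionary holds only the most recently logged value for a given metric. For example, if you log the values 1, then 2, then 3, and finally 4 to a metric named sample_metric, only 4 is present in the metrics dictionary. To get all the values logged for a specific named metric, use MlflowClient.get_metric_history: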

with mlflow.start_run() as multiple_metrics_run:
    mlflow.log_metric("sample_metric", 1)
    mlflow.log_metric("sample_metric", 2)
    mlflow.log_metric("sample_metric", 3)
    mlflow.log_metric("sample_metric", 4)

print(client.get_run(multiple_metrics_run.info.run_id).data.metrics)
print(client.get_metric_history(multiple_metrics_run.info.run_id, "sample_metric"))

For more information, see the MlflowClient reference.

The info field provides general information about the run, such as the start time, run ID, and experiment ID:

run_start_time = finished_mlflow_run.info.start_time
run_experiment_id = finished_mlflow_run.info.experiment_id
run_id = finished_mlflow_run.info.run_id
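MLflow reports start_time as the number of milliseconds since the Unix epoch, so it usually needs converting before it's displayed. A minimal sketch using only the standard library; `millis_to_datetime` is a hypothetical helper, and the sample value stands in for finished_mlflow_run.info.start_time.

```python
from datetime import datetime, timezone


def millis_to_datetime(millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Placeholder value standing in for finished_mlflow_run.info.start_time.
example_start_time = 1705363200000
print(millis_to_datetime(example_start_time).isoformat())
```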

View run artifacts

To view the artifacts of a run, use MlflowClient.list_artifacts:

client.list_artifacts(finished_mlflow_run.info.run_id)

To download an artifact, use mlflow.artifacts.download_artifacts:

mlflow.artifacts.download_artifacts(run_id=finished_mlflow_run.info.run_id, artifact_path="Azure.png")

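Next steps

Track ML experiments and models with MLflow.
Log metrics, parameters, and files with MLflow.
Log MLflow models.
Query & compare experiments and runs with MLflow.
Manage the model registry in Azure Machine Learning with MLflow.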
