你当前正在访问 Microsoft Azure Global Edition 技术文档网站。如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站，请访问 https://docs.azure.cn。

将 MLflow 模型部署到联机终结点

项目
09/03/2024

本文介绍如何将 MLflow 模型部署到联机终结点以进行实时推理。将 MLflow 模型部署到联机终结点时，无需指定评分脚本或环境，此功能称为无代码部署。

对于无代码部署，Azure 机器学习：

动态安装 conda.yaml 文件中提供的 Python 包。因此，依赖项在容器运行时期间安装。
提供 MLflow 基础映像/策展环境，其中包含以下各项：
- azureml-inference-server-http
- mlflow-skinny
- 用于推理的评分脚本。

提示

没有公用网络访问权限的工作区：必须打包模型（预览版），然后才能在没有传出连接的情况下将 MLflow 模型部署到联机终结点。通过使用模型打包，不必进行 Internet 连接；如果使用其他方式，Azure 机器学习则需要进行 Internet 连接才能为 MLflow 模型动态安装必要的 Python 包。

关于示例

此示例演示如何将 MLflow 模型部署到联机终结点以执行预测。此示例使用基于糖尿病数据集的 MLflow 模型。该数据集包含 10 个基线变量、年龄、性别、体重指数、平均血压以及从 442 名糖尿病患者获得的六项血清测量值。它还包含兴趣反应，即对基线后一年的疾病进展的定量测量。

该模型已使用 scikit-learn 回归量进行训练，所有必需的预处理都打包为管道，使此模型成为从原始数据到预测的端到端管道。

本文中的信息基于 azureml-examples 存储库中包含的代码示例。若要在不复制/粘贴 YAML 和其他文件的情况下在本地运行命令，请克隆存储库，然后将目录更改为 cli（如果使用 Azure CLI）。如果使用用于 Python 的 Azure 机器学习 SDK，请将目录更改为 sdk/python/endpoints/online/mlflow。

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli

在 Jupyter Notebook 中跟进操作

可以按照使用 Azure 机器学习 Python SDK 的步骤操作，方法是在克隆的存储库中打开将 MLflow 模型部署到联机终结点笔记本。

先决条件

在按照本文中的步骤操作之前，请确保满足以下先决条件：

Azure 订阅。如果没有 Azure 订阅，请在开始操作前先创建一个免费帐户。试用免费版或付费版 Azure 机器学习。
Azure 基于角色的访问控制 (Azure RBAC) 用于授予对 Azure 机器学习中的操作的访问权限。若要执行本文中的步骤，必须为用户帐户分配 Azure 机器学习工作区的所有者或参与者角色，或者分配一个允许 Microsoft.MachineLearningServices/workspaces/onlineEndpoints/* 的自定义角色。有关角色的详细信息，请参阅管理对 Azure 机器学习工作区的访问权限。
必须具有已在工作区中注册的 MLflow 模型。本文在工作区中注册了为糖尿病数据集训练的模型。
此外，你还需要：
- 安装 Azure CLI 和 Azure CLI 的 ml 扩展。有关安装 CLI 的详细信息，请参阅安装和设置 CLI (v2)。
- 安装用于 Python 的 Azure 机器学习 SDK。
```
pip install azure-ai-ml azure-identity
```
- 安装 MLflow SDK 包 mlflow 和适用于 MLflow azureml-mlflow 的 Azure 机器学习插件。
```
pip install mlflow azureml-mlflow
```
- 如果未在 Azure 机器学习计算中运行代码，请将 MLflow 跟踪 URI 或 MLflow 的注册表 URI 配置为指向你正在处理的 Azure 机器学习工作区。有关如何将 MLflow 连接到工作区的详细信息，请参阅为 Azure 机器学习配置 MLflow。
在 Azure 机器学习工作室中工作时没有其他先决条件。

连接到工作区

首先，连接到你将处理工作的 Azure 机器学习工作区。

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

工作区是 Azure 机器学习的顶级资源，为使用 Azure 机器学习时创建的所有项目提供了一个集中的处理位置。在本部分中，连接到要在其中执行部署任务的工作区。

导入所需的库：

from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import (
ManagedOnlineEndpoint,
ManagedOnlineDeployment,
Model,
Environment,
CodeConfiguration,
)
from azure.identity import DefaultAzureCredential
from azure.ai.ml.constants import AssetTypes

配置工作区详细信息并获取工作区句柄：

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

导入所需的库

import json
import mlflow
import requests
import pandas as pd
from mlflow.deployments import get_deploy_client
from mlflow.tracking import MlflowClient

初始化 MLflow 客户端
```
mlflow_client = MlflowClient()
```

配置部署客户端

deployment_client = get_deploy_client(mlflow.get_tracking_uri())

注册模型

只能将已注册的模型部署到联机终结点。在本例中，你已在存储库中拥有模型的本地副本，因此只需要将模型发布到工作区中的注册表。如果打算部署的模型已注册，则可以跳过此步骤。

MODEL_NAME='sklearn-diabetes'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "endpoints/online/ncd/sklearn-diabetes/model"

model_name = 'sklearn-diabetes'
model_local_path = "sklearn-diabetes/model"
model = ml_client.models.create_or_update(
        Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)

model_name = 'sklearn-diabetes'
model_local_path = "sklearn-diabetes/model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

如果在运行内记录了模型，该怎么办？

如果你的模型是在运行内记录的，则可以直接注册它。

若要注册模型，你需要知道它的存储位置。如果使用 MLflow 的 autolog 功能，则模型的路径取决于模型类型和框架。应检查作业输出以确定模型文件夹的名称。此文件夹包含一个名为 MLModel 的文件。

如果使用 log_model 方法手动记录模型，请将模型的路径作为参数传递给该方法。例如，如果使用 mlflow.sklearn.log_model(my_model, "classifier") 记录模型，则存储模型的路径名为 classifier。

使用 Azure 机器学习 CLI v2 根据训练作业输出创建模型。在以下示例中，使用 ID 为 $RUN_ID 的作业的项目注册名为 $MODEL_NAME 的模型。存储模型的路径为 $MODEL_PATH。

az ml model create --name $MODEL_NAME --path azureml://jobs/$RUN_ID/outputs/artifacts/$MODEL_PATH

注意

路径 $MODEL_PATH 是模型在运行中存储的位置。

model_name = 'sklearn-diabetes'

ml_client.models.create_or_update(
    Model(
        path=f"azureml://jobs/{RUN_ID}/outputs/artifacts/{MODEL_PATH}"
        name=model_name,
        type=AssetTypes.MLFLOW_MODEL
    )
)

注意

路径 MODEL_PATH 是模型在运行中存储的位置。

model_name = 'sklearn-diabetes'

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"runs://{RUN_ID}/{MODEL_PATH}"
)
version = registered_model.version

注意

路径 MODEL_PATH 是模型在运行中存储的位置。

将 MLflow 模型部署到联机终结点

配置要将模型部署到的终结点。以下示例配置终结点的名称和身份验证模式：

通过运行以下命令设置终结点名称（将 YOUR_ENDPOINT_NAME 替换为唯一名称）：

export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"

配置终结点：

create-endpoint.yaml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key

# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime

endpoint_name = "sklearn-diabetes-" + datetime.datetime.now().strftime("%m%d%H%M%f")

endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="An online endpoint to generate predictions for the diabetes dataset",
    auth_mode="key",
    tags={"foo": "bar"},
)

可以使用配置文件配置此终结点的属性。在这种情况下，你将终结点的身份验证模式配置为“密钥”。


# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime

endpoint_name = "sklearn-diabetes-" + datetime.datetime.now().strftime("%m%d%H%M%f")

endpoint_config = {
    "auth_mode": "key",
    "identity": {
        "type": "system_assigned"
    }
}

将此配置写入 JSON 文件：

endpoint_config_path = "endpoint_config.json"
with open(endpoint_config_path, "w") as outfile:
    outfile.write(json.dumps(endpoint_config))

创建终结点：

az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/ncd/create-endpoint.yaml

ml_client.begin_create_or_update(endpoint)

endpoint = deployment_client.create_endpoint(
    name=endpoint_name,
    config={"endpoint-config-file": endpoint_config_path},
)

配置部署。部署是一组资源，用于承载执行实际推理的模型。

sklearn-deployment.yaml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sklearn-deployment
endpoint_name: my-endpoint
model:
  name: mir-sample-sklearn-ncd-model
  version: 2
  path: sklearn-diabetes/model
  type: mlflow_model
instance_type: Standard_DS3_v2
instance_count: 1

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1
)

或者，如果你的终结点没有传出连接，请通过加入参数 with_package=True 来使用模型打包（预览版）：

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
    with_package=True,
)

blue_deployment_name = "blue"

若要配置部署的硬件要求，请创建具有所需配置的 JSON 文件：

deploy_config = {
    "instance_type": "Standard_F4s_v2",
    "instance_count": 1,
}

注意

有关此配置的完整规范的详细信息，请参阅托管联机部署架构 (v2)。

将配置写入文件：

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

注意

scoring_script 和 environment 的自动生成仅支持 pyfunc 模型风格。若要使用不同的模型风格，请参阅自定义 MLflow 模型部署。

创建部署：
```
az ml online-deployment create --name sklearn-deployment --endpoint $ENDPOINT_NAME -f endpoints/online/ncd/sklearn-deployment.yaml --all-traffic
```
如果终结点没有传出连接，请通过添加 --with-package 标记使用模型打包（预览版）：
```
az ml online-deployment create --with-package --name sklearn-deployment --endpoint $ENDPOINT_NAME -f endpoints/online/ncd/sklearn-deployment.yaml --all-traffic
```
```
ml_client.online_deployments.begin_create_or_update(blue_deployment)
```
```
blue_deployment = deployment_client.create_deployment(
    name=blue_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)    
```
1. 在“终结点”页中，从“实时终结点”选项卡中选择“创建”。
2. 选择之前注册的 MLflow 模型，然后选择“选择”按钮。
  
  注意
  
  配置页包含一条说明，告知你已为你所选的 MLflow 模型自动生成了评分脚本和环境。
3. 选择“新建”以部署到新终结点。
4. 提供终结点和部署的名称，或保留默认名称。
5. 选择“部署”，将模型部署到终结点。
将所有流量分配到部署。到目前为止，终结点有一个部署，但没有为其分配任何流量。
在 Azure CLI 中不需要执行此步骤，因为你在创建期间使用了 --all-traffic 标志。如果需要更改流量，则可以使用命令 az ml online-endpoint update --traffic。有关如何更新流量的详细信息，请参阅逐步更新流量。
```
endpoint.traffic = {"blue": 100}
```
```
traffic_config = {"traffic": {blue_deployment_name: 100}}
```
将配置写入文件：
```
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
```
在工作室中不需要执行此步骤。
更新终结点配置：
在 Azure CLI 中不需要执行此步骤，因为你在创建期间使用了 --all-traffic 标志。如果需要更改流量，则可以使用命令 az ml online-endpoint update --traffic。有关如何更新流量的详细信息，请参阅逐步更新流量。
```
ml_client.begin_create_or_update(endpoint).result()
```
```
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
```
在工作室中不需要执行此步骤。

调用终结点

部署准备就绪后，可以使用它来服务请求。测试部署的一种方法是在所使用的部署客户端中使用内置调用功能。以下 JSON 是部署的一个示例请求。

sample-request-sklearn.json

{"input_data": {
    "columns": [
      "age",
      "sex",
      "bmi",
      "bp",
      "s1",
      "s2",
      "s3",
      "s4",
      "s5",
      "s6"
    ],
    "data": [
      [ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
      [ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
    ],
    "index": [0,1]
  }}

注意

在此示例中使用了 input_data，而不是在 MLflow 服务中使用的 inputs。这是因为 Azure 机器学习需要不同的输入格式才能为终结点自动生成 swagger 协定。有关预期输入格式的详细信息，请参阅 Azure 机器学习和 MLflow 内置服务器中部署的模型之间的差异。

按如下所示向终结点提交请求：

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/ncd/sample-request-sklearn.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="sample-request-sklearn.json",
)

# Read the sample request that's in the json file to construct a pandas data frame
with open("sample-request-sklearn.json", "r") as f:
    sample_request = json.loads(f.read())
    samples = pd.DataFrame(**sample_request["input_data"])

deployment_client.predict(endpoint=endpoint_name, df=samples)

响应类似于以下文本：

[ 
  11633.100167144921,
  8522.117402884991
]

重要

对于 MLflow 无代码部署，目前不支持 通过本地终结点进行测试 。

自定义 MLflow 模型部署

无需在 MLflow 模型的部署定义中指定到联机终结点的评分脚本。但是，可以选择这样做并自定义推理的执行方式。

通常需要在以下情况下自定义 MLflow 模型部署：

模型没有采用 PyFunc 风格。
你需要自定义模型的运行方式，例如，通过 mlflow.<flavor>.load_model() 使用特定的风格加载模型。
当模型本身未完成时，需要在评分例程中执行预/后处理。
模型的输出不能在表格数据中很好地表示。例如，它是表示映像的张量。

重要

如果你选择指定 MLflow 模型部署的评分脚本，则还必须指定部署将运行的环境。

步骤

若要使用自定义评分脚本部署 MLflow 模型：

标识你的 MLflow 模型所在的文件夹。

a. 转到 Azure 机器学习工作室。

b. 转到“模型”部分。

c. 选择要部署的模型，然后转到其“工件”选项卡。

d. 记下显示的文件夹注释。注册模型时指定了此文件夹。

创建评分脚本。请注意你之前标识的文件夹名称 model 是如何包含在 init() 函数中的。

提示

以下评分脚本是有关如何执行 MLflow 模型推理的示例。可以根据需要调整此脚本，或根据自己的方案更改其中的任何部分。

score.py

import logging
import os
import json
import mlflow
from io import StringIO
from mlflow.pyfunc.scoring_server import infer_and_parse_json_input, predictions_to_json


def init():
    global model
    global input_schema
    # "model" is the path of the mlflow artifacts when the model was registered. For automl
    # models, this is generally "mlflow-model".
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model")
    model = mlflow.pyfunc.load_model(model_path)
    input_schema = model.metadata.get_input_schema()


def run(raw_data):
    json_data = json.loads(raw_data)
    if "input_data" not in json_data.keys():
        raise Exception("Request must contain a top level key named 'input_data'")

    serving_input = json.dumps(json_data["input_data"])
    data = infer_and_parse_json_input(serving_input, input_schema)
    predictions = model.predict(data)

    result = StringIO()
    predictions_to_json(predictions, result)
    return result.getvalue()

警告

MLflow 2.0 通告：提供的评分脚本适用于 MLflow 1.X 和 MLflow 2.X。但请注意，这些版本的预期输入/输出格式可能会有所不同。检查用于确保使用预期的 MLflow 版本的环境定义。请注意，MLflow 2.0 仅在 Python 3.8+ 中受支持。

创建一个可以执行评分脚本的环境。由于模型是 MLflow 模型，因此在模型包中也指定了 conda 要求。有关 MLflow 模型中包含的文件的更多详细信息，请参阅 MLmodel 格式。然后，你将使用文件中的 conda 依赖项构建环境。但是，你还需要加入 Azure 机器学习中联机部署所需的包 azureml-inference-server-http。

conda 定义文件如下所示：

conda.yml
```
channels:
- conda-forge
dependencies:
- python=3.9
- pip
- pip:
  - mlflow
  - scikit-learn==1.2.2
  - cloudpickle==2.2.1
  - psutil==5.9.4
  - pandas==2.0.0
  - azureml-inference-server-http
name: mlflow-env
```
注意

azureml-inference-server-http 包已添加到原始 conda 依赖项文件。

你将使用此 conda 依赖项文件来创建环境：
将在部署配置中内联创建环境。
```
environment = Environment(
    conda_file="sklearn-diabetes/environment/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
```
MLflow SDK 不支持此操作
1. 转到侧边菜单中的“环境”选项卡。
2. 选择选项卡“自定义环境”>“创建”。
3. 输入环境的名称，在本例中为 sklearn-mlflow-online-py37。
4. 对于“选择环境源”，请选择“使用具有可选 conda 文件的现有 docker 映像”。
5. 对于“容器注册表映像路径”中，输入“mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04”。
6. 选择“下一步”转到“自定义”部分。
7. 复制 sklearn-diabetes/environment/conda.yml 文件的内容并将其粘贴到文本框中。
8. 选择“下一步”转到“标记”页，然后再次选择“下一步”。
9. 在“查看”页上，选择“创建” 。环境已准备就绪，可供使用。
创建部署：
创建部署配置文件 deployment.yml：
```
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: sklearn-diabetes-custom
endpoint_name: my-endpoint
model: azureml:sklearn-diabetes@latest
environment: 
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04
  conda_file: sklearn-diabetes/environment/conda.yml
code_configuration:
  code: sklearn-diabetes/src
  scoring_script: score.py
instance_type: Standard_F2s_v2
instance_count: 1
```
创建部署：
```
az ml online-deployment create -f deployment.yml
```
```
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="sklearn-diabetes/src",
        scoring_script="score.py"
    ),
    instance_type="Standard_F4s_v2",
    instance_count=1,
)
```
MLflow SDK 不支持此操作
1. 在“终结点”页面中选择“+ 创建”。
2. 选择之前注册的 MLflow 模型。
3. 在终结点创建向导中选择“更多选项”以打开高级选项。
4. 为终结点提供名称和身份验证类型，然后选择“下一步”可看到你所选的模型正在用于你的部署。
5. 选择“下一步”以继续转到“部署”页。
6. 选择“下一步”以转到“代码 + 环境”页。选择以 MLflow 格式注册的模型时，无需在此页上指定评分脚本或环境。但是，需要在此部分中指定一个
7. 选择“自定义环境和评分脚本”旁边的滑块。
8. 浏览以选择你之前创建的评分脚本。
9. 为环境类型选择“自定义环境”。
10. 选择你之前创建的自定义环境，然后选择“下一步”。
11. 完成向导，将模型部署到终结点。
部署完成后，它就可以服务请求了。一种测试部署的方法是使用示例请求文件和 invoke 方法。

sample-request-sklearn.json
```
{"input_data": {
    "columns": [
      "age",
      "sex",
      "bmi",
      "bp",
      "s1",
      "s2",
      "s3",
      "s4",
      "s5",
      "s6"
    ],
    "data": [
      [ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
      [ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
    ],
    "index": [0,1]
  }}
```
按如下所示向终结点提交请求：
```
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/ncd/sample-request-sklearn.json
```
```
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=deployment.name,
    request_file="sample-request-sklearn.json",
)
```
MLflow SDK 不支持此操作
1. 转到“终结点”选项卡，然后选择创建的新终结点。
2. 转到“测试”选项卡。
3. 将 sample-request-sklearn.json 文件的内容粘贴到“输入数据以测试终结点”框中。
4. 选择“测试”。
5. 预测将显示在框右侧的“测试结果”下。
响应类似于以下文本：
```
{
  "predictions": [ 
    11633.100167144921,
    8522.117402884991
  ]
}
```
警告

MLflow 2.0 公告：在 MLflow 1.X 中，predictions 密钥将缺失。

清理资源

使用完终结点后，请删除其关联的资源：

az ml online-endpoint delete --name $ENDPOINT_NAME --yes

ml_client.online_endpoints.begin_delete(endpoint_name)

deployment_client.delete_endpoint(endpoint_name)

通过

将 MLflow 模型部署到联机终结点

关于示例

在 Jupyter Notebook 中跟进操作

先决条件

连接到工作区

注册模型

如果在运行内记录了模型，该怎么办？

将 MLflow 模型部署到联机终结点

调用终结点

自定义 MLflow 模型部署

步骤

清理资源

反馈

其他资源

通过

将 MLflow 模型部署到联机终结点

关于示例

在 Jupyter Notebook 中跟进操作

先决条件

连接到工作区

注册模型

如果在运行内记录了模型，该怎么办？

将 MLflow 模型部署到联机终结点

调用终结点

自定义 MLflow 模型部署

步骤

清理资源

相关内容

反馈

其他资源