Debug scoring scripts with the Azure Machine Learning inference HTTP server

Article
03/05/2024

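The Azure Machine Learning inference HTTP server is a Python package that exposes your scoring function as an HTTP endpoint and wraps the Flask server code and dependencies into a single package. It's included in the prebuilt Docker images for inference that are used when you deploy a model with Azure Machine Learning. Using the package alone, you can deploy the model locally for production, and you can also easily validate your scoring (entry) script in a local development environment. If there's a problem with the scoring script, the server returns an error and the location where the error occurred.

The server can also be used to create validation gates in a continuous integration and deployment pipeline. For example, you can start the server with the candidate script and run the test suite against the local endpoint.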
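
A validation gate of this kind could be sketched in Python as follows. This is a minimal sketch, not an official workflow: the helper names (`wait_for_liveness`, `http_probe`, `run_gate`) are hypothetical, and it assumes that the liveness route on the default port 5001 answers with HTTP 200 once the server is ready.

```python
import subprocess
import time
import urllib.error
import urllib.request


def wait_for_liveness(probe, timeout_s=30.0, interval_s=0.5):
    """Poll probe() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False


def http_probe(url="http://127.0.0.1:5001/"):
    """Return True when the liveness route answers with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def run_gate(entry_script, run_tests):
    """Start the server with a candidate script, run the tests, then stop it."""
    server = subprocess.Popen(["azmlinfsrv", "--entry_script", entry_script])
    try:
        if not wait_for_liveness(http_probe):
            return False
        return bool(run_tests())
    finally:
        server.terminate()
        server.wait(timeout=10)
```

In a pipeline, `run_gate("score.py", my_test_suite)` would return `False` when the server never becomes ready or the test callable fails, which the pipeline step can turn into a nonzero exit code.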

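This article mainly targets users who want to use the inference server to debug locally, but it also helps you understand how to use the inference server with online endpoints.

Online endpoint local debugging

Debugging endpoints locally before you deploy them to the cloud helps you catch errors in your code and configuration earlier. To debug endpoints locally, you can use:

The Azure Machine Learning inference HTTP server
A local endpoint

This article focuses on the Azure Machine Learning inference HTTP server.

The following table provides an overview of scenarios to help you choose the option that works best for you.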

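Scenario	Inference HTTP server	Local endpoint
Update the local Python environment without rebuilding the Docker image	Yes	No
Update the scoring script	Yes	Yes
Update deployment configurations (deployment, environment, code, model)	No	Yes
Integrate the VS Code debugger	Yes	Yes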

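By running the inference HTTP server locally, you can focus on debugging your scoring script without being affected by the deployment container configuration.

Prerequisites

Requires: Python >=3.8
Anaconda

Tip

The Azure Machine Learning inference HTTP server runs on Windows and Linux operating systems.

Installation

Note

To avoid package conflicts, install the server in a virtual environment.

To install the azureml-inference-server-http package, run the following command in your cmd/terminal: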

python -m pip install azureml-inference-server-http

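Debug your scoring script locally

To debug a scoring script locally, you can test how the server behaves with a dummy scoring script, use VS Code to debug with the azureml-inference-server-http package, or test the server with an actual scoring script, model file, and environment file from our examples repository.

Test the server behavior with a dummy scoring script

Create a directory to hold your files: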

mkdir server_quickstart
cd server_quickstart

To avoid package conflicts, create a virtual environment and activate it:
```
python -m venv myenv
source myenv/bin/activate
```
Tip

After testing, run deactivate to deactivate the Python virtual environment.
Install the azureml-inference-server-http package from the PyPI feed:
```
python -m pip install azureml-inference-server-http
```

Create your entry script (score.py). The following example creates a basic entry script:

echo '
import time

def init():
    time.sleep(1)

def run(input_data):
    return {"message":"Hello, World!"}
' > score.py

Start the server (azmlinfsrv) and set score.py as the entry script:
```
azmlinfsrv --entry_script score.py
```
Note

The server is hosted on 0.0.0.0, which means it listens on all IP addresses of the hosting machine.
Send a scoring request to the server by using curl:
```
curl -p 127.0.0.1:5001/score
```
The server responds as follows:
```
{"message": "Hello, World!"}
```

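After testing, you can press Ctrl + C to terminate the server. You can then modify the scoring script (score.py) and run the server again (azmlinfsrv --entry_script score.py) to test your changes.

How to integrate with Visual Studio Code

There are two ways to use Visual Studio Code (VS Code) and the Python extension to debug with the azureml-inference-server-http package: launch mode and attach mode.

Launch mode: Set up launch.json in VS Code and start the Azure Machine Learning inference HTTP server within VS Code.
1. Start VS Code and open the folder that contains the script (score.py).
2. Add the following configuration to launch.json for that workspace in VS Code: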
  
  launch.json
```
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug score.py",
            "type": "python",
            "request": "launch",
            "module": "azureml_inference_server_http.amlserver",
            "args": [
                "--entry_script",
                "score.py"
            ]
        }
    ]
}
```
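3. Start the debugging session in VS Code. Select Run -> Start Debugging (or F5).

Attach mode: Start the Azure Machine Learning inference HTTP server in a command line, and use VS Code with the Python extension to attach to the process.

Note

If you're using a Linux environment, first install the gdb package by running sudo apt-get install -y gdb.
1. Add the following configuration to launch.json for that workspace in VS Code: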
  
  launch.json
```
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Attach using Process Id",
            "type": "python",
            "request": "attach",
            "processId": "${command:pickProcess}",
            "justMyCode": true
        }
    ]
}
```
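2. Start the inference server by using the CLI (azmlinfsrv --entry_script score.py).
3. Start the debugging session in VS Code.
  1. In VS Code, select Run -> Start Debugging (or F5).
  2. Enter the process ID of azmlinfsrv (not gunicorn), which you can find in the logs displayed in the CLI.
  Note
  
  If the process picker isn't displayed, manually enter the process ID in the processId field of launch.json.

In both modes, you can set breakpoints and debug step by step.

End-to-end example

In this section, we run the server locally with sample files (scoring script, model file, and environment) from our example repository. The same sample files are used in our article Deploy and score a machine learning model by using an online endpoint.

Clone the sample repository.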

git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli/endpoints/online/model-1/

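Create and activate a virtual environment with conda by running the following commands. In this example, the azureml-inference-server-http package is installed automatically because conda.yml includes it as a dependent library of the azureml-defaults package.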
```
# Create the environment from the YAML file
conda env create --name model-env -f ./environment/conda.yml
# Activate the new environment
conda activate model-env
```

Review your scoring script.

onlinescoring/score.py

import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model/sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("model 1: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
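
The init()/run() contract above can be exercised without the model file. The following is a minimal sketch under stated assumptions: `StubModel` is a hypothetical stand-in for the scikit-learn model (it only sums each input row), and plain lists replace numpy so the snippet runs with the standard library alone. The point it illustrates is that run() receives the request body as a JSON string and reads its "data" field.

```python
import json


class StubModel:
    """Hypothetical stand-in for the scikit-learn model: sums each input row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


model = None


def init():
    """Mirror of the sample init(), loading the stub instead of a .pkl file."""
    global model
    model = StubModel()


def run(raw_data):
    """Same parsing as the sample run(): read the "data" field, then predict."""
    data = json.loads(raw_data)["data"]
    return list(model.predict(data))


init()
print(run(json.dumps({"data": [[1, 2, 3], [4, 5, 6]]})))  # prints [6, 15]
```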

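Run the inference server by specifying the scoring script and the model file. The specified model directory (the model_dir parameter) is exposed as the AZUREML_MODEL_DIR variable and read in the scoring script. In this case, we specify the current directory (./) because the scoring script already specifies the subdirectory as model/sklearn_regression_model.pkl.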
```
azmlinfsrv --entry_script ./onlinescoring/score.py --model_dir ./
```
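If the server starts and invokes the scoring script successfully, it displays a startup log like the example shown later in this article. Otherwise, the log contains error messages.
Test the scoring script with sample data. Open another terminal and move to the same working directory to run the command. Use the curl command to send a sample request to the server and receive a scoring result.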
```
curl --request POST "127.0.0.1:5001/score" --header "Content-Type:application/json" --data @sample-request.json
```
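If there's no problem in your scoring script, the scoring result is returned. If you find a problem, you can update the scoring script and start the server again to test the updated script.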
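
The same request can be sent from Python instead of curl. This is a minimal sketch using only the standard library; the function names `build_score_request` and `score` are hypothetical, and the host and port are the defaults shown in this article.

```python
import json
import urllib.request


def build_score_request(payload, host="127.0.0.1", port=5001):
    """Build a POST request for the /score route with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(payload, timeout_s=10):
    """Send the request and decode the JSON response (the server must be running)."""
    request = build_score_request(payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, loading sample-request.json with `json.load` and passing the result to `score` would be the equivalent of the curl command above.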

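Server routes

The server listens on port 5001 (by default) at these routes.

Name	Route
Liveness probe	127.0.0.1:5001/
Score	127.0.0.1:5001/score
OpenAPI (swagger)	127.0.0.1:5001/swagger.json

Server parameters

The following table lists the parameters that the server accepts:

Parameter	Required	Default	Description
entry_script	True	N/A	The relative or absolute path to the scoring script.
model_dir	False	N/A	The relative or absolute path to the directory that holds the model used for inferencing.
port	False	5001	The serving port of the server.
worker_count	False	1	The number of worker threads that process concurrent requests.
appinsights_instrumentation_key	False	N/A	The instrumentation key of the Application Insights instance where the logs are published.
access_control_allow_origins	False	N/A	Enables CORS for the specified origins. Separate multiple origins with ",". Example: "microsoft.com, bing.com"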
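
As a hedged illustration, the parameters in the table can be assembled into a command line programmatically. The helper name `build_azmlinfsrv_command` is hypothetical, and the sketch assumes that every parameter maps to a flag of the same name, as --entry_script and --model_dir do in the examples in this article.

```python
def build_azmlinfsrv_command(entry_script, model_dir=None, port=None,
                             worker_count=None,
                             appinsights_instrumentation_key=None,
                             access_control_allow_origins=None):
    """Return the argument list for azmlinfsrv, omitting unset optional parameters."""
    command = ["azmlinfsrv", "--entry_script", entry_script]
    optional = {
        "model_dir": model_dir,
        "port": port,
        "worker_count": worker_count,
        "appinsights_instrumentation_key": appinsights_instrumentation_key,
        "access_control_allow_origins": access_control_allow_origins,
    }
    for name, value in optional.items():
        if value is not None:
            # Assumption: the flag name equals the parameter name in the table.
            command += [f"--{name}", str(value)]
    return command


print(build_azmlinfsrv_command("./onlinescoring/score.py", model_dir="./"))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues for paths that contain spaces.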

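Request flow

The following steps explain how the Azure Machine Learning inference HTTP server (azmlinfsrv) handles incoming requests:

1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
2. A client sends a request to the server.
3. When a request is received, it goes through the WSGI server and is then dispatched to one of the workers.
  - Gunicorn is used on Linux.
  - Waitress is used on Windows.
4. The request is then handled by a Flask app, which loads the entry script and any dependencies.
5. Finally, the request is sent to your entry script. The entry script then makes an inference call to the loaded model and returns a response.

Diagram of the HTTP server process.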
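
To make the flow above concrete, here is a toy WSGI application written with the standard library only. It is not the server's implementation (which uses Flask behind Gunicorn or Waitress); it only illustrates how a WSGI callable can route a liveness path and hand the body of a /score request to an entry-script function. The `run` function, the liveness response body, and the `call` helper are all hypothetical.

```python
import json
from io import BytesIO
from wsgiref.util import setup_testing_defaults


def run(raw_data):
    """Hypothetical entry-script function: report how many inputs were sent."""
    return {"count": len(json.loads(raw_data)["data"])}


def app(environ, start_response):
    """Toy WSGI application that routes / and /score."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", b"OK"
    elif path == "/score":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw_data = environ["wsgi.input"].read(length).decode("utf-8")
        status, body = "200 OK", json.dumps(run(raw_data)).encode("utf-8")
    else:
        status, body = "404 Not Found", b"Not found"
    start_response(status, [("Content-Length", str(len(body)))])
    return [body]


def call(path, payload=b""):
    """Invoke the app in-process, the way a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "PATH_INFO": path,
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": BytesIO(payload),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call("/score", b'{"data": [1, 2, 3]}'))  # prints ('200 OK', b'{"count": 3}')
```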

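Understanding logs

This section describes the logs of the Azure Machine Learning inference HTTP server. You can get the logs when you run azureml-inference-server-http locally, or get the container logs if you're using online endpoints.

Note

The logging format has changed since version 0.8.0. If your logs appear in a different style, update the azureml-inference-server-http package to the latest version.

Tip

If you're using online endpoints, the log from the inference server starts with Azure Machine Learning Inferencing HTTP server <version>.

Startup logs

When the server starts, the logs first show the server settings, as follows: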

Azure Machine Learning Inferencing HTTP server <version>


Server Settings
---------------
Entry Script Name: <entry_script>
Model Directory: <model_dir>
Worker Count: <worker_count>
Worker Timeout (seconds): None
Server Port: <port>
Application Insights Enabled: false
Application Insights Key: <appinsights_instrumentation_key>
Inferencing HTTP server version: azmlinfsrv/<version>
CORS for the specified origins: <access_control_allow_origins>


Server Routes
---------------
Liveness Probe: GET   127.0.0.1:<port>/
Score:          POST  127.0.0.1:<port>/score

<logs>

For example, when you start the server by following the end-to-end example:

Azure Machine Learning Inferencing HTTP server v0.8.0


Server Settings
---------------
Entry Script Name: /home/user-name/azureml-examples/cli/endpoints/online/model-1/onlinescoring/score.py
Model Directory: ./
Worker Count: 1
Worker Timeout (seconds): None
Server Port: 5001
Application Insights Enabled: false
Application Insights Key: None
Inferencing HTTP server version: azmlinfsrv/0.8.0
CORS for the specified origins: None


Server Routes
---------------
Liveness Probe: GET   127.0.0.1:5001/
Score:          POST  127.0.0.1:5001/score

2022-12-24 07:37:53,318 I [32726] gunicorn.error - Starting gunicorn 20.1.0
2022-12-24 07:37:53,319 I [32726] gunicorn.error - Listening at: http://0.0.0.0:5001 (32726)
2022-12-24 07:37:53,319 I [32726] gunicorn.error - Using worker: sync
2022-12-24 07:37:53,322 I [32756] gunicorn.error - Booting worker with pid: 32756
Initializing logger
2022-12-24 07:37:53,779 I [32756] azmlinfsrv - Starting up app insights client
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - Found user script at /home/user-name/azureml-examples/cli/endpoints/online/model-1/onlinescoring/score.py
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - run() is not decorated. Server will invoke it with the input in JSON string.
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - Invoking user's init function
2022-12-24 07:37:55,974 I [32756] azmlinfsrv.user_script - Users's init has completed successfully
2022-12-24 07:37:55,976 I [32756] azmlinfsrv.swagger - Swaggers are prepared for the following versions: [2, 3, 3.1].
2022-12-24 07:37:55,977 I [32756] azmlinfsrv - AML_FLASK_ONE_COMPATIBILITY is set, but patching is not necessary.

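Log format

Logs from the inference server are generated in the following format, except for the launcher scripts, because they aren't part of the Python package:

<UTC Time> | <level> [<pid>] <logger name> - <message>

Here <pid> is the process ID and <level> is the first character of the logging level: E for ERROR, I for INFO, and so on.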
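
A log line in this format can be split into its fields with a regular expression. The following is a minimal sketch: the pattern and the function name `parse_log_line` are hypothetical, and the `|` separator after the timestamp is treated as optional because the sample startup log in this article omits it.

```python
import re

# Timestamp, optional "|" separator, one-letter level, pid, logger name, message.
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(?:\|\s+)?"
    r"(?P<level>[A-Z]) \[(?P<pid>\d+)\] (?P<logger>\S+) - (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a server log line, or None if it doesn't match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    return record


sample = "2022-12-24 07:37:53,318 I [32726] gunicorn.error - Starting gunicorn 20.1.0"
print(parse_log_line(sample)["logger"])  # prints gunicorn.error
```

Lines that don't follow the format, such as the settings banner or "Initializing logger", return `None`, so they can be filtered out before further analysis.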

Python has six logging levels, each with a number that reflects its severity:

Logging level	Numeric value
CRITICAL	50
ERROR	40
WARNING	30
INFO	20
DEBUG	10
NOTSET	0
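
These values are the constants defined by Python's standard logging module, so the one-letter level in a server log line can be mapped back to a number. A small sketch follows; the names `LEVEL_BY_INITIAL` and `is_at_least` are hypothetical.

```python
import logging

# Map the first character of each level name to its numeric value.
LEVEL_BY_INITIAL = {
    logging.getLevelName(number)[0]: number
    for number in (logging.CRITICAL, logging.ERROR, logging.WARNING,
                   logging.INFO, logging.DEBUG)
}


def is_at_least(initial, threshold=logging.WARNING):
    """Return True when the one-letter level is at or above the threshold."""
    return LEVEL_BY_INITIAL[initial] >= threshold


print(LEVEL_BY_INITIAL["E"], is_at_least("I"))  # prints 40 False
```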

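Troubleshooting guide

This section provides basic troubleshooting tips for the Azure Machine Learning inference HTTP server. If you want to troubleshoot online endpoints, see also Troubleshooting online endpoints deployment.

Basic steps

The basic steps for troubleshooting are:

1. Gather version information for your Python environment.
2. Make sure that the azureml-inference-server-http Python package version specified in the environment file matches the AzureML inference HTTP server version displayed in the startup log. Sometimes pip's dependency resolver installs unexpected package versions.
3. If you specify Flask (or its dependencies) in your environment, remove them. The dependencies include Flask, Jinja2, itsdangerous, Werkzeug, MarkupSafe, and click. Flask is listed as a dependency in the server package, and it's best to let the server install it. This way, when the server supports new versions of Flask, you get them automatically.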
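
The first step, gathering version information, could be scripted as in the following sketch. It relies on `importlib.metadata` from the standard library (Python 3.8 and later); the function name `collect_versions` and the exact package list are illustrative choices based on the packages named in this article.

```python
import sys
from importlib import metadata

# Packages named in this article; adjust the list for your environment.
PACKAGES = (
    "azureml-inference-server-http", "azureml-defaults", "flask", "jinja2",
    "itsdangerous", "werkzeug", "markupsafe", "click",
)


def collect_versions(packages=PACKAGES):
    """Return the Python version and the installed version of each package."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

Comparing the reported azureml-inference-server-http version with the version in the startup log covers the second step as well.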

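Server version

The server package azureml-inference-server-http is published to PyPI. You can find the changelog and all previous versions on the PyPI page. If you're using an earlier version, update to the latest version.

0.4.x: The version that is bundled in training images ≤ 20220601 and in azureml-defaults>=1.34,<=1.43. 0.4.13 is the last stable version. If you use the server before version 0.4.11, you might see Flask dependency issues such as being unable to import the name Markup from jinja2. We recommend upgrading to 0.4.13 or 0.8.x (the latest version) if possible.
0.6.x: The version that is preinstalled in inferencing images ≤ 20220516. The latest stable version is 0.6.1.
0.7.x: The first version that supports Flask 2. The latest stable version is 0.7.7.
0.8.x: The log format changed and Python 3.6 support was dropped.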
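
The version notes above imply a simple rule: Flask 2 requires server version 0.7.0 or later. The following sketch encodes only that rule; `parse_version` and `supports_flask_2` are hypothetical helpers, and the parser is deliberately simplistic (it keeps at most the first three numeric components).

```python
import re


def parse_version(text):
    """Turn a string such as '0.8.4' into a tuple of integers, (0, 8, 4)."""
    return tuple(int(number) for number in re.findall(r"\d+", text)[:3])


def supports_flask_2(server_version):
    """Return True when the server version is 0.7.0 or later."""
    return parse_version(server_version) >= (0, 7)


for version in ("0.4.13", "0.6.1", "0.7.7", "0.8.0"):
    print(version, supports_flask_2(version))
```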

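Package dependencies

The most relevant packages for the server azureml-inference-server-http are the following:

flask
opencensus-ext-azure
inference-schema

If you specify azureml-defaults in your Python environment, the azureml-inference-server-http package is one of its dependencies and is installed automatically.

Tip

If you're using Python SDK v1 and don't explicitly specify azureml-defaults in your Python environment, the SDK might add the package for you. However, it pins the package to the version of the SDK. For example, if the SDK version is 1.38.0, it adds azureml-defaults==1.38.0 to the environment's pip requirements.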

Frequently asked questions

1. I encountered the following error during server startup:


TypeError: register() takes 3 positional arguments but 4 were given

  File "/var/azureml-server/aml_blueprint.py", line 251, in register

    super(AMLBlueprint, self).register(app, options, first_registration)

TypeError: register() takes 3 positional arguments but 4 were given

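You have Flask 2 installed in your Python environment but are running a version of azureml-inference-server-http that doesn't support Flask 2. Support for Flask 2 was added in azureml-inference-server-http>=0.7.0, which is also in azureml-defaults>=1.44.

If you're not using this package in an AzureML Docker image, use the latest version of azureml-inference-server-http or azureml-defaults.
If you're using this package with an AzureML Docker image, make sure that you're using an image built in or after July 2022. The image version is available in the container logs. You should be able to find a log similar to the following: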
```
2022-08-22T17:05:02,147738763+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-08-22T17:05:02,161963207+00:00 | gunicorn/run | ###############################################
2022-08-22T17:05:02,168970479+00:00 | gunicorn/run | 
2022-08-22T17:05:02,174364834+00:00 | gunicorn/run | 
2022-08-22T17:05:02,187280665+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20220708.v2
2022-08-22T17:05:02,188930082+00:00 | gunicorn/run | 
2022-08-22T17:05:02,190557998+00:00 | gunicorn/run | 
```
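The build date of the image appears after "Materialization Build" (spelled "Materializaton Build" in the log), which in the preceding example is 20220708, or July 8, 2022. This image is compatible with Flask 2. If you don't see a banner like this in your container log, your image is out of date and should be updated. If you're using a CUDA image and can't find a newer image, check whether your image is deprecated in AzureML-Containers. If it is, you should be able to find replacements.
If you're using the server with an online endpoint, you can also find the logs under "Deployment logs" on the online endpoint page in Azure Machine Learning studio. If you deploy with SDK v1 and don't explicitly specify an image in your deployment configuration, it defaults to a version of openmpi4.1.0-ubuntu20.04 that matches your local SDK toolset, which might not be the latest version of the image. For example, SDK 1.43 defaults to openmpi4.1.0-ubuntu20.04:20220616, which is incompatible. Make sure that you use the latest SDK for your deployment.
If for some reason you can't update the image, you can temporarily avoid the issue by pinning azureml-defaults==1.43 or azureml-inference-server-http~=0.4.13, which installs the older version of the server with Flask 1.0.x.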
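
The image check described in this answer can be automated against a container log. This is a minimal sketch: the function names are hypothetical, the pattern matches the banner text exactly as it is spelled in the sample log, and the cutoff of July 1, 2022 is an interpretation of "built in or after July 2022".

```python
import re
from datetime import date, datetime

# The banner spells "Materializaton" this way in the sample log above.
BUILD_PATTERN = re.compile(r"Materializaton Build:(\d{8})")


def image_build_date(container_log):
    """Return the image build date found in the log, or None if there's no banner."""
    match = BUILD_PATTERN.search(container_log)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def is_flask2_compatible_image(container_log, cutoff=date(2022, 7, 1)):
    """Return True when the image was built on or after the cutoff date."""
    built = image_build_date(container_log)
    return built is not None and built >= cutoff


line = ("2022-08-22T17:05:02,187280665+00:00 | gunicorn/run | AzureML image "
        "information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20220708.v2")
print(image_build_date(line), is_flask2_compatible_image(line))  # prints 2022-07-08 True
```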

2. I encountered an `ImportError` or `ModuleNotFoundError` on the modules `opencensus`, `jinja2`, `MarkupSafe`, or `click` during startup, like the following message:

ImportError: cannot import name 'Markup' from 'jinja2'

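Older versions of the server (<= 0.4.10) didn't pin Flask's dependencies to compatible versions. This problem is fixed in the latest version of the server.

Next steps

For more information about creating an entry script and deploying models, see How to deploy a model using Azure Machine Learning.
Learn about Prebuilt docker images for inference.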