Run ONNX models with Windows ML

2025-05-20

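Important

The Windows ML APIs are currently experimental and are not supported for use in production environments. Apps that try out these APIs should not be published to the Microsoft Store.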

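Using the Windows Machine Learning (ML) types in the Microsoft.Windows.AI.MachineLearning namespace, you can run ONNX models locally in your Windows apps without manually managing the underlying execution provider (EP) packages. The APIs handle downloading, updating, and initializing EPs; you can then continue using Microsoft.Windows.AI.MachineLearning and/or the ONNX Runtime.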

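Prerequisites

A Windows 11 PC running version 24H2 (build 26100) or later.

In addition to the above, there are language-specific prerequisites, depending on the language your app is written in.

.NET 8 or later
Target a TFM of windows10.0.26100 or later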
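The Windows build requirement above can be checked from a script before attempting any setup. The following is a minimal sketch, assuming the version string has the `major.minor.build` shape that Python's `platform.version()` reports on Windows (for example `10.0.26100`); the helper names `build_number` and `meets_windows_ml_requirement` are made up for this illustration.

```python
import platform

MIN_BUILD = 26100  # Windows 11, version 24H2, as stated in the prerequisites


def build_number(version: str) -> int:
    """Extract the build number from a 'major.minor.build' version string."""
    parts = version.split(".")
    if len(parts) < 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected Windows version string: {version!r}")
    return int(parts[2])


def meets_windows_ml_requirement(version: str) -> bool:
    """Return True when the build number is at least MIN_BUILD."""
    return build_number(version) >= MIN_BUILD


if __name__ == "__main__" and platform.system() == "Windows":
    # Only meaningful on Windows; other platforms report unrelated strings.
    print(meets_windows_ml_requirement(platform.version()))
```

The check only covers the operating system build; the language-specific prerequisites still have to be verified separately.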

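Step 1: Install the WinML runtime and NuGet package

The runtime package is distributed through the Microsoft Store. Run the following command in Windows Terminal to install it (the ID is the runtime package's Store catalog identifier):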

winget install --id 9MVL55DVGWWW

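Then follow the steps below, depending on your app's programming language.

In your .NET project, add the Microsoft.Windows.AI.MachineLearning NuGet package (if you're using the NuGet Package Manager, be sure to include prerelease packages).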

dotnet add package Microsoft.Windows.AI.MachineLearning --prerelease

Then import the namespaces in your code.

using Microsoft.ML.OnnxRuntime;
using Microsoft.Windows.AI.MachineLearning;

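In your Visual Studio project, use the NuGet Package Manager to search for the Microsoft.Windows.AI.MachineLearning NuGet package and add it to your project (be sure to include prerelease packages in the search).

Then add the OnnxRuntime header to your code.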

#include <win_onnxruntime_cxx_api.h>

Windows ML provides Python bindings called onnxruntime-winml, which add Python support for EP acquisition and setup. Once set up, Python apps can use ONNX Runtime features, such as automatic EP selection, as usual.

pip install --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple --extra-index-url https://pypi.org/simple onnxruntime-winml

Then import the OnnxRuntime module in your code.

import onnxruntime as ort

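Step 2: Download and register the latest EPs

Next, we'll use Windows ML to make sure the latest execution providers (EPs) are available on the device and registered with the ONNX Runtime.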

// First we create a new instance of EnvironmentCreationOptions
EnvironmentCreationOptions envOptions = new()
{
    logId = "WinMLDemo", // Use an ID of your own choice
    logLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_ERROR
};

// And then use that to create the ORT environment
using var ortEnv = OrtEnv.CreateInstanceWithOptions(ref envOptions);

// Then, initialize Windows ML infrastructure
Infrastructure infrastructure = new();

// Ensure the latest execution providers are available (downloads them if they aren't)
await infrastructure.DownloadPackagesAsync();

// And register the EPs with ONNX Runtime
await infrastructure.RegisterExecutionProviderLibrariesAsync();

// First we need to create an ORT environment
Ort::Env env(ORT_LOGGING_LEVEL_ERROR, "WinMLDemo"); // Use an ID of your own choice

// Then, initialize Windows ML infrastructure
winrt::Microsoft::Windows::AI::MachineLearning::Infrastructure infrastructure{};

// Ensure the latest execution providers are available (downloads them if they aren't)
co_await infrastructure.DownloadPackagesAsync();

// And register the EPs with ONNX Runtime
co_await infrastructure.RegisterExecutionProviderLibrariesAsync();

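Step 3: Configure execution providers

The ONNX Runtime lets apps configure execution providers (EPs) either through device policies or explicitly. Explicit configuration gives you more control over provider options and over which devices are used.

We recommend starting with explicit EP selection so that your results are more predictable. Once that works, you can experiment with device policies, which select execution providers in a natural, outcome-oriented way.

To explicitly select one or more EPs, use the GetEpDevices function on the OrtApi to enumerate all available devices. SessionOptionsAppendExecutionProvider_V2 can then be used to append specific devices explicitly and to pass custom provider options to the desired EP.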

using Microsoft.ML.OnnxRuntime;

// Get all available EP devices from the environment
var epDevices = ortEnv.GetEpDevices();

// Accumulate devices by EpName
// Passing all devices for a given EP in a single call allows the execution provider
// to select the best configuration or combination of devices, rather than being limited
// to a single device. This enables optimal use of available hardware if supported by the EP.
var epDeviceMap = epDevices
    .GroupBy(device => device.EpName)
    .ToDictionary(g => g.Key, g => g.ToList());

// For demonstration, list all found EPs, vendors, and device types
foreach (var epGroup in epDeviceMap)
{
    var epName = epGroup.Key;
    var devices = epGroup.Value;

    Console.WriteLine($"Execution Provider: {epName}");
    foreach (var device in devices)
    {
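        // GetDeviceTypeString is an app-defined helper (not shown) that maps the device type to a display string
        string deviceType = GetDeviceTypeString(device.HardwareDevice.Type);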
        Console.WriteLine($" | Vendor: {device.EpVendor,-16} | Device Type: {deviceType,-8}");
    }
}

// Configure and append each EP type only once, with all its devices
var sessionOptions = new SessionOptions();
foreach ((var epName, var devices) in epDeviceMap)
{
    Dictionary<string, string> epOptions = new();
    switch (epName)
    {
        case "VitisAIExecutionProvider":
            // Demonstrating passing no options for VitisAI
            sessionOptions.AppendExecutionProvider(ortEnv, devices, epOptions);
            Console.WriteLine($"Successfully added {epName} EP");
            break;

        case "OpenVINOExecutionProvider":
            // Configure threading for OpenVINO EP, pick the first device found
            epOptions["num_of_threads"] = "4";
            sessionOptions.AppendExecutionProvider(ortEnv, [devices.First()], epOptions);
            Console.WriteLine($"Successfully added {epName} EP (first device only)");
            break;

        case "QNNExecutionProvider":
            // Configure performance mode for QNN EP
            epOptions["htp_performance_mode"] = "high_performance";
            sessionOptions.AppendExecutionProvider(ortEnv, devices, epOptions);
            Console.WriteLine($"Successfully added {epName} EP");
            break;

        default:
            Console.WriteLine($"Skipping EP: {epName}");
            break;
    }
}

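#include <iomanip>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

#include <win_onnxruntime_cxx_api.h>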

// Get all available EP devices from the environment
std::vector<Ort::ConstEpDevice> ep_devices = env.GetEpDevices();

// Accumulate devices by ep_name
// Passing all devices for a given EP in a single call allows the execution provider
// to select the best configuration or combination of devices, rather than being limited
// to a single device. This enables optimal use of available hardware if supported by the EP.
std::unordered_map<std::string, std::vector<Ort::ConstEpDevice>> ep_device_map;
for (const auto& device : ep_devices)
{
    ep_device_map[device.EpName()].push_back(device);
}

// For demonstration, list all found EPs, vendors, and device types
for (const auto& [ep_name, devices] : ep_device_map)
{
    std::cout << "Execution Provider: " << ep_name << std::endl;
    for (const auto& device : devices)
    {
        // ToString is an app-defined helper (not shown) that maps the device type to a display string
        std::cout << " | Vendor: " << std::setw(16) << device.EpVendor() << " | Device Type: " << std::setw(8)
                    << ToString(device.Device().Type()) << std::endl;
    }
}

// Configure and append each EP type only once, with all its devices
Ort::SessionOptions session_options;
for (const auto& [ep_name, devices] : ep_device_map)
{
    Ort::KeyValuePairs ep_options;
    if (ep_name == "VitisAIExecutionProvider")
    {
        // Demonstrating passing no options for VitisAI
        session_options.AppendExecutionProvider_V2(env, devices, ep_options);
    }
    else if (ep_name == "OpenVINOExecutionProvider")
    {
        // Configure threading for OpenVINO EP, pick the first device found.
        ep_options.Add("num_of_threads", "4");
        session_options.AppendExecutionProvider_V2(env, {devices.front()}, ep_options);
    }
    else if (ep_name == "QNNExecutionProvider")
    {
        // Configure performance mode for QNN EP
        ep_options.Add("htp_performance_mode", "high_performance");
        session_options.AppendExecutionProvider_V2(env, devices, ep_options);
    }
    else
    {
        std::cout << "Skipping EP: " << ep_name << std::endl;
    }
}

# This example shows how to register a specific EP.
# Note that EPs registered by Windows ML cannot be accessed via the old "providers" option

import onnxruntime as ort

# Select a specific EP.
def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):
    ep_devices = ort.get_ep_devices()
    for ep_device in ep_devices:
        if ep_device.ep_name == ep_name and ep_device.device.type == device_type:
            session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)

options = ort.SessionOptions()
add_ep_for_device(options, "QNNExecutionProvider", ort.OrtHardwareDeviceType.NPU)  # example for QNN NPU
assert options.has_providers()

session = ort.InferenceSession(
    "path_to_your_model.onnx",
    sess_options=options,
)
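The C# and C++ samples group devices by EP name before appending them, while the Python sample above matches a single device. The grouping step on its own can be sketched in Python as follows. This is an illustration only: `FakeEpDevice` is a hypothetical stand-in record with just an `ep_name` and a `device_type` string, and the device list is made up. With the real bindings the list would come from `ort.get_ep_devices()`, as in the sample above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeEpDevice:
    """Hypothetical stand-in for an EP device record (not an onnxruntime type)."""
    ep_name: str
    device_type: str


def group_by_ep(devices):
    """Group devices by EP name so each EP can be appended once with all its devices."""
    groups = defaultdict(list)
    for device in devices:
        groups[device.ep_name].append(device)
    return dict(groups)


# Made-up device list, for illustration only.
devices = [
    FakeEpDevice("OpenVINOExecutionProvider", "CPU"),
    FakeEpDevice("OpenVINOExecutionProvider", "GPU"),
    FakeEpDevice("QNNExecutionProvider", "NPU"),
]

ep_device_map = group_by_ep(devices)
for ep_name, ep_devices in ep_device_map.items():
    print(ep_name, [d.device_type for d in ep_devices])
```

Passing a whole group in one call is what lets an EP choose among its devices, as the comments in the C# and C++ samples explain.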

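For more details, see the ONNX Runtime OrtApi documentation. To learn about the versioning strategy for EPs, see the execution provider versioning documentation.

Step 4: Compile the model

An ONNX model must be compiled into an optimized representation that can run efficiently on the device's underlying hardware. The execution providers you configured in step 3 perform this transformation.

As of version 1.22, the ONNX Runtime has introduced new APIs that better encapsulate the compilation steps. The ONNX Runtime compile documentation has more details (see the OrtCompileApi struct).

Note

Compilation can take several minutes to complete. To keep any UI responsive, consider running it as a background operation in your app.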
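One way to keep the calling thread free while compilation runs is to hand the work to a worker thread. The sketch below uses Python's standard `concurrent.futures`; `compile_model` is a hypothetical stand-in that only sleeps, and `start_background_compile` is a name made up for this illustration. In a real app the worker would call the actual compile API shown in this step.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compile_model(input_path, output_path):
    """Hypothetical stand-in for the real compile call; sleeps to simulate work."""
    time.sleep(0.05)
    return output_path


def start_background_compile(executor, input_path, output_path, on_done):
    """Submit compilation to a worker thread and call on_done(path) when it finishes."""
    future = executor.submit(compile_model, input_path, output_path)
    # The callback normally runs on the worker thread; a UI app would
    # marshal the notification back to its UI thread.
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


completed = []
with ThreadPoolExecutor(max_workers=1) as executor:
    future = start_background_compile(
        executor, "model.onnx", "model_compiled.onnx", completed.append
    )
    # The calling thread is free to do other work here.
    compiled_path = future.result()  # block only when the result is needed
```

A thread is used here because the real compilation happens in native code; whether that suits a particular app depends on its UI framework and threading model.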

// Prepare compilation options using our session we configured in step 3
OrtModelCompilationOptions compileOptions = new(sessionOptions);
compileOptions.SetInputModelPath(modelPath);
compileOptions.SetOutputModelPath(compiledModelPath);

// Compile the model
compileOptions.CompileModel();

const OrtCompileApi* compileApi = ortApi.GetCompileApi();

// Prepare compilation options
OrtModelCompilationOptions* compileOptions = nullptr;
OrtStatus* status = compileApi->CreateModelCompilationOptionsFromSessionOptions(env, sessionOptions, &compileOptions);
status = compileApi->ModelCompilationOptions_SetInputModelPath(compileOptions, modelPath.c_str());
status = compileApi->ModelCompilationOptions_SetOutputModelPath(compileOptions, compiledModelPath.c_str());

// Compile the model
status = compileApi->CompileModel(env, compileOptions);

// Clean up
compileApi->ReleaseModelCompilationOptions(compileOptions);

import os

# `ort` and `options` are the module and SessionOptions set up in step 3
input_model_path = "path_to_your_model.onnx"
output_model_path = "path_to_your_compiled_model.onnx"

model_compiler = ort.ModelCompiler(
    options,
    input_model_path,
    embed_compiled_data_into_model=True,
    external_initializers_file_path=None,
)
model_compiler.compile_to_file(output_model_path)
assert os.path.exists(output_model_path)
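Because compilation can be slow, an app may prefer to reuse a previously compiled model rather than compile on every launch. The following is a minimal sketch of one such check, assuming that comparing file modification times is good enough; `needs_compile` is a hypothetical helper, not part of ONNX Runtime. It doesn't account for other reasons a compiled model might need rebuilding, such as an updated execution provider.

```python
import os
import tempfile


def needs_compile(input_model_path, output_model_path):
    """Return True if no compiled model exists or the source model is newer."""
    if not os.path.exists(output_model_path):
        return True
    return os.path.getmtime(input_model_path) > os.path.getmtime(output_model_path)


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "model.onnx")
    compiled = os.path.join(folder, "model_compiled.onnx")
    open(source, "wb").close()
    first_run = needs_compile(source, compiled)   # no compiled file yet

    open(compiled, "wb").close()
    os.utime(source, (1000, 1000))                # source older than compiled
    os.utime(compiled, (2000, 2000))
    second_run = needs_compile(source, compiled)

print(first_run, second_run)
```

In an app, the compile call from this step would run only when `needs_compile` returns True.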

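Step 5: Run model inference

Now that the model has been compiled for the local hardware on the device, we can create an inference session and run inference on the model.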

// Create inference session using compiled model
using InferenceSession session = new(compiledModelPath, sessionOptions);

// Create inference session using compiled model
Ort::Session session(env, compiledModelPath.c_str(), sessionOptions);

# Create inference session using compiled model
session = ort.InferenceSession(output_model_path, sess_options=options)

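Step 6: Distribute your app

Before distributing your app, C# and C++ developers need to take additional steps to make sure the Windows ML runtime is installed on users' devices when the app is installed. See the distributing your app page to learn more.

Model compilation

An ONNX model is represented as a graph, where nodes correspond to operators (such as matrix multiplications, convolutions, and other mathematical operations) and edges define the flow of data between them.

This graph-based structure allows for efficient execution and optimization by enabling transformations such as operator fusion (that is, combining multiple related operations into a single optimized operation) and graph pruning (that is, removing redundant nodes from the graph).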
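To make operator fusion and graph pruning concrete, here is a toy illustration on a hand-built graph. It is not how ONNX Runtime or any execution provider implements these passes: the `Node` record, the `FusedMatMulAdd` operator name, and both functions are assumptions invented for this sketch, and each node is assumed to have a single output named after the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str      # also the name of the node's single output
    op: str
    inputs: tuple


def prune(nodes, graph_outputs):
    """Drop nodes whose output never reaches a graph output."""
    by_name = {n.name: n for n in nodes}
    live, stack = set(), list(graph_outputs)
    while stack:
        name = stack.pop()
        if name in by_name and name not in live:
            live.add(name)
            stack.extend(by_name[name].inputs)
    return [n for n in nodes if n.name in live]


def fuse_matmul_add(nodes, graph_outputs):
    """Fuse MatMul -> Add into one node when the MatMul has no other consumer."""
    by_name = {n.name: n for n in nodes}
    uses = {}
    for name in [i for n in nodes for i in n.inputs] + list(graph_outputs):
        uses[name] = uses.get(name, 0) + 1
    result, absorbed = [], set()
    for n in nodes:
        producer = None
        if n.op == "Add":
            producer = next(
                (by_name[i] for i in n.inputs
                 if i in by_name and by_name[i].op == "MatMul" and uses[i] == 1),
                None,
            )
        if producer is None:
            result.append(n)
            continue
        others = tuple(i for i in n.inputs if i != producer.name)
        result.append(Node(n.name, "FusedMatMulAdd", producer.inputs + others))
        absorbed.add(producer.name)
    return [n for n in result if n.name not in absorbed]


# x, w, b are graph inputs; "dead" does not feed the output "out".
graph = [
    Node("mm", "MatMul", ("x", "w")),
    Node("y", "Add", ("mm", "b")),
    Node("dead", "Relu", ("x",)),
    Node("out", "Relu", ("y",)),
]
optimized = fuse_matmul_add(prune(graph, ["out"]), ["out"])
print([(n.name, n.op) for n in optimized])
```

Pruning removes the unused `dead` node, and fusion replaces the `MatMul` and `Add` pair with one node, so the four-node graph shrinks to two.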

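Model compilation refers to the process of using an execution provider (EP) to transform an ONNX model into an optimized representation that can run efficiently on the device's underlying hardware.

Designing for compilation

Here are some ideas for handling compilation in your app.

Compilation performance. Compilation can take several minutes to complete. To keep any UI responsive, consider running it as a background operation in your app.
User interface updates. Consider letting users know when your app is doing compilation work, and notify them when it's complete.
Graceful fallback. If there's a problem loading a compiled model, try to capture diagnostic data about the failure and, where possible, have your app fall back to the original model so that the related AI functionality remains available.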
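The graceful-fallback idea can be sketched as a small wrapper around session creation. This is one possible shape under stated assumptions: `load_with_fallback` is a hypothetical helper, and `create_session` is whatever callable your app uses to open a model (with the Python bindings from step 5, that could be a lambda around `ort.InferenceSession`). A fake loader stands in here so the failure path is easy to see.

```python
import logging

logger = logging.getLogger("model_loading")


def load_with_fallback(create_session, compiled_path, original_path):
    """Try the compiled model first; on failure, log diagnostics and load the original."""
    try:
        return create_session(compiled_path), compiled_path
    except Exception:
        # logger.exception records the traceback as diagnostic data.
        logger.exception(
            "Failed to load compiled model %s; falling back to %s",
            compiled_path,
            original_path,
        )
        return create_session(original_path), original_path


def fake_create_session(path):
    """Stand-in loader that simulates a broken compiled model."""
    if path.endswith("_compiled.onnx"):
        raise RuntimeError("simulated load failure")
    return f"session({path})"


session, used_path = load_with_fallback(
    fake_create_session, "model_compiled.onnx", "model.onnx"
)
print(used_path)
```

Returning the path that was actually used lets the caller decide whether to schedule a recompile or report the fallback to the user.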

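Using device policies for execution provider selection

In addition to selecting EPs explicitly, you can use device policies, which are a natural, outcome-oriented way to specify how you want your AI workload to run. To do that, use the SessionOptions.SetEpSelectionPolicy function on the OrtApi, passing in an OrtExecutionProviderDevicePolicy value. Several values are available for automatic selection, such as MAX_PERFORMANCE, PREFER_NPU, MAX_EFFICIENCY, and more. For the other values you can use, see the ONNX OrtExecutionProviderDevicePolicy documentation.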

// Configure the session to select an EP and device for MAX_EFFICIENCY which typically
// will choose an NPU if available with a CPU fallback.
var sessionOptions = new SessionOptions();
sessionOptions.SetEpSelectionPolicy(ExecutionProviderDevicePolicy.MAX_EFFICIENCY);

// Configure the session to select an EP and device for MAX_EFFICIENCY which typically
// will choose an NPU if available with a CPU fallback.
Ort::SessionOptions sessionOptions;
sessionOptions.SetEpSelectionPolicy(OrtExecutionProviderDevicePolicy_MAX_EFFICIENCY);

# Configure the session to select an EP and device for MAX_EFFICIENCY which typically
# will choose an NPU if available with a CPU fallback.
options = ort.SessionOptions()
options.set_provider_selection_policy(ort.OrtExecutionProviderDevicePolicy.MAX_EFFICIENCY)
assert options.has_providers()
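As a mental model only, a policy can be pictured as a preference order with a fallback. The comments in the samples above say that MAX_EFFICIENCY typically chooses an NPU if one is available, with a CPU fallback; the sketch below expresses just that over plain strings. It is not ONNX Runtime's selection algorithm: the `PREFERENCE` table and `pick_device` function are assumptions made for illustration, and the real logic lives inside ONNX Runtime.

```python
# Illustrative preference table (an assumption, not the runtime's real logic).
PREFERENCE = {
    "MAX_EFFICIENCY": ("NPU", "CPU"),
}


def pick_device(policy, available):
    """Return the first device type in the policy's preference order that is available."""
    for device_type in PREFERENCE[policy]:
        if device_type in available:
            return device_type
    raise LookupError(f"no device available for policy {policy!r}")


with_npu = pick_device("MAX_EFFICIENCY", {"CPU", "GPU", "NPU"})
without_npu = pick_device("MAX_EFFICIENCY", {"CPU", "GPU"})
print(with_npu, without_npu)
```

The point of a policy is that the app states the outcome it wants and leaves the device choice to the runtime, which is why explicit selection is recommended first when predictability matters.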

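Providing feedback about Windows ML

We'd love to hear your feedback about using Windows ML! If you run into any issues, please report them with the Feedback Hub app on Windows.

Feedback should be submitted under the Developer Platform -> Windows Machine Learning category.

See also

Windows ML APIs in Microsoft.Windows.AI.MachineLearning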