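Get started with ONNX models in your WinUI app with ONNX Runtime

Article
05/21/2024

This article walks you through creating a WinUI 3 app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your Windows app, see Get started using AI and machine learning models in your Windows app.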

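What is ONNX Runtime

ONNX Runtime is a cross-platform machine-learning model accelerator with a flexible interface for integrating hardware-specific libraries. ONNX Runtime can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks. For more information, see the ONNX Runtime website at https://onnxruntime.ai/docs/.

This sample uses the DirectML Execution Provider, which abstracts over the different hardware options on Windows devices and supports execution on local accelerators such as GPUs and NPUs.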

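Prerequisites

Your device must have developer mode enabled. For more information, see Enable your device for development.
Visual Studio 2022 or later with the .NET desktop development workload.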

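Create a new C# WinUI app

In Visual Studio, create a new project. In the Create a new project dialog, set the language filter to "C#" and the project type filter to "winui", then select the Blank App, Packaged (WinUI 3 in Desktop) template. Name the new project "ONNXWinUIExample".

Add references to NuGet packages

In Solution Explorer, right-click Dependencies and select Manage NuGet packages.... In the NuGet package manager, select the Browse tab. Search for the following packages and, for each one, select the latest stable version in the Version drop-down and then click Install.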

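Package	Description
Microsoft.ML.OnnxRuntime.DirectML	Provides APIs for running ONNX models on the GPU.
SixLabors.ImageSharp	Provides image utilities for processing images for model input.
SharpDX.DXGI	Provides APIs for accessing the DirectX device from C#.

Add the following using directives to the top of MainWindow.xaml.cs to access the APIs from these libraries.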

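// MainWindow.xaml.cs
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using Microsoft.UI.Xaml.Media.Imaging; // BitmapImage
using SharpDX.DXGI;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Formats;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
using Windows.Storage;                 // StorageFile
using Windows.Storage.Pickers;         // FileOpenPicker, PickerViewMode
using WinRT.Interop;                   // InitializeWithWindow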

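Add the model to your project

In Solution Explorer, right-click your project and select Add > New Folder. Name the new folder "model". For this example, we'll use the resnet50-v2-7.onnx model from https://github.com/onnx/models. Go to the repository view for the model at https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx. Click the Download raw file button. Copy this file into the "model" directory you just created.

In Solution Explorer, click the model file and set Copy to Output Directory to "Copy if newer".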

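Create a simple UI

For this example, we'll create a simple UI that includes a Button to allow the user to select an image to evaluate with the model, an Image control to display the selected image, and a TextBlock to list the objects the model detected in the image and the confidence of each object classification.

In the MainWindow.xaml file, replace the default StackPanel element with the following XAML code.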

<!--MainWindow.xaml-->
<Grid Padding="25" >
    <Grid.ColumnDefinitions>
        <ColumnDefinition/>
        <ColumnDefinition/>
        <ColumnDefinition/>
    </Grid.ColumnDefinitions>
    <Button x:Name="myButton" Click="myButton_Click" Grid.Column="0" VerticalAlignment="Top">Select photo</Button>
    <Image x:Name="myImage" MaxWidth="300" Grid.Column="1" VerticalAlignment="Top"/>
    <TextBlock x:Name="featuresTextBlock" Grid.Column="2" VerticalAlignment="Top"/>
</Grid>

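Initialize the model

In the MainWindow.xaml.cs file, inside the MainWindow class, create a helper method called InitModel that initializes the model. This method uses APIs from the SharpDX.DXGI library to select the first available adapter. The selected adapter is set in the SessionOptions object for the DirectML execution provider in this session. Finally, a new InferenceSession is initialized, passing in the path to the model file and the session options.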

// MainWindow.xaml.cs

private InferenceSession _inferenceSession;
private string modelDir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "model");

private void InitModel()
{
    if (_inferenceSession != null)
    {
        return;
    }

    // Select a graphics device
    var factory1 = new Factory1();
    int deviceId = 0;

    Adapter1 selectedAdapter = factory1.GetAdapter1(0);

    // Create the inference session
    var sessionOptions = new SessionOptions
    {
        LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_INFO
    };
    sessionOptions.AppendExecutionProvider_DML(deviceId);
    _inferenceSession = new InferenceSession($@"{modelDir}\resnet50-v2-7.onnx", sessionOptions);

}

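Load and analyze an image

For simplicity, in this example all of the steps for loading and formatting the image, invoking the model, and displaying the results are placed in the button click handler. Note that we add the async keyword to the button click handler included in the default template so that we can run asynchronous operations in the handler.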

// MainWindow.xaml.cs

private async void myButton_Click(object sender, RoutedEventArgs e)
{
    ...
}

Use a FileOpenPicker to allow the user to select an image from their computer to analyze and display in the UI.

    FileOpenPicker fileOpenPicker = new()
    {
        ViewMode = PickerViewMode.Thumbnail,
        FileTypeFilter = { ".jpg", ".jpeg", ".png", ".gif" },
    };
    InitializeWithWindow.Initialize(fileOpenPicker, WinRT.Interop.WindowNative.GetWindowHandle(this));
    StorageFile file = await fileOpenPicker.PickSingleFileAsync();
    if (file == null)
    {
        return;
    }

    // Display the image in the UI
    var bitmap = new BitmapImage();
    bitmap.SetSource(await file.OpenAsync(Windows.Storage.FileAccessMode.Read));
    myImage.Source = bitmap;

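Next, we need to process the input into a format the model supports. The SixLabors.ImageSharp library is used to load the image in 24-bit RGB format and resize it to 224x224 pixels. The pixel values are then normalized with a mean of 255*[0.485, 0.456, 0.406] and a standard deviation of 255*[0.229, 0.224, 0.225]. Details of the format the model expects can be found on the ResNet model's GitHub page.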

    using var fileStream = await file.OpenStreamForReadAsync();

    IImageFormat format = SixLabors.ImageSharp.Image.DetectFormat(fileStream);
    using Image<Rgb24> image = SixLabors.ImageSharp.Image.Load<Rgb24>(fileStream);


    // Resize image
    using Stream imageStream = new MemoryStream();
    image.Mutate(x =>
    {
        x.Resize(new ResizeOptions
        {
            Size = new SixLabors.ImageSharp.Size(224, 224),
            Mode = ResizeMode.Crop
        });
    });

    image.Save(imageStream, format);

    // Preprocess image
    // We use DenseTensor for multi-dimensional access to populate the image data
    var mean = new[] { 0.485f, 0.456f, 0.406f };
    var stddev = new[] { 0.229f, 0.224f, 0.225f };
    DenseTensor<float> processedImage = new(new[] { 1, 3, 224, 224 });
    image.ProcessPixelRows(accessor =>
    {
        for (int y = 0; y < accessor.Height; y++)
        {
            Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
            for (int x = 0; x < accessor.Width; x++)
            {
                processedImage[0, 0, y, x] = ((pixelSpan[x].R / 255f) - mean[0]) / stddev[0];
                processedImage[0, 1, y, x] = ((pixelSpan[x].G / 255f) - mean[1]) / stddev[1];
                processedImage[0, 2, y, x] = ((pixelSpan[x].B / 255f) - mean[2]) / stddev[2];
            }
        }
    });
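The article states the mean and standard deviation on the 0..255 scale, while the preprocessing code first divides each channel by 255 and then uses the unscaled constants. The two forms are algebraically the same. The following is a minimal sketch that puts them side by side; the class and method names are hypothetical and are not part of the sample app.

```csharp
using System;

public static class PixelNormalization
{
    // Channel statistics quoted in the article, expressed on a 0..1 scale (R, G, B).
    private static readonly float[] Mean = { 0.485f, 0.456f, 0.406f };
    private static readonly float[] StdDev = { 0.229f, 0.224f, 0.225f };

    // Form used by the preprocessing code: scale to 0..1 first, then normalize.
    public static float NormalizeScaled(byte value, int channel) =>
        ((value / 255f) - Mean[channel]) / StdDev[channel];

    // Form used in the prose: normalize directly on the 0..255 scale.
    public static float NormalizeRaw(byte value, int channel) =>
        (value - 255f * Mean[channel]) / (255f * StdDev[channel]);
}
```

Calling both methods with the same byte value and channel index should give results that agree up to floating-point rounding, which is why either description of the statistics is valid.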

Next, we set up the inputs by creating an OrtValue of type Tensor on top of the managed image data array.

    // Setup inputs
    // Pin tensor buffer and create a OrtValue with native tensor that makes use of
    // DenseTensor buffer directly. This avoids extra data copy within OnnxRuntime.
    // It will be unpinned on ortValue disposal
    using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(OrtMemoryInfo.DefaultInstance,
        processedImage.Buffer, new long[] { 1, 3, 224, 224 });

    var inputs = new Dictionary<string, OrtValue>
    {
        { "data", inputOrtValue }
    };
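Because the OrtValue wraps the flat buffer of the DenseTensor directly, it can help to see how the four indices [batch, channel, row, column] map to a position in that buffer. The sketch below assumes the default row-major layout of DenseTensor; the helper name is hypothetical and is not an ONNX Runtime API.

```csharp
public static class NchwLayoutSketch
{
    // Flat offset of element [n, c, y, x] in a row-major tensor of shape
    // [batch, channels, height, width].
    public static int FlatIndex(int n, int c, int y, int x, int channels, int height, int width) =>
        ((n * channels + c) * height + y) * width + x;
}
```

For the 1x3x224x224 input used here, all red values come first, followed by all green values starting at offset 224 * 224 = 50176, and then all blue values.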

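Next, if the inference session has not been initialized yet, call the InitModel helper method. Then call the Run method to run the model and retrieve the results.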

    // Run inference
    if (_inferenceSession == null)
    {
        InitModel();
    }
    using var runOptions = new RunOptions();
    using IDisposableReadOnlyCollection<OrtValue> results = _inferenceSession.Run(runOptions, inputs, _inferenceSession.OutputNames);

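The model outputs the results as a native tensor buffer. The following code converts the output to an array of floats. A softmax function is applied so that the values lie in the range [0,1] and sum to 1.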

    // Postprocess output
    // We copy results to array only to apply algorithms, otherwise data can be accessed directly
    // from the native buffer via ReadOnlySpan<T> or Span<T>
    var output = results[0].GetTensorDataAsSpan<float>().ToArray();
    float sum = output.Sum(x => (float)Math.Exp(x));
    IEnumerable<float> softmax = output.Select(x => (float)Math.Exp(x) / sum);
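The softmax above exponentiates the raw model outputs directly. That is fine for typical classifier outputs, but a value above roughly 88 would overflow a float once exponentiated. As a minimal sketch of a common alternative, the version below subtracts the largest value before exponentiating, which leaves the result mathematically unchanged. The class and method names are hypothetical and are not part of the sample app.

```csharp
using System;
using System.Linq;

public static class SoftmaxSketch
{
    // Softmax with the maximum value subtracted first. The largest exponent
    // becomes 0, so Math.Exp cannot overflow.
    public static float[] StableSoftmax(float[] logits)
    {
        if (logits.Length == 0)
        {
            return Array.Empty<float>();
        }

        float max = logits.Max();
        // Accumulate in double to limit rounding error across many classes.
        double[] exps = logits.Select(x => Math.Exp(x - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }
}
```

In the click handler, this could stand in for the two softmax lines by passing the output array to StableSoftmax; the rest of the handler would then work with the returned array.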

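The index of each value in the output array maps to a label that the model was trained on, and the value at that index is the model's confidence that the label represents an object detected in the input image. We choose the 10 results with the highest confidence values. This code uses some helper objects that we will define in the next step.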

    // Extract top 10
    IEnumerable<Prediction> top10 = softmax.Select((x, i) => new Prediction { Label = LabelMap.Labels[i], Confidence = x })
        .OrderByDescending(x => x.Confidence)
        .Take(10);

    // Print results
    featuresTextBlock.Text = "Top 10 predictions for ResNet50 v2...\n";
    featuresTextBlock.Text += "-------------------------------------\n";
    foreach (var t in top10)
    {
        featuresTextBlock.Text += $"Label: {t.Label}, Confidence: {t.Confidence}\n";
    }
} // End of myButton_Click
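The top-10 selection in the handler is an instance of a general pattern: pair each score with its index, sort by score in descending order, and keep the first k entries. The following is a minimal standalone sketch of that pattern; the class and method names are hypothetical, and it returns index and score pairs rather than the sample's Prediction objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopKSketch
{
    // Pairs each score with its index, sorts by score descending, and keeps the first k.
    public static List<(int Index, float Score)> TopK(IReadOnlyList<float> scores, int k)
    {
        return scores
            .Select((score, index) => (Index: index, Score: score))
            .OrderByDescending(p => p.Score)
            .Take(k)
            .ToList();
    }
}
```

Calling ToList materializes the result once, so the sort is not repeated if the result is enumerated more than one time.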

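Declare helper objects

The Prediction class just provides a simple way to associate an object label with a confidence value. In MainWindow.xaml.cs, add this class inside the ONNXWinUIExample namespace block, but outside of the MainWindow class definition.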

internal class Prediction
{
    public object Label { get; set; }
    public float Confidence { get; set; }
}

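Next, add the LabelMap helper class, which lists all of the object labels the model was trained on, in a specific order so that the labels map to the indices of the results returned by the model. The list of labels is too long to present in full here. You can copy the complete LabelMap class from a sample code file in the ONNX Runtime GitHub repo and paste it into the ONNXWinUIExample namespace block.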

public class LabelMap
{
    public static readonly string[] Labels = new[] {
        "tench",
        "goldfish",
        "great white shark",
        ...
        "hen-of-the-woods",
        "bolete",
        "ear",
        "toilet paper"};
}

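Run the example

Build and run the project. Click the Select photo button and pick an image file to analyze. You can look at the LabelMap helper class definition to see the things that the model can recognize and pick an image that may have interesting results. After the model initializes, which happens the first time it runs, and processing completes, you should see a list of the objects detected in the image and the confidence value of each prediction.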

Top 10 predictions for ResNet50 v2...
-------------------------------------
Label: lakeshore, Confidence: 0.91674984
Label: seashore, Confidence: 0.033412453
Label: promontory, Confidence: 0.008877817
Label: shoal, Confidence: 0.0046836217
Label: container ship, Confidence: 0.001940886
Label: Lakeland Terrier, Confidence: 0.0016400366
Label: maze, Confidence: 0.0012478716
Label: breakwater, Confidence: 0.0012336193
Label: ocean liner, Confidence: 0.0011933135
Label: pier, Confidence: 0.0011284945
