In this tutorial, you'll learn how to run any ONNX-based large language model (LLM) through the AI Dev Gallery, an open-source Windows application (available in the Microsoft Store) that showcases AI-powered samples.
These steps work for any LLM in the ONNX Runtime GenAI format, including:
- Models downloaded from Hugging Face
- Models converted from other frameworks using the AI Toolkit for Visual Studio Code conversion tool
Step 1: Select an interactive sample in the AI Dev Gallery
Open the AI Dev Gallery app.
Navigate to the Samples tab and choose a Text sample (for example, "Generate Text" or "Chat").
Click the Model Selector button to view available models for that sample.
Select the Custom models tab to bring in your own ONNX LLM.
Step 2: Get or convert an ONNX LLM model
To use a model in the AI Dev Gallery, it must be in the ONNX Runtime GenAI format. You can:
Download a pre-converted model:
- Browse models on Hugging Face ONNX Models, or
- In the AI Dev Gallery, go to Add model > Search HuggingFace
Convert your own model:
- Click Open AI Toolkit's Conversion Tool in the model selector, which launches the AI Toolkit extension in Visual Studio Code.
- If you don't have it installed, search for "AI Toolkit" in the VS Code Extensions Marketplace.
- Use the AI Toolkit for Visual Studio Code to convert a supported model to ONNX Runtime GenAI format.
Currently supported models for conversion:
- DeepSeek R1 Distill Qwen 1.5B
- Phi 3.5 Mini Instruct
- Qwen 2.5-1.5B Instruct
- Llama 3.2 1B Instruct
Note
AI Toolkit conversion is in Preview and currently supports only the models listed above.
Step 3: Use the ONNX model in the AI Dev Gallery
Once you have a model in the ONNX Runtime GenAI format, return to the AI Dev Gallery model selector window.
Click Add model > From Disk and provide the path to your ONNX model.
Note
If you used AI Toolkit's conversion tool, the converted model path follows this format:
`c:/{workspace}/{model_project}/history/{workflow}/model/model.onnx`
Once added, you can select your model and use it with the interactive samples.
Optionally, click Show source code in the app to view the code that runs the model.
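If a model fails to load from disk, it can help to confirm the folder is actually in the ONNX Runtime GenAI format before adding it. The sketch below is a minimal, hypothetical check (the helper name and exact file list are assumptions, not part of the Gallery's code); converted models typically ship a genai_config.json alongside the .onnx weights and tokenizer files.

```csharp
using System.IO;
using System.Linq;

static class OnnxModelFolderCheck
{
    // Hypothetical helper: verifies a folder looks like an ONNX Runtime GenAI model.
    // The expected files are an assumption based on typical converted model folders.
    public static bool LooksLikeGenAiModel(string folder)
    {
        if (!Directory.Exists(folder))
        {
            return false;
        }

        // Converted models usually include a genai_config.json plus .onnx weights.
        bool hasConfig = File.Exists(Path.Combine(folder, "genai_config.json"));
        bool hasOnnxWeights = Directory.EnumerateFiles(folder, "*.onnx").Any();

        return hasConfig && hasOnnxWeights;
    }
}
```

For example, `OnnxModelFolderCheck.LooksLikeGenAiModel(@"c:\models\my-model")` returns false for a plain ONNX export that was never converted to the GenAI format.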
Supported samples in the AI Dev Gallery
These ONNX LLMs can be used with the following samples in the AI Dev Gallery:
Text
- Generate Text
- Summarize Text
- Chat
- Semantic Kernel Chat
- Grammar Check
- Paraphrase Text
- Analyze Text Sentiment
- Content Moderation
- Custom Parameters
- Retrieval Augmented Generation
Smart Controls
- Smart TextBox
Code
- Generate Code
- Explain Code
Next steps
Now that you've tried your ONNX LLM in the AI Dev Gallery, you can bring the same approach into your own app.
How the sample works
- The AI Dev Gallery samples use `OnnxRuntimeGenAIChatClient` (from the ONNX Runtime GenAI SDK) to wrap your ONNX model.
- This client plugs into the `Microsoft.Extensions.AI` abstractions (`IChatClient`, `ChatMessage`, etc.), so you can work with prompts and responses in a natural, high-level way.
- Inside the factory (`OnnxRuntimeGenAIChatClientFactory`), the app ensures Windows ML (WinML) execution providers are registered and runs the ONNX model with the best available hardware acceleration (CPU, GPU, or NPU).
Example from the sample:

```csharp
// Register WinML execution providers (under the hood)
var catalog = Microsoft.Windows.AI.MachineLearning.ExecutionProviderCatalog.GetDefault();
await catalog.EnsureAndRegisterCertifiedAsync();

// Create a chat client for your ONNX model
chatClient = await OnnxRuntimeGenAIChatClientFactory.CreateAsync(
    @"C:\path\to\your\onnx\model",
    new LlmPromptTemplate
    {
        System = "<|system|>\n{{CONTENT}}<|end|>\n",
        User = "<|user|>\n{{CONTENT}}<|end|>\n",
        Assistant = "<|assistant|>\n{{CONTENT}}<|end|>\n",
        Stop = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]
    });

// Stream responses into your UI
await foreach (var part in chatClient.GetStreamingResponseAsync(messages, null, cts.Token))
{
    OutputTextBlock.Text += part;
}
```
For more details on integrating ONNX models into Windows applications, see the resources below.
See also
- Download AI Dev Gallery
- ONNX models on Hugging Face
- AI Toolkit for Visual Studio Code
- AI Dev Gallery GitHub Repo