Tutorial: Use any ONNX LLM in the AI Dev Gallery

In this tutorial, you'll learn how to run any ONNX-based large language model (LLM) through the AI Dev Gallery, an open-source Windows application (available in the Microsoft Store) that showcases AI-powered samples.

These steps work for any LLM in the ONNX Runtime GenAI format.

Step 1: Open a text sample in the AI Dev Gallery

  1. Open the AI Dev Gallery app.

  2. Navigate to the Samples tab and choose a Text sample (for example, "Generate Text" or "Chat").


  3. Click the Model Selector button to view available models for that sample.


  4. Select the Custom models tab to bring in your own ONNX LLM.



Step 2: Get or convert an ONNX LLM

To use a model in the AI Dev Gallery, it must be in the ONNX Runtime GenAI format. You can:

  • Download a pre-converted model, such as one of the ONNX Runtime GenAI models published on Hugging Face.

  • Convert your own model:

    • Click Open AI Toolkit's Conversion Tool in the model selector, which launches the AI Toolkit extension in Visual Studio Code.
      • If you don't have it installed, search for "AI Toolkit" in the VS Code Extensions Marketplace.
    • Use the AI Toolkit for Visual Studio Code to convert a supported model to ONNX Runtime GenAI format.

Currently supported models for conversion:

  • DeepSeek R1 Distill Qwen 1.5B
  • Phi 3.5 Mini Instruct
  • Qwen 2.5-1.5B Instruct
  • Llama 3.2 1B Instruct

Note

AI Toolkit conversion is in Preview and currently supports only the models listed above.


Step 3: Add the model to the AI Dev Gallery

  1. Once you have a model in the ONNX Runtime GenAI format, return to the AI Dev Gallery model selector window.

  2. Click Add model → From Disk and provide the path to your ONNX model folder. (A quick way to check that folder is sketched after these steps.)


    Note

    If you used AI Toolkit's conversion tool, the converted model path should follow this format:
    c:/{workspace}/{model_project}/history/{workflow}/model/model.onnx

  3. Once the model is added, you can select it and use it with the interactive samples.


  4. Optionally, click Show source code in the app to view the code that runs the model.

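An ONNX Runtime GenAI model is a folder rather than a single file: alongside the .onnx weights it contains a genai_config.json and the tokenizer files. If the gallery rejects a path, a quick check like this hypothetical helper (not part of the app) can confirm you're pointing at the right folder:

using System.IO;

// Hypothetical helper, not part of the AI Dev Gallery: an ONNX Runtime GenAI
// model folder contains genai_config.json next to the .onnx weights.
static bool LooksLikeGenAIModel(string folder) =>
    File.Exists(Path.Combine(folder, "genai_config.json"));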


These ONNX LLMs can be used with the following samples in the AI Dev Gallery:

Text

  • Generate Text
  • Summarize Text
  • Chat
  • Semantic Kernel Chat
  • Grammar Check
  • Paraphrase Text
  • Analyze Text Sentiment
  • Content Moderation
  • Custom Parameters
  • Retrieval Augmented Generation

Smart Controls

  • Smart TextBox

Code

  • Generate Code
  • Explain Code

Next steps

Now that you've tried your ONNX LLM in the AI Dev Gallery, you can bring the same approach into your own app.

How the sample works

  • The AI Dev Gallery samples use OnnxRuntimeGenAIChatClient (from the ONNX Runtime GenAI SDK) to wrap your ONNX model.
  • This client plugs into the Microsoft.Extensions.AI abstractions (IChatClient, ChatMessage, etc.), so you can work with prompts and responses in a natural, high-level way.
  • Inside the factory (OnnxRuntimeGenAIChatClientFactory), the app ensures Windows ML (WinML) execution providers are registered and runs the ONNX model with the best available hardware acceleration (CPU, GPU, or NPU).

Example from the sample:

using Microsoft.Extensions.AI; // IChatClient, ChatMessage, ChatRole

// Register the Windows ML (WinML) execution providers so ONNX Runtime can
// use the best available hardware acceleration (CPU, GPU, or NPU).
var catalog = Microsoft.Windows.AI.MachineLearning.ExecutionProviderCatalog.GetDefault();
await catalog.EnsureAndRegisterCertifiedAsync();

// Create a chat client for your ONNX model. The prompt template tells the
// client how to wrap system, user, and assistant messages for this model
// (the tags below follow the Phi-style chat format).
IChatClient chatClient = await OnnxRuntimeGenAIChatClientFactory.CreateAsync(
    @"C:\path\to\your\onnx\model",
    new LlmPromptTemplate
    {
        System = "<|system|>\n{{CONTENT}}<|end|>\n",
        User = "<|user|>\n{{CONTENT}}<|end|>\n",
        Assistant = "<|assistant|>\n{{CONTENT}}<|end|>\n",
        Stop = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]
    });

// Build the conversation and stream the response into your UI.
List<ChatMessage> messages = [new(ChatRole.User, "Why is the sky blue?")];
using var cts = new CancellationTokenSource();

await foreach (var part in chatClient.GetStreamingResponseAsync(messages, null, cts.Token))
{
    OutputTextBlock.Text += part.Text;
}
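
Samples such as Custom Parameters pass a ChatOptions instance (part of the Microsoft.Extensions.AI abstractions) instead of null to tune generation. Here's a minimal sketch, reusing the chatClient, messages, and cts from the example above; the values are illustrative, and which options a given ONNX model honors depends on its configuration:

// Illustrative sampling settings for a Custom Parameters-style scenario.
var options = new ChatOptions
{
    Temperature = 0.7f,     // higher = more varied output
    TopP = 0.9f,            // nucleus-sampling cutoff
    MaxOutputTokens = 512   // cap on generated tokens
};

await foreach (var part in chatClient.GetStreamingResponseAsync(messages, options, cts.Token))
{
    OutputTextBlock.Text += part.Text;
}

Because the client implements IChatClient, the non-streaming chatClient.GetResponseAsync(messages, options) works the same way when you want the whole response in a single call.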

For more details on integrating ONNX models into Windows applications, see the Windows AI documentation on Microsoft Learn.

