In this tutorial, you'll learn how to run any ONNX-based large language model (LLM) through the AI Dev Gallery, an open-source Windows application (available in the Microsoft Store) that showcases AI-powered samples.
These steps work for any LLM in the ONNX Runtime GenAI format, including:
- Models downloaded from Hugging Face
- Models converted from other frameworks using the AI Toolkit for Visual Studio Code conversion tool
Step 1: Select an interactive sample in the AI Dev Gallery
Open the AI Dev Gallery app.
Navigate to the Samples tab and choose a Text sample (for example, "Generate Text" or "Chat").
Click the Model Selector button to view available models for that sample.
Select the Custom models tab to bring in your own ONNX LLM.
Step 2: Get or convert an ONNX LLM model
To use a model in the AI Dev Gallery, it must be in the ONNX Runtime GenAI format. You can:
Download a pre-converted model:
- Browse models on Hugging Face ONNX Models, or
- In the AI Dev Gallery, go to Add model > Search HuggingFace
Convert your own model:
- Click Open AI Toolkit's Conversion Tool in the model selector, which launches the AI Toolkit extension in Visual Studio Code.
- If you don't have it installed, search for "AI Toolkit" in the VS Code Extensions Marketplace.
- Use the AI Toolkit for Visual Studio Code to convert a supported model to ONNX Runtime GenAI format.
Currently supported models for conversion:
- DeepSeek R1 Distill Qwen 1.5B
- Phi 3.5 Mini Instruct
- Qwen 2.5-1.5B Instruct
- Llama 3.2 1B Instruct
Note
AI Toolkit conversion is in Preview and currently supports only the models listed above.
Step 3: Use the ONNX model in the AI Dev Gallery
Once you have a model in the ONNX Runtime GenAI format, return to the AI Dev Gallery model selector window.
Click Add model > From Disk and provide the path to your ONNX model.
Note
If you used AI Toolkit's conversion tool, the converted model path follows this format:
`c:/{workspace}/{model_project}/history/{workflow}/model/model.onnx`
Once added, you can select your model and use it with the interactive samples.
Optionally, click Show source code in the app to view the code that runs the model.
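If a model fails to load from disk, it can help to confirm the folder is actually in the ONNX Runtime GenAI format before adding it. The sketch below is a minimal, hypothetical check (the helper name and exact file list are assumptions, not part of the Gallery's code); converted models typically ship a genai_config.json alongside the .onnx weights and tokenizer files.

```csharp
using System.IO;
using System.Linq;

static class OnnxModelFolderCheck
{
    // Hypothetical helper: verifies a folder looks like an ONNX Runtime GenAI model.
    // The expected files are an assumption based on typical converted model folders.
    public static bool LooksLikeGenAiModel(string folder)
    {
        if (!Directory.Exists(folder))
        {
            return false;
        }

        // Converted models usually include a genai_config.json plus .onnx weights.
        bool hasConfig = File.Exists(Path.Combine(folder, "genai_config.json"));
        bool hasOnnxWeights = Directory.EnumerateFiles(folder, "*.onnx").Any();

        return hasConfig && hasOnnxWeights;
    }
}
```

For example, `OnnxModelFolderCheck.LooksLikeGenAiModel(@"c:\models\my-model")` returns false for a plain ONNX export that was never converted to the GenAI format.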
Supported samples in the AI Dev Gallery
These ONNX LLMs can be used with the following samples in the AI Dev Gallery:
Text
- Generate Text
- Summarize Text
- Chat
- Semantic Kernel Chat
- Grammar Check
- Paraphrase Text
- Analyze Text Sentiment
- Content Moderation
- Custom Parameters
- Retrieval Augmented Generation
Smart Controls
- Smart TextBox
Code
- Generate Code
- Explain Code
Next steps
Now that you've tried your ONNX LLM in the AI Dev Gallery, you can bring the same approach into your own app.
How the sample works
- The AI Dev Gallery samples use `OnnxRuntimeGenAIChatClient` (from the ONNX Runtime GenAI SDK) to wrap your ONNX model.
- This client plugs into the `Microsoft.Extensions.AI` abstractions (`IChatClient`, `ChatMessage`, etc.), so you can work with prompts and responses in a natural, high-level way.
- Inside the factory (`OnnxRuntimeGenAIChatClientFactory`), the app ensures Windows ML (WinML) execution providers are registered and runs the ONNX model with the best available hardware acceleration (CPU, GPU, or NPU).
Example from the sample:

```csharp
// Register WinML execution providers (under the hood)
var catalog = Microsoft.Windows.AI.MachineLearning.ExecutionProviderCatalog.GetDefault();
await catalog.EnsureAndRegisterCertifiedAsync();

// Create a chat client for your ONNX model
chatClient = await OnnxRuntimeGenAIChatClientFactory.CreateAsync(
    @"C:\path\to\your\onnx\model",
    new LlmPromptTemplate
    {
        System = "<|system|>\n{{CONTENT}}<|end|>\n",
        User = "<|user|>\n{{CONTENT}}<|end|>\n",
        Assistant = "<|assistant|>\n{{CONTENT}}<|end|>\n",
        Stop = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]
    });

// Stream responses into your UI
await foreach (var part in chatClient.GetStreamingResponseAsync(messages, null, cts.Token))
{
    OutputTextBlock.Text += part;
}
```
For more details on integrating ONNX models into Windows applications, see the resources below.
See also
- Download AI Dev Gallery
- ONNX models on Hugging Face
- AI Toolkit for Visual Studio Code
- AI Dev Gallery GitHub Repo