Accelerate AI models with Windows ML

Windows ML accelerates inference across NPUs, GPUs, and CPUs by pairing the ONNX Runtime with hardware-tuned execution providers (EPs). To learn more about execution providers, see the ONNX Runtime docs.

Note

You're still responsible for optimizing your models for different hardware. Windows ML handles execution provider distribution, not model optimization. See AI Toolkit and the ONNX Runtime Tutorials for more info on optimization.

What is an execution provider?

An execution provider (EP) is a component that enables hardware-specific optimizations for machine learning (ML) operations. Execution providers abstract different compute backends (NPU, GPU, and CPU) and provide a unified interface for graph partitioning, kernel registration, and operator execution. To learn more, see the ONNX Runtime docs.
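
To make the abstraction concrete, here is a minimal sketch that creates an inference session with an explicit EP preference through the ONNX Runtime C++ API. The QNN provider name, its backend_path option, and the model path are illustrative assumptions; an EP can only be appended if it's available in (or registered with) your ONNX Runtime build.

```cpp
// Minimal sketch: choose an EP at session creation with the ONNX Runtime C++ API.
// "QNN", backend_path, and model.onnx are illustrative assumptions.
#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "ep-demo");
    Ort::SessionOptions options;

    // Request a hardware EP. ONNX Runtime partitions the graph and assigns
    // the operators this EP supports to it; anything unsupported falls back
    // to the built-in CPU EP.
    std::unordered_map<std::string, std::string> qnn_options{
        {"backend_path", "QnnHtp.dll"}};  // assumed QNN EP option
    options.AppendExecutionProvider("QNN", qnn_options);

    Ort::Session session(env, L"model.onnx", options);
    return 0;
}
```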

Two ways to get EPs

Windows ML EPs: Use the ExecutionProviderCatalog APIs to acquire Windows-certified EPs, which pass a rigorous certification and regression-testing process and are updated automatically (see the sketch after this list). See Windows ML EPs to learn more.

Bring your own: Obtain and reference EP binaries yourself, enabling support for offline environments, managed devices, or strict version-pinning requirements. See Bring your own EPs to learn more.

See Windows ML EPs vs. bring-your-own for tradeoffs.
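
For the first path, a minimal C++/WinRT sketch of the catalog flow follows, assuming the ExecutionProviderCatalog type from the Windows App SDK's Microsoft.Windows.AI.MachineLearning namespace. The method names match the documented catalog surface, but treat the exact shape as an assumption and confirm it against the Windows ML reference for your SDK version.

```cpp
// Sketch, assuming the Windows App SDK's Windows ML catalog APIs
// (Microsoft.Windows.AI.MachineLearning): download and register the
// Windows-certified EPs that match this device's hardware.
#include <winrt/Microsoft.Windows.AI.MachineLearning.h>

using namespace winrt::Microsoft::Windows::AI::MachineLearning;

int main() {
    winrt::init_apartment();

    // Acquire the system catalog of Windows-certified EPs.
    auto catalog = ExecutionProviderCatalog::GetDefault();

    // Ensure the certified EPs for this hardware are downloaded and
    // registered with the bundled ONNX Runtime.
    catalog.EnsureAndRegisterCertifiedAsync().get();

    // From here, create ONNX Runtime sessions as usual; the registered
    // EPs are now candidates for execution.
    return 0;
}
```

The blocking .get() keeps the sketch short; application code would typically co_await the async operation instead.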

Silicon-to-EP mapping

Silicon | Execution providers | Typical use case
------- | ------------------- | ----------------
NPU | OpenVINO (Intel), QNN (Qualcomm), VitisAI (AMD) | Battery-efficient, sustained on-device inference on Copilot+ PCs
GPU | MIGraphX (AMD), NvTensorRtRtx (NVIDIA), OpenVINO (Intel), QNN (Qualcomm), DirectML (included; legacy) | High-throughput image/video/GenAI workloads
CPU | OpenVINO (Intel), ORT CPU EP (included) | Universal fallback; low latency for small models
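
If you would rather state intent than hard-code an EP from the table, newer ONNX Runtime releases also expose device selection policies. The sketch below assumes the SetEpSelectionPolicy API and the OrtExecutionProviderDevicePolicy_PREFER_NPU constant introduced alongside Windows ML's automatic EP selection (ONNX Runtime 1.22+); verify both against your runtime version.

```cpp
// Sketch, assuming ONNX Runtime 1.22+ EP selection policies: state a device
// preference and let the runtime pick among the registered EPs, instead of
// naming a specific EP from the table above.
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "policy-demo");
    Ort::SessionOptions options;

    // Prefer an NPU when one is available (e.g., for sustained,
    // battery-efficient inference); other assumed policies include
    // PREFER_GPU, PREFER_CPU, and MAX_EFFICIENCY.
    options.SetEpSelectionPolicy(OrtExecutionProviderDevicePolicy_PREFER_NPU);

    Ort::Session session(env, L"model.onnx", options);  // model path is illustrative
    return 0;
}
```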

See also