When you run GPU workloads in Azure Kubernetes Service (AKS), you need to install and maintain several software components, including the GPU driver, Kubernetes device plugin, and GPU metrics exporter for telemetry. These components are essential for enabling GPU scheduling, container-level GPU access, observability of resource usage, and proper functioning of AKS GPU-enabled nodes. Previously, cluster operators had to either install these components manually or use open-source alternatives like the NVIDIA GPU Operator, which can introduce complexity and operational overhead.
AKS now supports fully managed GPU nodes (preview) and installs the NVIDIA GPU driver, Kubernetes device plugin, and Data Center GPU Manager (DCGM) metrics exporter by default. This feature enables one-step GPU node pool creation and makes provisioning GPU resources in AKS as simple as provisioning general-purpose CPU nodes.
In this article, you learn how to provision a fully managed GPU node pool (preview) in your AKS cluster, including default installation of the NVIDIA GPU driver, device plugin, and metrics exporter.
Important
AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:
Before you begin
- This article assumes you have an existing AKS cluster. If you don't have a cluster, create one using the Azure CLI, Azure PowerShell, or the Azure portal.
- You need the Azure CLI version 2.72.2 or later installed. To find the version, run `az --version`. If you need to install or upgrade, see Install Azure CLI.
- You need to install and upgrade to the latest version of the `aks-preview` extension.
- You need to register the `ManagedGPUExperiencePreview` feature flag in your subscription.
Limitations
- This feature currently supports NVIDIA GPU-enabled virtual machine (VM) sizes only.
- Updating a general-purpose node pool to add a GPU VM size isn't supported on AKS.
- Windows node pools aren't supported with this feature, because GPU metrics aren't supported on Windows. When you create Windows GPU node pools, AKS automatically installs and manages the drivers and the DirectX device plugin. See AKS Windows GPU documentation for more information.
- Migrating your existing multi-instance GPU node pools to use this feature isn't supported.
- In-place upgrades to use this feature on existing GPU-enabled nodes aren't supported.
Note
GPU-enabled VMs contain specialized hardware subject to higher pricing and region availability. For more information, see the pricing tool and region availability.
Install the aks-preview CLI extension
1. Install the `aks-preview` CLI extension using the `az extension add` command.

   ```azurecli
   az extension add --name aks-preview
   ```

2. Update the extension to ensure you have the latest version installed using the `az extension update` command.

   ```azurecli
   az extension update --name aks-preview
   ```
Register the ManagedGPUExperiencePreview feature flag in your subscription
Register the `ManagedGPUExperiencePreview` feature flag in your subscription using the `az feature register` command.

```azurecli
az feature register --namespace Microsoft.ContainerService --name ManagedGPUExperiencePreview
```
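Feature registration can take a few minutes to complete. As an optional check, you can verify the registration state with the standard `az feature show` command, then refresh the resource provider registration with `az provider register`:

```azurecli
# Check the registration state; wait until it shows "Registered".
az feature show --namespace Microsoft.ContainerService --name ManagedGPUExperiencePreview

# Refresh the Microsoft.ContainerService resource provider registration.
az provider register --namespace Microsoft.ContainerService
```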
Get the credentials for your cluster
Get the credentials for your AKS cluster using the `az aks get-credentials` command.

```azurecli
az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME
```
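As an optional sanity check, you can verify the connection to your cluster by listing its nodes:

```bash
# Returns the cluster nodes; confirms kubectl is talking to the right cluster.
kubectl get nodes
```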
Create an AKS-managed GPU node pool (preview)
You can add a fully managed GPU node pool (preview) to an existing AKS cluster by specifying an OS SKU and the `--tags EnableManagedGPUExperience=true` flag. When you do, AKS installs the GPU driver, GPU device plugin, and metrics exporter automatically.

To use the default Ubuntu operating system (OS) SKU, create the node pool without specifying an OS SKU. The node pool is configured with the default operating system based on the Kubernetes version of the cluster.
1. Add a node pool to your cluster using the `az aks nodepool add` command with the `--tags EnableManagedGPUExperience=true` flag.

   ```azurecli
   az aks nodepool add \
       --resource-group MyResourceGroup \
       --cluster-name MyAKSCluster \
       --name gpunp \
       --node-count 1 \
       --node-vm-size Standard_NC6s_v3 \
       --node-taints sku=gpu:NoSchedule \
       --enable-cluster-autoscaler \
       --min-count 1 \
       --max-count 3 \
       --tags EnableManagedGPUExperience=true
   ```

2. Confirm that the managed NVIDIA GPU software components are installed successfully using the `az aks nodepool show` command.

   ```azurecli
   az aks nodepool show \
       --resource-group MyResourceGroup \
       --cluster-name MyAKSCluster \
       --name gpunp
   ```

   Your output should include the following values:

   ```output
   ...
   "gpuInstanceProfile": …,
   "gpuProfile": {
     "driver": "Install"
   },
   ...
   ```
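Because the node pool in the previous step is created with the `sku=gpu:NoSchedule` taint, workloads need a matching toleration and a GPU resource request to schedule onto the new nodes. The following manifest is a minimal sketch, not from this article: the pod name and container image are illustrative, and it requests one GPU through the standard `nvidia.com/gpu` resource exposed by the NVIDIA device plugin.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test   # hypothetical name, for illustration only
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative public CUDA image
    command: ["nvidia-smi"]                      # prints visible GPUs, then exits
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU from the NVIDIA device plugin
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"    # matches the taint set on the node pool above
```

You can apply the manifest with `kubectl apply -f <file>` and check the result with `kubectl logs gpu-smoke-test`.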
Migrate existing GPU workloads to an AKS-managed GPU node pool
In-place upgrades from a standard NVIDIA GPU node pool to a fully managed NVIDIA GPU node pool (preview) on your AKS cluster aren't supported. We recommend cordoning and draining your existing GPU nodes, then redeploying your workloads to a new GPU-enabled node pool with this feature enabled, as shown in the sketch below. See Resize node pools on AKS to learn more.
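As a minimal sketch of this approach, assuming a hypothetical existing node named `aks-gpunp-12345678-vmss000000`:

```bash
# Mark the existing GPU node as unschedulable so no new pods are placed on it.
kubectl cordon aks-gpunp-12345678-vmss000000

# Evict running pods so they reschedule onto the new managed GPU node pool.
kubectl drain aks-gpunp-12345678-vmss000000 --ignore-daemonsets --delete-emptydir-data
```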
Bring your own (BYO) GPU driver
If you want to control the installation of the NVIDIA drivers or use the NVIDIA GPU Operator, you can bypass the GPU driver installation during node pool creation. In this case, Microsoft doesn't support or manage the maintenance and compatibility of the NVIDIA drivers as part of the node image deployment. See Skip GPU driver installation for NVIDIA GPU-enabled nodes on AKS to learn more.
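For example, a minimal sketch of creating a node pool without the managed driver, assuming a `--gpu-driver none` parameter (the exact parameter name is an assumption; confirm it in the linked article):

```azurecli
# Create a GPU node pool and skip the managed NVIDIA driver installation,
# so you can install your own driver or the NVIDIA GPU Operator afterwards.
# The --gpu-driver parameter is an assumption; see the linked article.
az aks nodepool add \
    --resource-group MyResourceGroup \
    --cluster-name MyAKSCluster \
    --name gpubyo \
    --node-count 1 \
    --node-vm-size Standard_NC6s_v3 \
    --gpu-driver none
```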
Next steps
- Deploy a sample GPU workload on your AKS-managed GPU-enabled nodes.
- Learn about GPU utilization and performance metrics from managed NVIDIA DCGM exporter on your GPU node pool.
Related articles
- Learn about GPU health monitoring with Node Problem Detector (NPD) on AKS.
- Run distributed inference on multiple AKS GPU nodes.