Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS)
Graphics processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads. AKS supports the creation of GPU-enabled node pools to run these compute-intensive workloads in Kubernetes. For more information on available GPU-enabled VMs, see GPU optimized VM sizes in Azure. For AKS node pools, we recommend a minimum size of Standard_NC6. Note that the NVv4 series (based on AMD GPUs) is not yet supported with AKS.
GPU-enabled VMs contain specialized hardware subject to higher pricing and region availability. For more information, see the pricing tool and region availability.
Currently, GPU-enabled node pools are available only for Linux node pools.
Before you begin
This article helps you provision nodes with schedulable GPUs on new and existing AKS clusters. This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart using the Azure CLI, using Azure PowerShell, or using the Azure portal.
You also need the Azure CLI version 2.0.64 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
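If a script depends on that minimum version, the check can be automated. The following is a minimal sketch rather than part of the official guidance: the helper name version_ge is hypothetical, and it assumes a sort implementation that supports the -V (version sort) option, as GNU coreutils does.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (commented out so that the snippet can be sourced safely):
# installed="$(az --version 2>/dev/null | awk '/^azure-cli/ { gsub(/[()]/, "", $2); print $2; exit }')"
# version_ge "$installed" 2.0.64 || echo "Upgrade the Azure CLI to 2.0.64 or later" >&2
```

Version sort is used instead of a plain string comparison so that a release such as 2.10.0 is correctly treated as newer than 2.0.64.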
Get the credentials for your cluster
Get the credentials for your AKS cluster using the az aks get-credentials command. The following example command gets the credentials for the myAKSCluster in the myResourceGroup resource group.
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
Add the NVIDIA device plugin
There are two options for adding the NVIDIA device plugin:
- Use the AKS GPU image
- Manually install the NVIDIA device plugin
You can use either of the above options, but you shouldn't manually install the NVIDIA device plugin daemon set with clusters that use the AKS GPU image.
Update your cluster to use the AKS GPU image (preview)
AKS provides a fully configured AKS image that already contains the NVIDIA device plugin for Kubernetes.
AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:
First, install the aks-preview Azure CLI extension by running the following command:
az extension add --name aks-preview
Run the following command to update to the latest version of the extension released:
az extension update --name aks-preview
Then, register the GPUDedicatedVHDPreview feature flag by using the az feature register command, as shown in the following example:
az feature register --namespace "Microsoft.ContainerService" --name "GPUDedicatedVHDPreview"
It takes a few minutes for the status to show Registered. Verify the registration status by using the az feature show command:
az feature show --namespace "Microsoft.ContainerService" --name "GPUDedicatedVHDPreview"
When the status reflects Registered, refresh the registration of the Microsoft.ContainerService resource provider by using the az provider register command:
az provider register --namespace Microsoft.ContainerService
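Because registration takes a few minutes, the wait-then-refresh sequence above can be scripted. The following is a sketch under stated assumptions: the helper name wait_for_feature, the default of 30 attempts, and the 20-second interval are all hypothetical choices, and the query assumes that the feature state is reported under properties.state in the az feature show output.

```bash
# Hypothetical helper: poll the feature flag until its state is "Registered".
# $1 = feature name, $2 = maximum attempts (default 30), $3 = seconds between attempts (default 20).
wait_for_feature() {
  wf_name="$1"
  wf_attempts="${2:-30}"
  wf_interval="${3:-20}"
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    # Assumption: the state is exposed as properties.state in the command output.
    wf_state="$(az feature show \
      --namespace Microsoft.ContainerService \
      --name "$wf_name" \
      --query properties.state \
      --output tsv)"
    if [ "$wf_state" = "Registered" ]; then
      return 0
    fi
    sleep "$wf_interval"
    wf_i=$((wf_i + 1))
  done
  # The feature did not reach "Registered" within the allowed attempts.
  return 1
}

# Example use (commented out so that the snippet can be sourced safely):
# wait_for_feature GPUDedicatedVHDPreview &&
#   az provider register --namespace Microsoft.ContainerService
```

The function returns a non-zero status when the attempts run out, so the provider refresh in the example runs only after the flag shows Registered.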
Add a node pool for GPU nodes
To add a node pool to your cluster, use the az aks nodepool add command.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunp \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --node-taints sku=gpu:NoSchedule \
    --aks-custom-headers UseGPUDedicatedVHD=true \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3
The above command adds a node pool named gpunp to the myAKSCluster cluster in the myResourceGroup resource group. The command also sets the VM size for the nodes in the node pool to Standard_NC6, enables the cluster autoscaler, configures the cluster autoscaler to maintain a minimum of one node and a maximum of three nodes in the node pool, specifies the specialized AKS GPU image for the nodes in the new node pool, and specifies a sku=gpu:NoSchedule taint for the node pool.
A taint and VM size can only be set for node pools during node pool creation, but the autoscaler settings can be updated at any time.
If your GPU SKU requires generation 2 VMs, use --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true. For example:
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunp \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --node-taints sku=gpu:NoSchedule \
    --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3
Manually install the NVIDIA device plugin
Alternatively, you can deploy a DaemonSet for the NVIDIA device plugin. This DaemonSet runs a pod on each node to provide the required drivers for the GPUs.
Add a node pool to your cluster using the az aks nodepool add command.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunp \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --node-taints sku=gpu:NoSchedule \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3
The above command adds a node pool named gpunp to the myAKSCluster in the myResourceGroup resource group. The command also sets the VM size for the nodes in the node pool to Standard_NC6, enables the cluster autoscaler, configures the cluster autoscaler to maintain a minimum of one node and a maximum of three nodes in the node pool, and specifies a sku=gpu:NoSchedule taint for the node pool.
A taint and VM size can only be set for node pools during node pool creation, but the autoscaler settings can be updated at any time.
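As an illustration of changing the autoscaler settings later, the following sketch wraps az aks nodepool update with the --update-cluster-autoscaler option. The wrapper name update_gpu_autoscaler is hypothetical, and the resource names are the example values used in this article.

```bash
# Hypothetical wrapper: change the autoscaler bounds of the existing gpunp node pool.
# $1 = minimum node count, $2 = maximum node count.
update_gpu_autoscaler() {
  az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunp \
    --update-cluster-autoscaler \
    --min-count "$1" \
    --max-count "$2"
}

# Example use (commented out so that the snippet can be sourced safely):
# update_gpu_autoscaler 1 5
```

The taint and VM size are not touched by this call, because they are fixed when the node pool is created.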
Create a namespace using the kubectl create namespace command, such as gpu-resources:
kubectl create namespace gpu-resources
Create a file named nvidia-device-plugin-ds.yaml and paste the following YAML manifest. This manifest is provided as part of the NVIDIA device plugin for Kubernetes project.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: gpu-resources
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure. This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
      containers:
      - image: mcr.microsoft.com/oss/nvidia/k8s-device-plugin:1.11
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
Use kubectl apply to create the DaemonSet and confirm the NVIDIA device plugin is created successfully, as shown in the following example output:
$ kubectl apply -f nvidia-device-plugin-ds.yaml

daemonset "nvidia-device-plugin" created
Confirm that GPUs are schedulable
With your AKS cluster created, confirm that GPUs are schedulable in Kubernetes. First, list the nodes in your cluster using the kubectl get nodes command:
$ kubectl get nodes

NAME                   STATUS   ROLES   AGE   VERSION
aks-gpunp-28993262-0   Ready    agent   13m   v1.20.7
Now use the kubectl describe node command to confirm that the GPUs are schedulable. Under the Capacity section, the GPU should list as nvidia.com/gpu: 1.
The following condensed example shows that a GPU is available on the node named aks-gpunp-28993262-0:
$ kubectl describe node aks-gpunp-28993262-0

Name:               aks-gpunp-28993262-0
Roles:              agent
Labels:             accelerator=nvidia

[...]

Capacity:
[...]
 nvidia.com/gpu:                 1
[...]
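On clusters with many nodes, reading kubectl describe output node by node becomes tedious. The following sketch is one possible shortcut, not part of the original procedure: the helper name list_gpu_nodes is hypothetical, and it assumes that nodes without GPUs report a non-numeric placeholder in the custom column.

```bash
# Hypothetical helper: print the names of nodes that report at least one nvidia.com/gpu in Capacity.
list_gpu_nodes() {
  # The backslash escapes the dot inside the resource name nvidia.com/gpu.
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu' |
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Example use (commented out so that the snippet can be sourced safely):
# list_gpu_nodes
```

An empty result suggests that the device plugin is not yet running on the GPU nodes, or that no GPU node pool has been added.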
Run a GPU-enabled workload
To see the GPU in action, schedule a GPU-enabled workload with the appropriate resource request. In this example, let's run a TensorFlow job against the MNIST dataset.
Create a file named samples-tf-mnist-demo.yaml and paste the following YAML manifest. The following job manifest includes a resource limit of nvidia.com/gpu: 1.
If you receive a version mismatch error when calling into drivers, such as "CUDA driver version is insufficient for CUDA runtime version", review the NVIDIA driver matrix compatibility chart: https://docs.nvidia.com/deploy/cuda-compatibility/index.html
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo
  name: samples-tf-mnist-demo
spec:
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo
    spec:
      containers:
      - name: samples-tf-mnist-demo
        image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
Use the kubectl apply command to run the job. This command parses the manifest file and creates the defined Kubernetes objects:
kubectl apply -f samples-tf-mnist-demo.yaml
View the status and output of the GPU-enabled workload
Monitor the progress of the job using the kubectl get jobs command with the --watch argument. It may take a few minutes to first pull the image and process the dataset. When the COMPLETIONS column shows 1/1, the job has successfully finished. Exit the kubectl --watch command with Ctrl-C:
$ kubectl get jobs samples-tf-mnist-demo --watch

NAME                    COMPLETIONS   DURATION   AGE
samples-tf-mnist-demo   0/1           3m29s      3m29s
samples-tf-mnist-demo   1/1           3m10s      3m36s
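In a script, where an interactive --watch is not practical, kubectl wait can block until the job reports completion. This is a sketch only: the wrapper name wait_for_job and the default timeout of 600 seconds are hypothetical choices, not values from this article.

```bash
# Hypothetical wrapper: block until the named job has the "complete" condition.
# $1 = job name, $2 = timeout (default 600s, an assumed value).
wait_for_job() {
  kubectl wait --for=condition=complete "job/$1" --timeout="${2:-600s}"
}

# Example use (commented out so that the snippet can be sourced safely):
# wait_for_job samples-tf-mnist-demo
```

Note that this form waits only for the complete condition, so a job that fails keeps the command waiting until the timeout expires.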
To look at the output of the GPU-enabled workload, first get the name of the pod with the kubectl get pods command:
$ kubectl get pods --selector app=samples-tf-mnist-demo

NAME                          READY   STATUS      RESTARTS   AGE
samples-tf-mnist-demo-mtd44   0/1     Completed   0          4m39s
Now use the kubectl logs command to view the pod logs. The following example pod logs confirm that the appropriate GPU device, Tesla K80, has been discovered. Provide the name for your own pod:
$ kubectl logs samples-tf-mnist-demo-mtd44

2019-05-16 16:08:31.258328: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-16 16:08:31.396846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 2fd7:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-16 16:08:31.396886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 2fd7:00:00.0, compute capability: 3.7)
2019-05-16 16:08:36.076962: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/input_data/t10k-labels-idx1-ubyte.gz
Accuracy at step 0: 0.1081
Accuracy at step 10: 0.7457
Accuracy at step 20: 0.8233
Accuracy at step 30: 0.8644
Accuracy at step 40: 0.8848
Accuracy at step 50: 0.8889
Accuracy at step 60: 0.8898
Accuracy at step 70: 0.8979
Accuracy at step 80: 0.9087
Accuracy at step 90: 0.9099
Adding run metadata for 99
Accuracy at step 100: 0.9125
Accuracy at step 110: 0.9184
Accuracy at step 120: 0.922
Accuracy at step 130: 0.9161
Accuracy at step 140: 0.9219
Accuracy at step 150: 0.9151
Accuracy at step 160: 0.9199
Accuracy at step 170: 0.9305
Accuracy at step 180: 0.9251
Accuracy at step 190: 0.9258
Adding run metadata for 199
Accuracy at step 200: 0.9315
Accuracy at step 210: 0.9361
Accuracy at step 220: 0.9357
Accuracy at step 230: 0.9392
Accuracy at step 240: 0.9387
Accuracy at step 250: 0.9401
Accuracy at step 260: 0.9398
Accuracy at step 270: 0.9407
Accuracy at step 280: 0.9434
Accuracy at step 290: 0.9447
Adding run metadata for 299
Accuracy at step 300: 0.9463
Accuracy at step 310: 0.943
Accuracy at step 320: 0.9439
Accuracy at step 330: 0.943
Accuracy at step 340: 0.9457
Accuracy at step 350: 0.9497
Accuracy at step 360: 0.9481
Accuracy at step 370: 0.9466
Accuracy at step 380: 0.9514
Accuracy at step 390: 0.948
Adding run metadata for 399
Accuracy at step 400: 0.9469
Accuracy at step 410: 0.9489
Accuracy at step 420: 0.9529
Accuracy at step 430: 0.9507
Accuracy at step 440: 0.9504
Accuracy at step 450: 0.951
Accuracy at step 460: 0.9512
Accuracy at step 470: 0.9539
Accuracy at step 480: 0.9533
Accuracy at step 490: 0.9494
Adding run metadata for 499
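The pod lookup and log inspection can also be combined in a script. The following is a minimal sketch, assuming the job created a single pod and that the log keeps the "Accuracy at step" line format shown in the example output. Both helper names are hypothetical.

```bash
# Hypothetical helper: print the name of the first pod that carries the demo job's label.
demo_pod_name() {
  kubectl get pods --selector app=samples-tf-mnist-demo \
    --output jsonpath='{.items[0].metadata.name}'
}

# Hypothetical helper: read pod logs on standard input and print the last reported accuracy.
last_accuracy() {
  awk '/^Accuracy at step/ { value = $NF } END { if (value != "") print value }'
}

# Example use (commented out so that the snippet can be sourced safely):
# kubectl logs "$(demo_pod_name)" | last_accuracy
```

Selecting the pod by its app label avoids copying the generated pod name by hand, which differs on every run.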
Use Container Insights to monitor GPU usage
The following metrics are available for Container Insights with AKS to monitor GPU usage.
|Metric name|Metric dimension (tags)|Description|
|||Percentage of time over the past sample period (60 seconds) during which GPU was busy/actively processing for a container. Duty cycle is a number between 1 and 100.|
|||Each container can specify limits as one or more GPUs. It is not possible to request or limit a fraction of a GPU.|
|||Each container can request one or more GPUs. It is not possible to request or limit a fraction of a GPU.|
|||Amount of GPU Memory in bytes available to use for a specific container.|
|||Amount of GPU Memory in bytes used by a specific container.|
|||Number of GPUs in a node that can be used by Kubernetes.|
|||Total Number of GPUs in a node.|
Clean up resources
To remove the associated Kubernetes objects created in this article, use the kubectl delete job command as follows:
kubectl delete jobs samples-tf-mnist-demo
To run Apache Spark jobs, see Run Apache Spark jobs on AKS.
For more information about running machine learning (ML) workloads on Kubernetes, see Kubeflow Labs.
For more information on features of the Kubernetes scheduler, see Best practices for advanced scheduler features in AKS.
For information on using Azure Kubernetes Service with Azure Machine Learning, see the following articles: