Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Important
GPU acceleration for AKS Edge Essentials is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
GPUs are a popular choice for artificial intelligence computations, because they offer parallel processing capabilities and can often execute vision-based inferencing faster than CPUs. To better support artificial intelligence and machine learning applications, AKS Edge Essentials can expose a GPU to the virtual machine's Linux module.
AKS Edge Essentials supports GPU-Paravirtualization (GPU-PV) as the GPU passthrough technology. In other words, the GPU is shared between the Linux virtual machine and the host.
Important
These features include components developed and owned by NVIDIA Corporation or its licensors. By using GPU acceleration features, you are accepting and agreeing to the terms of the NVIDIA End-User License Agreement.
Prerequisites
The GPU acceleration of AKS Edge Essentials currently supports a specific set of GPU hardware. Additionally, use of this feature requires specific versions of Windows.
The supported GPUs and required Windows versions are as follows:
Supported GPUs | GPU passthrough type | Supported Windows versions |
---|---|---|
NVIDIA GeForce, Quadro, RTX | GPU-PV | Windows 10/11 (Pro, Enterprise, IoT Enterprise) |
Important
GPU-PV support might be limited to certain generations of processors or GPU architectures, as determined by the GPU vendor. For more information, see the NVIDIA CUDA for WSL documentation.
Windows 10 users must use the November 2021 update build 19044.1620, or later. After installation, you can verify your build version by running winver
at the command prompt.
GPU passthrough is not supported with nested virtualization, such as when you run AKS Edge Essentials in a Windows virtual machine.
System setup and installation
The following sections contain setup and installation information.
- For NVIDIA GeForce/Quadro/RTX GPUs, download and install the NVIDIA CUDA-enabled driver for Windows Subsystem for Linux (WSL) to use with your existing CUDA ML workflows. Originally developed for WSL, the CUDA for WSL drivers is also used for AKS Edge Essentials.
- Windows 10 users must also install WSL because some of the libraries are shared between WSL and AKS Edge Essentials.
- Install or upgrade AKS Edge Essentials to the May 2024 release, or later. For more information, see Update your AKS Edge Essentials clusters. The GPU-PV is supported on both k8s and k3s Kubernetes distributions.
Enable GPU-PV in your AKS Edge Essentials deployment
Step 1: single machine configuration parameters
You can generate the parameters you need to create a single machine cluster and add the necessary GPU-PV configuration parameters using the following commands.
This script only focuses on the GPU-PV configuration; you should also make other necessary general updates according to your own AKS Edge Essentials deployment:
$jsonObj = New-AksEdgeConfig -DeploymentType SingleMachineCluster
$jsonObj.User.AcceptGpuWarning = $true
$machine = $jsonObj.Machines[0]
$machine.LinuxNode.GpuPassthrough.Name = "NVIDIA GeForce GTX 1070"
$machine.LinuxNode.GpuPassthrough.Type = "ParaVirtualization"
$machine.LinuxNode.GpuPassthrough.Count = 1
The key parameters to enable GPU-PV are:
User.AcceptGpuWarning
: Set this parameter totrue
to automatically accept the confirmation message when you enable GPU-PV on AKS Edge Essentials.LinuxNode.GpuPassthrough.Name
: Describes the GPU model that's deployed in this machine, with proper drivers installed.LinuxNode.GpuPassthrough.Type
: Describes the GPU passthrough type. OnlyParaVirtualization
is currently supported.LinuxNode.GpuPassthrough.Count
: Describes how many GPUs are installed on this machine.
Step 2: create a single machine cluster
You can now run the
New-AksEdgeDeployment
cmdlet to deploy a single-machine AKS Edge cluster with a single Linux control plane node. You can use the JSON object generated in step 1 and pass it as a string:New-AksEdgeDeployment -JsonConfigString (New-AksEdgeConfig | ConvertTo-Json -Depth 4)
After successful deployment, verify GPU-PV is enabled by running
nvidia-smi
:
Step 3: Deploy Nvidia runtimeclass
Create a YAML file named nvidia-runtimeclass.yaml with the following content:
apiVersion: node.k8s.io/v1 kind: RuntimeClass metadata: name: nvidia handler: nvidia
Deploy the
runtimeclass
:kubectl apply –f nvidia-runtimeclass.yaml
Step 4: Install Nvidia GPU device plugin
Download nvidia-deviceplugin.yaml from this GitHub location.
Update the container images location in the nvidia-deviceplugin.yaml file to the new value, as follows:
containers: - image: registry.gitlab.com/nvidia/kubernetes/device-plugin/staging/k8s-device-plugin:6a31a868
Install the Nvidia GPU DevicePlugin:
kubectl apply –f nvidia-deviceplugin.yaml
Verify that the plugin is running and the NVIDIA GPU is detected by checking the logs of the deviceplugin pod using the
kubectl get pods -A
andkubectl logs $podname -n kube-system
commands:
Get started with a sample workload
Prepare a workload YAML file named gpu-workload.yaml:
apiVersion: v1 kind: Pod metadata: name: gpu-pod spec: restartPolicy: Never containers: - name: cuda-container image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 resources: limits: nvidia.com/gpu: 1 # requesting 1 GPU tolerations: - key: nvidia.com/gpu operator: Exists effect: NoSchedule
Run the sample workload:
kubectl apply -f .\gpu-workload.yaml
Verify that the workload ran successfully: