Hi,
We are creating a GPU node pool in AKS using the first approach from this link:
https://learn.microsoft.com/en-us/azure/aks/gpu-cluster
az aks nodepool add \
--resource-group my-scus-rg-app \
--cluster-name k8s-dev-aks \
--name gpunp2 \
--node-count 1 \
--node-vm-size Standard_NC6s_v3 \
--node-taints sku=gpu:NoSchedule \
--aks-custom-headers UseGPUDedicatedVHD=true \
--labels algo=qiefp-linux-gpu-nc6s-v3 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 1
This creates a Linux GPU node with OS image version AKSUbuntu-1804gen2gpucontainerd-202304.10.0.
However, we want to use the latest Ubuntu version, Ubuntu-2204.
How can we create a GPU node pool with the latest Ubuntu version while still using the UseGPUDedicatedVHD=true preview image to install the NVIDIA driver?
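To confirm which image a node pool actually received, it can help to read the pool's nodeImageVersion and pull the Ubuntu release out of the name. The following is only a sketch: the helper names show_node_image and ubuntu_release_from_image are made up for illustration, and the parsing assumes image names follow the AKSUbuntu-<YYMM>... pattern seen in this thread.

```shell
#!/usr/bin/env bash

# Prints the node image version of a node pool.
# Arguments: resource group, cluster name, node pool name.
show_node_image() {
  az aks nodepool show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --query nodeImageVersion \
    --output tsv
}

# Extracts the Ubuntu release (for example 18.04) from an image name.
# Assumption: the name starts with AKSUbuntu-<YYMM>; prints nothing otherwise.
ubuntu_release_from_image() {
  printf '%s\n' "$1" |
    sed -n 's/^AKSUbuntu-\([0-9][0-9]\)\([0-9][0-9]\).*$/\1.\2/p'
}
```

For example, ubuntu_release_from_image "$(show_node_image my-scus-rg-app k8s-dev-aks gpunp2)" would print the release for the pool created above.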
Hi,
I have the same problem.
When I create a node pool with UseGPUDedicatedVHD=true, the image is "AKSUbuntu-1804gen2gpucontainerd-202308.10.0", and it cannot be changed to an Ubuntu-2204 version.
Hi,
Yes. We are already using this approach to create a GPU node pool with the latest Ubuntu. We also need the NVIDIA driver, so we followed the "Manually install the NVIDIA device plugin" steps here:
https://learn.microsoft.com/en-us/azure/aks/gpu-cluster
But with this approach the node takes more than 5 minutes to come up, and we frequently get this error:
Error: <class 'cupy_backends.cuda.api.runtime.CUDARuntimeError'>
CUDARuntimeError('cudaErrorNoDevice: no CUDA-capable device is detected')
Using the preview image is good and fast.
Is there any other way to get the preview image with the latest Ubuntu?
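One possible cause of the cudaErrorNoDevice error (an assumption, not confirmed in this thread) is that workloads start before the device plugin has advertised the GPU on a freshly scaled node. A rough way to check is to poll the node's allocatable nvidia.com/gpu count before scheduling. The function names below are hypothetical helpers, and the 600-second default timeout is arbitrary.

```shell
#!/usr/bin/env bash

# Succeeds when the given allocatable GPU count is a positive integer.
gpu_ready() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Prints the nvidia.com/gpu allocatable count for a node (empty if none).
allocatable_gpus() {
  kubectl get node "$1" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Polls every 10 seconds until the node advertises a GPU.
# Arguments: node name, timeout in seconds (default 600).
wait_for_gpu() {
  local node timeout waited
  node="$1"
  timeout="${2:-600}"
  waited=0
  until gpu_ready "$(allocatable_gpus "$node")"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

Running wait_for_gpu <node-name> returns non-zero if the GPU never shows up, which would point at the driver or device plugin rather than the application.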
The NVIDIA driver installation is typically handled by the NVIDIA GPU Operator or other custom configurations.
To create a GPU node pool with the latest Ubuntu version without the UseGPUDedicatedVHD=true preview image, you can use the following command:
az aks nodepool add \
--resource-group <resource-group-name> \
--cluster-name <cluster-name> \
--name gpunp2 \
--node-count 1 \
--node-vm-size Standard_NC6s_v3 \
--node-taints sku=gpu:NoSchedule \
--labels algo=qiefp-linux-gpu-nc6s-v3 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 1 \
--os-disk-type Ephemeral \
--os-type Linux
Replace <resource-group-name> with the name of your resource group and <cluster-name> with the name of your AKS cluster.
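Again, I apologize for the confusion caused. This command will create a GPU node pool on the default AKS Ubuntu image for the cluster instead of the preview GPU image. Note that --os-disk-type Ephemeral only selects an ephemeral OS disk; it does not choose the Ubuntu version.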
Hi,
Thank you for the suggestion. When I tried to run this command, I got the error below:
unrecognized arguments: --image-reference publisher=Canonical,offer=0001-com-ubuntu-server-focal,sku=22_04-lts-gen2,p3=VHD
My az CLI version is 2.49, which is the latest version.
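An "unrecognized arguments" error means the installed CLI does not define that flag for the subcommand. Before running a suggested command, the help output can be searched for the flag. This is a small sketch; flag_supported is a hypothetical helper that reads help text on standard input.

```shell
#!/usr/bin/env bash

# Succeeds if the help text on stdin lists the given flag as a whole word.
# Argument: the flag to look for, for example --os-type.
flag_supported() {
  grep -qE -- "(^|[[:space:]])$1([[:space:],]|\$)"
}
```

Usage would look like: az aks nodepool add --help | flag_supported --image-reference && echo supported || echo "not supported"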
If you want to create a GPU node pool in AKS with the latest Ubuntu version (Ubuntu-2204) and the preview image with UseGPUDedicatedVHD=true to install the NVIDIA driver, you can follow these steps:
First:
Make sure you have the latest Azure CLI version installed.
Second:
Run the following command to create the GPU node pool:
az aks nodepool add \
--resource-group <resource-group-name> \
--cluster-name <cluster-name> \
--name gpunp2 \
--node-count 1 \
--node-vm-size Standard_NC6s_v3 \
--node-taints sku=gpu:NoSchedule \
--labels algo=qiefp-linux-gpu-nc6s-v3 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 1 \
--os-type Linux \
--aks-custom-headers UseGPUDedicatedVHD=true \
--image-reference publisher=Canonical,offer=0001-com-ubuntu-server-focal,sku=20_04-lts-gen2,p3=VHD
Make sure to replace <resource-group-name> with the name of your resource group and <cluster-name> with the name of your AKS cluster.
Explanation of the command:
--os-type Linux: Specifies that the OS type for the node pool is Linux.
--aks-custom-headers UseGPUDedicatedVHD=true: Uses the preview image with UseGPUDedicatedVHD=true to install the NVIDIA driver.
--image-reference publisher=Canonical,offer=0001-com-ubuntu-server-focal,sku=20_04-lts-gen2,p3=VHD: Intended to specify the Ubuntu image. Note that the offer and SKU shown here (focal, 20_04-lts-gen2) refer to Ubuntu 20.04 rather than Ubuntu-2204, and a reply in this thread reports that Azure CLI 2.49 rejects --image-reference as an unrecognized argument.