Deploy GPU-enabled workloads on a provisioned machine (preview)

This article describes how to deploy GPU-enabled containerized workloads on a provisioned machine for small form factor deployments of Azure Local.

The Containerized workloads article establishes your container platform by verifying Docker or installing open-source K3s. This article builds on that foundation to enable NVIDIA GPU acceleration for the workloads that you deploy in module 5.

Docker is supported for single-node GPU workloads. If you want a lightweight Kubernetes environment for orchestrated GPU workloads, you can also use the open-source K3s distribution. To compare these options before you choose one, see Container orchestrators.

Important

This feature is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Prerequisites

Before you begin, make sure that you:

Have a provisioned machine that you can reach over SSH.
Have completed the steps in Connect a provisioned machine from the Azure portal.
Have completed the steps in Run containerized workloads on a provisioned machine.
Have supported hardware with an NVIDIA GPU installed in the provisioned machine.
Have installed NVIDIA GPU drivers on the host OS.
Have a Windows PC on the same local network as the provisioned machine.
Have installed Azure CLI and signed in.
Have internet connectivity available to install packages and pull container images.

If you use Docker, make sure that:

Docker is already available on the provisioned machine, as described in Run containerized workloads on a provisioned machine.

If you use K3s, make sure that:

Open-source K3s is installed and running.
kubectl access to the K3s cluster is configured, as described in Run containerized workloads on a provisioned machine.

Choose your approach

Use Docker if you want the fastest way to run a GPU-enabled container on a single device.
Use K3s if you want Kubernetes APIs, kubectl workflows, GPU scheduling, or lightweight orchestration capabilities.

Choose the same container platform that you prepared in Run containerized workloads on a provisioned machine. If you verified Docker, continue with the Docker path in this article. If you installed K3s and configured kubectl, continue with the K3s path.

How GPU-enabled workloads work

GPU-enabled container workloads rely on multiple layers working together correctly.

The following components must be configured:

NVIDIA GPU drivers
NVIDIA kernel modules and device nodes
NVIDIA Container Toolkit
Container runtime configuration
GPU-enabled workload configuration

K3s workloads also require:

NVIDIA Kubernetes device plugin
Kubernetes RuntimeClass configuration

If any layer is missing or misconfigured, GPU workloads might fail to start or might not detect GPU resources correctly.
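
Because the layers depend on each other, it can help to check them in order from the host upward. The following is a minimal sketch, not part of the documented procedure. The function names `check_layer` and `check_gpu_layers` are hypothetical, and the script assumes that the NVIDIA container runtime binary is at /usr/bin/nvidia-container-runtime, the path used later in this article.

```shell
# Hypothetical helper: run a check command quietly and report the result.
# Usage: check_layer "<label>" <command> [args...]
check_layer() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK       %s\n' "$label"
  else
    printf 'MISSING  %s\n' "$label"
    return 1
  fi
}

# Walk the host-side layers in order. The first MISSING line is usually
# the best place to start troubleshooting.
check_gpu_layers() {
  failed=0
  check_layer "NVIDIA driver (nvidia-smi)" nvidia-smi || failed=1
  check_layer "NVIDIA kernel module" grep -q '^nvidia ' /proc/modules || failed=1
  check_layer "NVIDIA Container Toolkit (nvidia-ctk)" nvidia-ctk --version || failed=1
  check_layer "NVIDIA container runtime binary" test -x /usr/bin/nvidia-container-runtime || failed=1
  return "$failed"
}
```

After you define the functions, run `check_gpu_layers`. A nonzero exit status means that at least one layer wasn't found. The toolkit and runtime checks are expected to report MISSING until you complete the installation steps later in this article.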

Validate NVIDIA GPU access on the host

Confirm that the operating system can detect the NVIDIA GPU.

lspci | grep -i nvidia

Example output:

01:00.0 VGA compatible controller: NVIDIA Corporation Device

Validate that the NVIDIA drivers are functioning correctly:

nvidia-smi

Example output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.xx.xx                                                        |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+-----------------------------------------------------------------------------+

Troubleshoot nvidia-smi

If nvidia-smi fails, the NVIDIA kernel modules or device nodes might not be initialized correctly.

Load the required NVIDIA kernel modules:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

Validate that the NVIDIA device nodes exist:
```
ls /dev/nvidia*
```

If the device nodes are missing, create them manually:

sudo mknod -m 666 /dev/nvidia0 c 195 0
sudo mknod -m 666 /dev/nvidiactl c 195 255

Run the command again:
```
nvidia-smi
```

Note

In production environments, NVIDIA device nodes should be managed through proper driver installation and udev rules rather than manual device creation.
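
Before you create device nodes manually, you might want to confirm exactly which ones are missing. The following sketch lists the missing nodes by name. The function name `missing_nvidia_nodes` and its optional directory argument are hypothetical and exist only for illustration. The node names come from the mknod commands in this section.

```shell
# Print the name of each expected NVIDIA device node that is missing.
# The optional directory argument is a hypothetical convenience for
# testing; it defaults to /dev.
missing_nvidia_nodes() {
  dev_dir=${1:-/dev}
  for node in nvidia0 nvidiactl; do
    [ -e "$dev_dir/$node" ] || echo "$node"
  done
}
```

If `missing_nvidia_nodes` prints nothing, both nodes exist and the mknod commands aren't needed.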

Install the NVIDIA Container Toolkit

The NVIDIA Container Toolkit enables containers to access GPU devices from the host system.

Add the NVIDIA repository:

sudo curl -fsSL \
https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
-o /etc/yum.repos.d/nvidia-container-toolkit.repo

Update the repository configuration.

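Azure Linux with tdnf has a repository metadata signature validation issue, so update the NVIDIA repository configuration before you refresh package metadata. The following commands turn off repository metadata signature checking (repo_gpgcheck) and turn on package signature checking (gpgcheck):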
```
sudo sed -i 's|^repo_gpgcheck=1|repo_gpgcheck=0|' \
/etc/yum.repos.d/nvidia-container-toolkit.repo

sudo sed -i 's|^gpgcheck=0|gpgcheck=1|' \
/etc/yum.repos.d/nvidia-container-toolkit.repo
```
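
To see what the two substitutions do before you change the real file, you can apply them to a scratch copy. The following sketch assumes GNU sed. The function name `adjust_gpgcheck_settings` and the sample file contents are hypothetical and are used only to illustrate the effect.

```shell
# Apply the same two substitutions to the file passed as the first argument.
# The ^ anchor keeps the second expression from touching repo_gpgcheck lines.
adjust_gpgcheck_settings() {
  sed -i \
    -e 's|^repo_gpgcheck=1|repo_gpgcheck=0|' \
    -e 's|^gpgcheck=0|gpgcheck=1|' \
    "$1"
}

# Illustrative sample only; the real repo file contains more settings.
scratch=$(mktemp)
printf '%s\n' '[nvidia-container-toolkit]' 'repo_gpgcheck=1' 'gpgcheck=0' > "$scratch"
adjust_gpgcheck_settings "$scratch"
cat "$scratch"
rm -f "$scratch"
```

For the sample input, the result keeps the section header and changes the other two lines to repo_gpgcheck=0 and gpgcheck=1.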

Refresh package metadata:

sudo tdnf clean all
sudo tdnf makecache

Install the toolkit:

sudo tdnf install -y nvidia-container-toolkit

Verify the installation:
```
nvidia-ctk --version
```

Run a GPU-enabled workload

Docker workloads can access GPUs directly through the NVIDIA container runtime.

Use this path if you followed the Docker workflow in Run containerized workloads on a provisioned machine.

Configure the NVIDIA runtime for Docker:

sudo nvidia-ctk runtime configure --runtime=docker

Restart Docker:
```
sudo systemctl restart docker
```

Note

This article uses the NVIDIA CUDA sample image hosted in the NVIDIA GPU Cloud (NGC) catalog: NVIDIA CUDA Sample Container Image.

Run the sample workload:
```
sudo docker run --rm --gpus all \
  nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
```

Note

If you run Docker commands without sudo, you might see a permission denied error when connecting to /var/run/docker.sock. Run the command as sudo docker ..., or configure Docker access for the current user.

Successful output resembles:
```
[Vector addition of 50000 elements]
Test PASSED
Done
```
The Test PASSED message confirms that:
- Docker successfully accessed the NVIDIA GPU.
- The NVIDIA runtime was configured correctly.
- The container successfully used the GPU.
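
If you want to turn this check into an exit status, for example in a provisioning script, one possible approach is to search the container output for the pass marker. The following is a sketch under that assumption. The function names `contains_test_passed` and `verify_docker_gpu` are hypothetical, and the image reference is the one used in the docker run command above.

```shell
# Image reference copied from the docker run command in this article.
IMAGE=nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8

# Succeed only if the text on stdin contains the sample's pass marker.
# grep reads all input, so the upstream command never sees a closed pipe.
contains_test_passed() {
  grep 'Test PASSED' >/dev/null
}

# Hypothetical wrapper: run the sample and convert its log to an exit status.
verify_docker_gpu() {
  sudo docker run --rm --gpus all "$IMAGE" | contains_test_passed
}
```

For example, `verify_docker_gpu && echo "GPU access confirmed"` prints the message only when the sample reports Test PASSED.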

K3s environments that use containerd must be configured to expose the NVIDIA runtime to containers.

Use this path if you installed K3s and configured cluster access in Run containerized workloads on a provisioned machine.

Restart K3s so that it can detect the newly installed NVIDIA container runtime:
```
sudo systemctl restart k3s
```

Some K3s environments automatically detect the NVIDIA runtime configuration. Check for NVIDIA runtime entries:

sudo grep -i nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml

If the command returns entries similar to the following, the NVIDIA runtime is already configured and you can skip the manual containerd configuration steps.

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia']
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia'.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"

If no NVIDIA runtime is present, configure it manually.

Create the configuration directory:

sudo mkdir -p \
/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d

Create the runtime configuration file:

sudo nano \
/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d/99-nvidia.toml

Add the following configuration:

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia']
runtime_type = "io.containerd.runc.v2"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia'.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true
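
If you prefer not to use an interactive editor, you can write the same configuration from the shell. The following is a sketch, not the documented method. The function name `write_nvidia_runtime_dropin` is hypothetical, and the approach of staging the file in a temporary location and then copying it with `sudo install` is an assumption about how you might handle the root-owned directory.

```shell
# Write the NVIDIA runtime drop-in to the path given as the first argument.
# The quoted 'EOF' delimiter prevents the shell from expanding the content.
write_nvidia_runtime_dropin() {
  cat > "$1" <<'EOF'
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia']
runtime_type = "io.containerd.runc.v2"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia'.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true
EOF
}

# Example usage (commented out because it changes the K3s configuration):
#   tmp=$(mktemp)
#   write_nvidia_runtime_dropin "$tmp"
#   sudo install -m 0644 "$tmp" \
#     /var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d/99-nvidia.toml
```

Staging the file first lets you review it with `cat` before it's copied into the K3s configuration directory.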

Restart K3s again:
```
sudo systemctl restart k3s
```

Create the NVIDIA RuntimeClass

A Kubernetes RuntimeClass allows workloads to explicitly request the NVIDIA runtime.

Create a file named runtimeclass-nvidia.yaml:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia

Apply the configuration:
```
kubectl apply -f runtimeclass-nvidia.yaml
```

Note

If the RuntimeClass already exists, you might see a warning that the resource is missing the kubectl.kubernetes.io/last-applied-configuration annotation. This warning is expected when the resource wasn't originally created with kubectl apply. If the output includes runtimeclass.node.k8s.io/nvidia configured, the RuntimeClass was updated successfully.

Install the NVIDIA Kubernetes device plugin

The NVIDIA Kubernetes device plugin exposes GPU resources to Kubernetes as nvidia.com/gpu.

Deploy the device plugin:

kubectl apply -f \
https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml

Verify that the device plugin pod is running:
```
kubectl -n kube-system get pods
```
Wait until the NVIDIA device plugin reaches the Running state.
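
Instead of polling the pod list manually, you could block until the DaemonSet finishes rolling out. The following sketch uses kubectl rollout status. The function name `wait_for_device_plugin` and the 120-second default timeout are hypothetical choices. The DaemonSet name is the one used by the kubectl logs commands later in this article.

```shell
# DaemonSet name as it appears in this article's kubectl logs commands.
DS=nvidia-device-plugin-daemonset

# Block until the DaemonSet's pods are ready, or fail after the timeout.
# The 120s default is an arbitrary illustrative value.
wait_for_device_plugin() {
  kubectl -n kube-system rollout status "daemonset/$DS" --timeout="${1:-120s}"
}
```

For example, `wait_for_device_plugin 60s` returns a nonzero exit status if the rollout doesn't finish within 60 seconds.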

Generate the NVIDIA CDI configuration

Some containerd environments use the Container Device Interface (CDI) for GPU device injection.

Generate the NVIDIA CDI specification:
```
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
Example success output:
```
INFO[0001] Generated CDI spec with version 0.5.0
```

Note

During CDI generation, you might see warnings about optional Vulkan, X11, MPS, or Fabric Manager components not being present. These warnings are expected in lightweight or headless environments and don't necessarily prevent GPU compute workloads from functioning correctly.

Verify that the CDI specification was created:
```
ls -l /etc/cdi/nvidia.yaml
```

Configure the runtime:

sudo nvidia-ctk runtime configure \
--runtime=containerd \
--config=/var/lib/rancher/k3s/agent/etc/containerd/config.toml

Example output:

INFO[0000] Wrote updated config to /etc/containerd/conf.d/99-nvidia.toml

Restart K3s:
```
sudo systemctl restart k3s
```

Verify that K3s detected the NVIDIA runtime configuration:

sudo grep -i nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml

Example output:

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia']
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia'.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"

Verify GPU visibility in Kubernetes

Verify that the NVIDIA device plugin pod is running:
```
kubectl get pods -n kube-system | grep nvidia
```

Verify that Kubernetes can detect allocatable GPU resources:

kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'

Example output for a node with one GPU:

1

If the command returns no output, check the NVIDIA device plugin logs:

kubectl logs -n kube-system daemonset/nvidia-device-plugin-daemonset

If the logs show could not load NVML library: libnvidia-ml.so.1, patch the device plugin DaemonSet to use the NVIDIA RuntimeClass:

kubectl patch daemonset nvidia-device-plugin-daemonset -n kube-system \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/runtimeClassName","value":"nvidia"}]'

Restart K3s:

sudo systemctl restart k3s

Verify that the device plugin registered successfully:

kubectl logs -n kube-system daemonset/nvidia-device-plugin-daemonset

Expected log output includes:

Detected NVML platform: found NVML library
Registered device plugin for 'nvidia.com/gpu' with Kubelet

You can also inspect the node configuration:
```
kubectl describe node
```

Verify that nvidia.com/gpu appears under:
- Capacity
- Allocatable

Example output:
```
Capacity:
  nvidia.com/gpu: 1

Allocatable:
  nvidia.com/gpu: 1
```
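
When the jsonpath query shown earlier runs against more than one node, it returns one count per GPU-enabled node, separated by spaces. The following sketch totals those counts. The function name `sum_gpu_counts` is hypothetical.

```shell
# Sum whitespace-separated GPU counts, such as the jsonpath output for
# several nodes, and print the total. Empty input prints 0.
sum_gpu_counts() {
  total=0
  # $1 is intentionally unquoted so that the shell splits it into words.
  for n in $1; do
    total=$((total + n))
  done
  echo "$total"
}

# Example usage (commented out because it needs cluster access):
#   sum_gpu_counts "$(kubectl get nodes \
#     -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}')"
```

A total of 0 matches the case described earlier where the command returns no output, so check the device plugin logs in that situation.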

Deploy a sample GPU-enabled K3s workload

Create a file named cuda-vectoradd.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-vectoradd
spec:
  template:
    spec:
      runtimeClassName: nvidia
      restartPolicy: Never
      containers:
      - name: cuda-vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
        resources:
          limits:
            nvidia.com/gpu: 1
  backoffLimit: 1

Deploy the sample workload:
```
kubectl apply -f cuda-vectoradd.yaml
```
Verify that the job completed successfully:
```
kubectl get jobs
```

View the workload logs:

kubectl logs job/cuda-vectoradd

Successful output resembles:

[Vector addition of 50000 elements]
Test PASSED
Done

The Test PASSED message confirms that:

Kubernetes successfully scheduled the workload to a GPU-enabled node.
The NVIDIA runtime was configured correctly.
The container successfully accessed the GPU.
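
To script this verification, one option is to wait for the job's Complete condition and then search its logs. The following is a sketch under that assumption. The function name `verify_gpu_job` and the 120-second default timeout are hypothetical. The job name comes from cuda-vectoradd.yaml.

```shell
# Job name from cuda-vectoradd.yaml in this article.
JOB=cuda-vectoradd

# Wait for the job to complete, then confirm the pass marker in its logs.
# The 120s default timeout is an arbitrary illustrative value.
verify_gpu_job() {
  kubectl wait --for=condition=complete "job/$JOB" --timeout="${1:-120s}" || return 1
  kubectl logs "job/$JOB" | grep 'Test PASSED' >/dev/null
}
```

If the job fails instead of completing, kubectl wait keeps waiting until the timeout expires, so a short timeout such as `verify_gpu_job 60s` might be preferable while you troubleshoot.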

Clean up the sample workload

Delete the sample workload:

kubectl delete job cuda-vectoradd

Troubleshooting

nvidia-smi fails on the host

Verify that:

NVIDIA drivers are installed.
NVIDIA kernel modules are loaded.
/dev/nvidia0 and /dev/nvidiactl exist.

GPU resources aren't visible in Kubernetes

Verify that:

The NVIDIA device plugin is running.
The NVIDIA runtime exists in the containerd configuration.
K3s was restarted after runtime changes.

Docker containers can't access the GPU

Verify that:

Docker was restarted after runtime configuration.
nvidia-container-toolkit is installed.
The --gpus all flag is specified.

Pods or jobs remain in Pending

This issue usually indicates one of the following conditions:

GPU resources are unavailable.
nvidia.com/gpu isn't allocatable.
The NVIDIA runtime isn't configured correctly.
The workload requests more GPUs than are available on the node.

Next steps

Return to Deploy applications to your cluster to choose the workload path that you want to run next.


Last updated on 2026-05-28
