Deploy GPU-enabled workloads on a provisioned machine (preview)

This article describes how to deploy GPU-enabled containerized workloads on a provisioned machine for small form factor deployments of Azure Local.

The Containerized workloads article establishes your container platform by verifying Docker or installing open-source K3s. This article builds on that foundation to enable NVIDIA GPU acceleration for the workloads that you deploy in module 5.

Docker is supported for single-node GPU workloads. If you want a lightweight Kubernetes environment for orchestrated GPU workloads, you can also use the open-source K3s distribution. To compare these options before you choose one, see Container orchestrators.

Important

This feature is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Prerequisites

Before you begin, make sure that you:

  • Have a provisioned machine that you can reach over SSH.
  • Complete the steps in Connect a provisioned machine from the Azure portal.
  • Complete the steps in Run containerized workloads on a provisioned machine.
  • Have supported hardware with an NVIDIA GPU installed in the provisioned machine.
  • Have NVIDIA GPU drivers installed on the host OS.
  • Have a Windows PC on the same local network as the provisioned machine.
  • Have Azure CLI installed and are signed in to it.
  • Have internet connectivity available to install packages and pull container images.

If you use Docker, make sure that:

If you use K3s, make sure that:

Choose your approach

  • Use Docker if you want the fastest way to run a GPU-enabled container on a single device.
  • Use K3s if you want Kubernetes APIs, kubectl workflows, GPU scheduling, or lightweight orchestration capabilities.

Choose the same container platform that you prepared in Run containerized workloads on a provisioned machine. If you verified Docker, continue with the Docker path in this article. If you installed K3s and configured kubectl, continue with the K3s path.

How GPU-enabled workloads work

GPU-enabled container workloads rely on multiple layers working together correctly.

The following components must be configured:

  1. NVIDIA GPU drivers
  2. NVIDIA kernel modules and device nodes
  3. NVIDIA Container Toolkit
  4. Container runtime configuration
  5. GPU-enabled workload configuration

K3s workloads also require:

  • NVIDIA Kubernetes device plugin
  • Kubernetes RuntimeClass configuration

If any layer is missing or misconfigured, GPU workloads might fail to start or might not detect GPU resources correctly.
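The layers above can be checked one at a time. The following is a minimal sketch for the Docker path, not an official diagnostic tool. The `DEV_DIR` and `DAEMON_JSON` variables are illustrative overrides introduced here, and the script assumes that the Docker daemon configuration lives in `/etc/docker/daemon.json`.

```shell
#!/bin/sh
# Sketch: report whether each GPU layer appears to be present on the host.
# DEV_DIR and DAEMON_JSON are hypothetical overrides used for illustration.
DEV_DIR="${DEV_DIR:-/dev}"
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

# check <label> <command...>: run the command quietly and print OK or MISSING.
check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK       $label"
  else
    echo "MISSING  $label"
  fi
}

# Both device nodes named in this article must exist.
has_device_nodes() {
  [ -e "$DEV_DIR/nvidia0" ] && [ -e "$DEV_DIR/nvidiactl" ]
}

check_gpu_layers() {
  check "NVIDIA GPU drivers (nvidia-smi on PATH)" command -v nvidia-smi
  check "NVIDIA device nodes" has_device_nodes
  check "NVIDIA Container Toolkit (nvidia-ctk on PATH)" command -v nvidia-ctk
  check "Docker runtime configuration" grep -q '"nvidia"' "$DAEMON_JSON"
}

check_gpu_layers
```

A `MISSING` line only shows where to start looking. It doesn't prove that a layer that reports `OK` is configured correctly.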

Validate NVIDIA GPU access on the host

  1. Confirm that the operating system can detect the NVIDIA GPU.

    lspci | grep -i nvidia
    

    Example output:

    01:00.0 VGA compatible controller: NVIDIA Corporation Device
    
  2. Validate that the NVIDIA drivers are functioning correctly:

    nvidia-smi
    

    Example output:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 550.xx.xx                                                        |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    +-----------------------------------------------------------------------------+
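For scripted checks, the table output of `nvidia-smi` is awkward to parse. The following sketch uses the `--query-gpu` and `--format` options of `nvidia-smi` to print one CSV line per GPU. The `NVIDIA_SMI` variable and the `gpu_summary` function are illustrative names introduced here.

```shell
#!/bin/sh
# Sketch: print index, name, driver version, and total memory for each GPU.
# NVIDIA_SMI is a hypothetical override for the command name.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

gpu_summary() {
  if ! command -v "$NVIDIA_SMI" >/dev/null 2>&1; then
    echo "$NVIDIA_SMI not found; check the driver installation" >&2
    return 1
  fi
  "$NVIDIA_SMI" --query-gpu=index,name,driver_version,memory.total \
    --format=csv,noheader
}

gpu_summary || echo "No GPU summary available on this host."
```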
    

Troubleshoot nvidia-smi

If nvidia-smi fails, the NVIDIA kernel modules or device nodes might not be initialized correctly.

  1. Load the required NVIDIA kernel modules:

    sudo modprobe nvidia
    sudo modprobe nvidia_uvm
    
  2. Validate that the NVIDIA device nodes exist:

    ls /dev/nvidia*
    
  3. If the device nodes are missing, create them manually:

    sudo mknod -m 666 /dev/nvidia0 c 195 0
    sudo mknod -m 666 /dev/nvidiactl c 195 255
    
  4. Run the command again:

    nvidia-smi
    

Note

In production environments, NVIDIA device nodes should be managed through proper driver installation and udev rules rather than manual device creation.
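Before you run `modprobe`, it can help to see which of the two modules from step 1 are already loaded. The following sketch reads the first column of `/proc/modules`. The `MODULES_FILE` variable and the `missing_modules` function are illustrative names introduced here.

```shell
#!/bin/sh
# Sketch: list the NVIDIA kernel modules from this article that aren't loaded.
# MODULES_FILE is a hypothetical override; /proc/modules is the Linux default.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

missing_modules() {
  for mod in nvidia nvidia_uvm; do
    # The first field of each /proc/modules line is the module name.
    if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' \
        "$MODULES_FILE" 2>/dev/null; then
      echo "$mod"
    fi
  done
}

for mod_name in $(missing_modules); do
  echo "Not loaded: $mod_name (try: sudo modprobe $mod_name)"
done
```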

Install the NVIDIA Container Toolkit

The NVIDIA Container Toolkit enables containers to access GPU devices from the host system.

  1. Add the NVIDIA repository:

    sudo curl -fsSL \
    https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
    -o /etc/yum.repos.d/nvidia-container-toolkit.repo
    
  2. Update the repository configuration.

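    Because of a repository metadata signature validation issue on Azure Linux with tdnf, update the NVIDIA repository configuration before you refresh package metadata. The following commands turn off signature checking for repository metadata (repo_gpgcheck) and turn on signature checking for packages (gpgcheck).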

    sudo sed -i 's|^repo_gpgcheck=1|repo_gpgcheck=0|' \
    /etc/yum.repos.d/nvidia-container-toolkit.repo
    
    sudo sed -i 's|^gpgcheck=0|gpgcheck=1|' \
    /etc/yum.repos.d/nvidia-container-toolkit.repo
    
  3. Refresh package metadata:

    sudo tdnf clean all
    sudo tdnf makecache
    
  4. Install the toolkit:

    sudo tdnf install -y nvidia-container-toolkit
    
  5. Verify the installation:

    nvidia-ctk --version
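If the metadata refresh or the installation fails, one thing to rule out is that the `sed` commands in step 2 didn't change the repository file. The following sketch checks for the two expected values. The `REPO_FILE` variable and the `check_repo_gpg` function are illustrative names introduced here.

```shell
#!/bin/sh
# Sketch: confirm that the repository file has the values set in step 2.
# REPO_FILE is a hypothetical override for the path used in this article.
REPO_FILE="${REPO_FILE:-/etc/yum.repos.d/nvidia-container-toolkit.repo}"

check_repo_gpg() {
  if grep -q '^repo_gpgcheck=0$' "$REPO_FILE" 2>/dev/null &&
     grep -q '^gpgcheck=1$' "$REPO_FILE" 2>/dev/null; then
    echo "Repository settings match: repo_gpgcheck=0, gpgcheck=1"
  else
    echo "Repository settings differ. Current values:"
    grep -E '^(repo_)?gpgcheck=' "$REPO_FILE" 2>/dev/null
    return 1
  fi
}

check_repo_gpg || true
```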
    

Run a GPU-enabled workload

Docker workloads can access GPUs directly through the NVIDIA container runtime.

Use this path if you followed the Docker workflow in Run containerized workloads on a provisioned machine.

  1. Configure the NVIDIA runtime for Docker:

    sudo nvidia-ctk runtime configure --runtime=docker
    
  2. Restart Docker:

    sudo systemctl restart docker
    

    Note

    This article uses the NVIDIA CUDA sample image hosted in the NVIDIA GPU Cloud (NGC) catalog: NVIDIA CUDA Sample Container Image.

  3. Run the sample workload:

    sudo docker run --rm --gpus all \
      nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
    

    Note

    If you run Docker commands without sudo, you might see a permission denied error when connecting to /var/run/docker.sock. Use sudo docker ... for the command, or configure Docker access for the current user.

    Successful output resembles:

    [Vector addition of 50000 elements]
    Test PASSED
    Done
    

    The Test PASSED message confirms that:

    • Docker successfully accessed the NVIDIA GPU.
    • The NVIDIA runtime was configured correctly.
    • The container successfully used the GPU.
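To use the sample as an automated smoke test, you can search its output for the Test PASSED line. The following is a minimal sketch that reuses the image from step 3. The `sample_passed` and `run_gpu_sample` functions are illustrative names introduced here.

```shell
#!/bin/sh
# Sketch: run the CUDA sample from step 3 and check for "Test PASSED".
IMAGE="nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8"

# Read text on stdin and succeed only if it contains the pass marker.
sample_passed() {
  grep -q 'Test PASSED'
}

# Run the container and report the result. Call this function on the
# provisioned machine; it isn't invoked automatically by this script.
run_gpu_sample() {
  if sudo docker run --rm --gpus all "$IMAGE" 2>&1 | sample_passed; then
    echo "GPU sample passed"
  else
    echo "GPU sample failed" >&2
    return 1
  fi
}
```

After you source the script, run `run_gpu_sample` on the provisioned machine. A nonzero exit status means that the pass marker wasn't found in the container output.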

Troubleshooting

nvidia-smi fails on the host

Verify that:

  • NVIDIA drivers are installed.
  • NVIDIA kernel modules are loaded.
  • /dev/nvidia0 and /dev/nvidiactl exist.

GPU resources aren't visible in Kubernetes

Verify that:

  • The NVIDIA device plugin is running.
  • The NVIDIA runtime exists in the containerd configuration.
  • K3s was restarted after runtime changes.
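One way to check the second item is to search the containerd configuration that K3s generates for an NVIDIA runtime entry. The following is a sketch under assumptions: the path `/var/lib/rancher/k3s/agent/etc/containerd/config.toml` is assumed to be where your K3s version writes that file, and the `CONTAINERD_CONFIG` variable and `k3s_has_nvidia_runtime` function are illustrative names introduced here. Reading the file usually requires root access.

```shell
#!/bin/sh
# Sketch: look for an NVIDIA runtime entry in the K3s containerd configuration.
# CONTAINERD_CONFIG is a hypothetical override; the default path is an
# assumption about where K3s writes its generated configuration.
CONTAINERD_CONFIG="${CONTAINERD_CONFIG:-/var/lib/rancher/k3s/agent/etc/containerd/config.toml}"

k3s_has_nvidia_runtime() {
  grep -q 'runtimes.*nvidia' "$CONTAINERD_CONFIG" 2>/dev/null
}

if k3s_has_nvidia_runtime; then
  echo "NVIDIA runtime entry found in $CONTAINERD_CONFIG"
else
  echo "No NVIDIA runtime entry found in $CONTAINERD_CONFIG (or file not readable)"
fi
```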

Docker containers can't access the GPU

Verify that:

  • Docker was restarted after runtime configuration.
  • nvidia-container-toolkit is installed.
  • The --gpus all flag is specified.
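To confirm that Docker picked up the runtime after the restart, you can ask the daemon which runtimes it knows about. The following sketch uses `docker info` with a Go template to print the runtime list as JSON. The `DOCKER_CMD` variable and the `docker_has_nvidia_runtime` function are illustrative names introduced here.

```shell
#!/bin/sh
# Sketch: check whether the Docker daemon lists a runtime named "nvidia".
# DOCKER_CMD is a hypothetical override; sudo is assumed to be needed.
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

docker_has_nvidia_runtime() {
  # $DOCKER_CMD is intentionally unquoted so that "sudo docker" splits
  # into two words.
  $DOCKER_CMD info --format '{{json .Runtimes}}' 2>/dev/null |
    grep -q '"nvidia"'
}

# Example usage on the provisioned machine:
#   docker_has_nvidia_runtime && echo "nvidia runtime registered"
```

If the function fails, run the `nvidia-ctk runtime configure` step again and restart Docker before you retry.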

Pods or jobs remain in Pending

This issue usually indicates one of the following conditions:

  • GPU resources are unavailable.
  • nvidia.com/gpu isn't allocatable.
  • The NVIDIA runtime isn't configured correctly.
  • The workload requests more GPUs than are available on the node.
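To tell these causes apart, compare what each node advertises with what is waiting to be scheduled. The following sketch wraps two standard `kubectl` queries. The `KUBECTL` variable and the function names are illustrative, and the sketch assumes that `kubectl` is already configured for the K3s cluster.

```shell
#!/bin/sh
# Sketch: show allocatable GPUs per node and list pods stuck in Pending.
# KUBECTL is a hypothetical override for the command name.
KUBECTL="${KUBECTL:-kubectl}"

# Print each node with its allocatable nvidia.com/gpu count.
# The dot in "nvidia.com" is escaped so that it isn't read as a path separator.
gpu_allocatable() {
  "$KUBECTL" get nodes \
    -o 'custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# List pods in any namespace whose phase is Pending.
pending_pods() {
  "$KUBECTL" get pods --all-namespaces --field-selector=status.phase=Pending
}

# Example usage:
#   gpu_allocatable
#   pending_pods
```

A GPU column that shows `<none>` suggests that the node isn't advertising `nvidia.com/gpu`. For a specific pending pod, `kubectl describe pod` shows the scheduler events that explain why it wasn't placed.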

Next steps