
Can't get Ubuntu 24.04 to boot with NVIDIA drivers on NV6ads A10 v5

Peter Hunter 20 Reputation points
2026-03-23T17:48:19.15+00:00

I was trying different GPU VM sizes on Azure to test various training scenarios. No issues with the H100 or A100 systems. However, when I changed the VM size to NV6ads_A10_v5, my standard Ubuntu 24.04 LTS kernel won't boot. It gets stuck on a "probe NVIDIA device failed with -1" error during boot. I tried changing the NVIDIA drivers: 550, 570, 580, and 580-open. I even tried installing the latest Microsoft GRID drivers in Ubuntu's safe mode. Secure Boot and vTPM are disabled. I disabled the Nouveau kernel driver, although I'm pretty sure it isn't even installed. Nothing seems to work. Note that I tried a different Azure A10 size, the NV12ads_A10_v5, and got the same error. So it seems to be something to do with the NV series (or A10s).

Azure Virtual Machines

An Azure service that is used to provision Windows and Linux virtual machines.


Answer accepted by question author
  1. Jilakara Hemalatha 12,100 Reputation points Microsoft External Staff Moderator
    2026-03-23T18:29:47.31+00:00

    Hello Peter,

    Thank you for sharing the details and the error information.

    The issue you’re encountering on the NVads A10 v5 VM with Ubuntu 24.04 is typically related to a driver and kernel compatibility mismatch, which prevents the NVIDIA driver from initializing correctly during boot.

    The GRID drivers redistributed by Azure don't work on most non-NV series VMs, such as the NC, NCv2, NCv3, ND, and NDv2 series, but they do work on the NCasT4v3 series.

    For more information on the specific vGPU and driver branch versions, visit the NVIDIA website.

    For NVads A10 v5 VMs, it is important to stay on a supported and up-to-date driver version. NVIDIA driver branches follow a compatibility model where each major version is generally backward compatible only with the immediately previous version (for example, vGPU 17.x supports 16.x, but not older versions). Using unsupported or mismatched driver versions can lead to driver initialization failures.
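The compatibility rule described above can be sketched as a small shell check. This is only an illustration of the "same branch or one branch older" idea; the function name `vgpu_branch_compatible` and the example branch numbers are assumptions for the sketch, not part of any NVIDIA or Azure tooling, and the real support matrix on the NVIDIA website takes precedence.

```shell
# Hypothetical helper: succeed (exit 0) when the guest vGPU major branch is
# the same as the host's or exactly one major branch older (the n-1 rule).
vgpu_branch_compatible() {
  host_major="$1"
  guest_major="$2"
  [ "$guest_major" -eq "$host_major" ] || [ "$guest_major" -eq $((host_major - 1)) ]
}

# Example values taken from the text above: a 17.x host with a 16.x guest.
if vgpu_branch_compatible 17 16; then
  echo "guest 16.x is within the n-1 window of a 17.x host"
fi

# An older guest branch falls outside that window.
if ! vgpu_branch_compatible 17 15; then
  echo "guest 15.x is outside the n-1 window of a 17.x host"
fi
```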

    Please also note the following:

    • Ensure Secure Boot is disabled, as it can block driver loading
    • GRID drivers may have issues with newer kernels (such as kernel 6.11). In such cases, downgrading to a supported version like 6.8 can help
    • Avoid mixing different driver types (CUDA, open drivers, GRID), and use only the Azure-supported GRID driver for this VM series
    • GRID Driver 17.3 currently supports only NCasT4_v3 series of VMs. To use this driver, download and install GRID Driver 17.3 manually
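Before reinstalling, it may help to confirm the first two points on the VM itself. The sketch below prints the running kernel series and, if `mokutil` happens to be installed, the Secure Boot state. The helper name `kernel_series` is an assumption made for this example.

```shell
# Hypothetical helper: reduce a kernel release string such as
# "6.8.0-1021-azure" to its "major.minor" series ("6.8").
kernel_series() {
  echo "$1" | cut -d. -f1,2
}

running="$(uname -r)"
echo "Running kernel: $running (series $(kernel_series "$running"))"

# mokutil is not always installed; only query Secure Boot if it is available.
if command -v mokutil >/dev/null 2>&1; then
  mokutil --sb-state || true
else
  echo "mokutil not found; check Secure Boot in the Azure portal instead"
fi
```

If the series printed is newer than 6.8 (for example 6.11), that matches the kernel situation described in the list above.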

    Please follow the steps below to reinstall the driver cleanly:

    1. Remove any manually installed NVIDIA drivers:
    sudo apt purge "*nvidia*"
    sudo apt autoremove
    sudo reboot
    

    2. Install the Azure-supported NVIDIA GRID driver. Download the validated driver:

    wget https://download.microsoft.com/download/c5319e92-672e-4067-8d85-ab66a7a64db3/NVIDIA-Linux-x86_64-550.144.06-grid-azure.run
    

    Make it executable and install:

    chmod +x NVIDIA-Linux-x86_64-550.144.06-grid-azure.run
    sudo ./NVIDIA-Linux-x86_64-550.144.06-grid-azure.run --silent
    

    Reboot the VM, then check that the GPU is detected (for example, with nvidia-smi).
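Once the VM is back up, a quick way to see whether the driver initialized is to look at the loaded kernel modules, run `nvidia-smi`, and scan the kernel log for the probe error from the question. This is a minimal sketch; the helper name `nvidia_module_loaded` is an assumption for this example, and reading the kernel log may require root on some systems.

```shell
# Hypothetical helper: read lsmod-style output on stdin and succeed if a
# module named exactly "nvidia" is listed.
nvidia_module_loaded() {
  grep -q '^nvidia[[:space:]]'
}

if lsmod 2>/dev/null | nvidia_module_loaded; then
  echo "nvidia kernel module is loaded"
else
  echo "nvidia kernel module is NOT loaded"
fi

# nvidia-smi is installed with the driver and lists the GPU when it works.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
else
  echo "nvidia-smi not found; the driver does not appear to be installed"
fi

# Show recent NVIDIA-related kernel messages, including any probe failure.
dmesg 2>/dev/null | grep -i 'nvidia' | tail -n 20
```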

    The documents below explain how to install the GRID drivers:

    https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup

    https://forums.developer.nvidia.com/t/installing-nvidia-drivers-cuda-on-azure-nvadsa10-v5-vm-ubuntu-22-04/321128/3

    If the issue persists on Ubuntu 24.04, we recommend testing with Ubuntu 22.04 LTS, which is currently more stable and widely validated for this VM series.

    Hope this helps! Please let me know if you have any questions.


1 additional answer

  1. Q&A Assist
    2026-03-23T17:48:34.7233333+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    On NVads A10 v5 VMs running Ubuntu 24.04, the issue is typically with the NVIDIA driver/kernel combination rather than Secure Boot or Nouveau. For these SKUs, Azure guidance is:

    1. Prefer the NVIDIA GPU Driver extension instead of manually mixing Ubuntu-packaged drivers and GRID drivers.
      • Use the NvidiaGpuDriverLinux extension, which installs the appropriate CUDA or GRID driver for N‑series VMs.
      • Be aware that on NVads A10 v5, the extension currently targets GRID v17.x by default, and there is a known issue with GRID 17.5 impacting CUDA workloads.
    2. Pin the driver to a supported/working GRID version for A10 instead of the latest:
      • Use the extension with a specific driverVersion setting to force an older, compatible GRID driver (for example, 16.5 / 535.161 as shown in the guidance):
             az vm extension set \
               --resource-group <rg-name> \
               --vm-name <vm-name> \
               --name NvidiaGpuDriverLinux \
               --publisher Microsoft.HpcCompute \
               --settings "{'driverVersion':'535.161'}"
        
      • This avoids the GRID 17.5 bug that can break CUDA-related workloads on A10 and helps keep host/guest driver branches compatible.
    3. If the VM is already failing to boot with a broken driver:
      • Boot into the recovery/safe mode or use the Azure Serial Console.
      • Remove the manually installed NVIDIA packages (Ubuntu 550/570/580 and any manually installed GRID .run driver) so the system can boot with a clean kernel.
      • Once the VM boots normally, apply the NvidiaGpuDriverLinux extension with a pinned, supported driver version as above.
    4. For NVads A10 v5 specifically:
      • Azure recommends using the latest supported vGPU branch for that SKU, but notes that the latest major branch is only backward compatible with the previous major branch (n‑1). Running much older guest drivers against newer host drivers can cause failures.
      • For now, if hitting issues with the newest GRID branch on A10, use the documented workaround of forcing a 16.5/535.161 GRID driver via the extension until the 17.x issues are resolved.

    This approach avoids mixing Ubuntu’s generic NVIDIA packages with Azure’s GRID/vGPU stack and aligns the VM with the supported A10 driver matrix, which should eliminate the boot hang at “probe NVIDIA device failed with -1”.
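After applying the extension, its provisioning state can be checked from the Azure CLI. The sketch below assumes the `az` CLI is installed and logged in; `<rg-name>` and `<vm-name>` are placeholders as in the command above, and the helper `extension_ok` is a name made up for this example.

```shell
# Hypothetical helper: succeed only when the reported state is "Succeeded".
extension_ok() {
  [ "$1" = "Succeeded" ]
}

RG="<rg-name>"   # placeholder: replace with the resource group
VM="<vm-name>"   # placeholder: replace with the VM name

if command -v az >/dev/null 2>&1; then
  # Query only the provisioning state of the GPU driver extension.
  state="$(az vm extension show \
    --resource-group "$RG" \
    --vm-name "$VM" \
    --name NvidiaGpuDriverLinux \
    --query provisioningState \
    --output tsv 2>/dev/null)"
  if extension_ok "$state"; then
    echo "NvidiaGpuDriverLinux provisioned successfully"
  else
    echo "NvidiaGpuDriverLinux state: ${state:-unknown}"
  fi
else
  echo "az CLI not found; check the extension status in the Azure portal"
fi
```

A state other than Succeeded suggests looking at the extension logs on the VM before retrying with a different pinned driver version.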


