NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Tianchen Guo 5 Reputation points
2025-03-20T06:04:07.3066667+00:00

Hi Team,

I suddenly ran into this issue last night. I have tried several solutions found online, including the two threads below, but none of them worked:

 https://learn.microsoft.com/en-us/answers/questions/1328794/nvidia-smi-has-failed-because-it-couldnt-communica

https://learn.microsoft.com/en-us/answers/questions/1669319/nvidia-smi-has-failed-because-it-couldnt-communica

Issue: ~$ nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

What I tried:

  1. Uninstalled all previous drivers -> followed https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/deployment-history?tabs=azure-portal#deployment-operations-and-error-message -> did not work.
  2. Uninstalled all previous drivers -> followed https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup?wt.mc_id=searchAPI_azureportal_inproduct_rmskilling&sessionId=7ec697615f424d5fb668b3e866efa4e5#nvidia-grid-drivers and https://docs.nvidia.com/vgpu/4.9/grid-vgpu-user-guide/index.html#installing-vgpu-drivers-linux -> downloaded https://download.microsoft.com/download/c/3/4/c3484f19-fe76-4495-a65d-a5222ead9517/NVIDIA-Linux-x86_64-550.144.03-grid-azure.run (I also tried https://download.microsoft.com/download/7/e/c/7ec792c9-3654-4f78-b1a0-41a48e10ca6d/NVIDIA-Linux-x86_64-550.127.05-grid-azure.run) -> ran the installer, which brings up Secure Boot (I am not sure how to deal with that on a cloud VM, since we cannot press F2) -> the installation finished, but nvidia-smi still does not work.
  3. Uninstalled all previous drivers -> ran ubuntu-drivers devices to find candidates -> installed the recommended version with sudo apt install -> still does not work.
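
Since all three attempts end with the same error, it may help to check why the kernel module is not usable before reinstalling again. The following is a minimal diagnostic sketch, not an official procedure. It assumes mokutil, lsmod, modinfo and dmesg are available on the VM, and the helper names sb_hint and diagnose are made up for illustration.

```shell
# Diagnostic sketch. Each check degrades quietly if a tool is missing.

# Map the text printed by `mokutil --sb-state` to a short hint.
sb_hint() {
  case "${1:-}" in
    *"SecureBoot enabled"*)  echo "secure-boot-on: modules not signed by a trusted key are refused" ;;
    *"SecureBoot disabled"*) echo "secure-boot-off: module signing is unlikely to be the blocker" ;;
    *)                       echo "secure-boot-unknown" ;;
  esac
}

diagnose() {
  # Secure Boot state as seen from inside the guest.
  if command -v mokutil >/dev/null 2>&1; then
    sb_hint "$(mokutil --sb-state 2>&1)"
  fi
  # Is the nvidia kernel module loaded at all?
  if lsmod 2>/dev/null | grep -q '^nvidia'; then
    echo "module: nvidia is loaded"
  else
    echo "module: nvidia is NOT loaded"
  fi
  # Is there a module built for the running kernel?
  modinfo -F version nvidia 2>/dev/null || echo "modinfo: no nvidia module found for kernel $(uname -r)"
  # Kernel messages often name the exact failure (may need sudo to read).
  dmesg 2>/dev/null | grep -iE 'nvidia|nvrm|lockdown' | tail -n 20 || true
  return 0
}

diagnose
```

If the hint reports secure-boot-on and the module is not loaded, the Secure Boot prompt from attempt 2 is a likely place to look next.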

My VM Environment:

~$ lspci | grep -i nvidia
0001:00:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

~$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
~$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

~$ ubuntu-drivers devices
ERROR:root:aplay command not found
== /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-555 - third-party non-free
driver   : nvidia-driver-560 - third-party non-free
driver   : nvidia-driver-565 - third-party non-free recommended
driver   : nvidia-driver-570-server - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-550 - third-party non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

# remove previous drivers
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get remove --purge '^linux-objects-nvidia-.*'
sudo apt-get remove --purge '^linux-signatures-nvidia-.*'
sudo apt-get autoremove -y
sudo apt-get clean
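
To confirm that the purge commands above really left nothing behind, something like the sketch below could be run afterwards. It is an illustration under assumptions: leftover_nvidia is a hypothetical helper name, and the prefixes it matches are simply the four patterns from the purge commands. Note that apt does not track a driver installed from a .run file, so that one has to be removed with the installer's own uninstall path (nvidia-uninstall, or the .run file with --uninstall).

```shell
# Filter `dpkg -l`-style lines on stdin down to packages that match the
# purge patterns. Status "ii" = installed, "rc" = removed, config remains.
leftover_nvidia() {
  awk '$1 ~ /^(ii|rc|iU|iF)$/ && $2 ~ /^(nvidia-|libnvidia-|linux-objects-nvidia-|linux-signatures-nvidia-)/ { print $1, $2 }'
}

# Packages dpkg still knows about (no output means nothing is left).
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l 2>/dev/null | leftover_nvidia
fi

# Kernel modules still registered with DKMS, if DKMS is installed.
if command -v dkms >/dev/null 2>&1; then
  dkms status 2>/dev/null | grep -i nvidia || echo "dkms: no nvidia modules registered"
fi
```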

Any help would be appreciated!

Azure Virtual Machines

1 answer

  1. Mounika Reddy Anumandla 6,940 Reputation points Microsoft External Staff Moderator
    2025-03-24T07:10:41.0366667+00:00

    Hi Tianchen Guo,
    I have investigated this issue further.

    Install GRID driver on Ubuntu with Secure Boot enabled

    The GRID driver installer offers no option to skip building and installing its kernel module, or to select a different source of signed kernel modules. Secure Boot therefore has to be disabled on Linux VMs in order to use them with GRID, after installing signed kernel modules.

    https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-driver-on-ubuntu-with-secure-boot-enabled

    Installing via Azure CLI: https://learn.microsoft.com/en-us/answers/questions/2180437/how-to-fix-no-gpu-issue-with-nc-series-on-azure-vm

    With Secure Boot enabled, all OS boot components (boot loader, kernel, kernel drivers) must be signed by trusted publishers. Both Windows and select Linux distributions support Secure Boot. If Secure Boot cannot verify that the image is signed by a trusted publisher, the VM fails to boot.
    For more information, see Trusted launch: https://learn.microsoft.com/en-us/azure/virtual-machines/trusted-launch

    Hope this helps!
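
For readers who want to try the Secure Boot change from the Azure CLI, here is a minimal sketch of one way it might be done. It is an assumption-laden illustration and not taken from the linked articles: myResourceGroup and myVM are placeholder names, the run and disable_secure_boot helpers are hypothetical, and the sketch assumes a Trusted Launch VM that has to be deallocated before its security settings are changed. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Placeholders (hypothetical): replace with your own resource group and VM.
RG="${RG:-myResourceGroup}"
VM="${VM:-myVM}"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

disable_secure_boot() {
  # Stop and deallocate the VM before changing its security profile.
  run az vm deallocate --resource-group "$RG" --name "$VM"
  # Turn Secure Boot off for this VM.
  run az vm update --resource-group "$RG" --name "$VM" --enable-secure-boot false
  # Start the VM again.
  run az vm start --resource-group "$RG" --name "$VM"
}

disable_secure_boot
```

After the VM is back up, mokutil --sb-state inside the guest should show whether the change took effect, and nvidia-smi can then be retried.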
