Compatibility issues installing NVIDIA drivers on Azure NC A100 v4 VMs (Ubuntu 22.04, Linux 6.5.0-1016-azure x86_64 GNU/Linux)

Nishant Wadhwani 0 Reputation points
2024-03-14T11:06:00.64+00:00

How can we install the NVIDIA drivers, CUDA packages, and cuDNN packages on an Azure NC A100 v4 VM running Ubuntu 22.04 with GPU capabilities? I have tried installing several driver versions on my VM, including nvidia-driver-535, nvidia-driver-550, and nvidia-driver-535-server, but each time I get the following error:

nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I have read the blog posts about reinstalling kernels and disabling Secure Boot, and I have already followed all of those steps. I look forward to support and guidance from the Microsoft Azure team.

I also tried following this guide:
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup
but I still get the same error.

Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

1 answer

  1. deherman-MSFT 35,636 Reputation points Microsoft Employee
    2024-03-15T15:47:12.5533333+00:00

    @Nishant Wadhwani

    That is true for most N series VMs. However the NC A100 page mentions this: "Due to increased GPU memory I/O footprint, the NC A100 v4 requires the use of Generation 2 VMs and marketplace images. While the Azure HPC images are strongly recommended, Azure HPC Ubuntu 20.04 and Azure HPC CentOS 7.9, RHEL 8.8, RHEL 9.2, Windows Server 2019, and Windows Server 2022 images are supported."
    You could likely get this working with a standard Ubuntu image, but you would need to configure the additional software packages that come preinstalled in the HPC image.
    I was able to run the standard install and get the nvidia-smi command working on a standard Ubuntu 22.04 image:

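    # Install the tool that detects the GPU and selects a matching driver package
    sudo apt update && sudo apt install -y ubuntu-drivers-common
    # Install the recommended NVIDIA driver for the detected hardware
    sudo ubuntu-drivers install
    # Register NVIDIA's CUDA apt repository and its signing key
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
    sudo apt install -y ./cuda-keyring_1.1-1_all.deb
    sudo apt update
    # Install the CUDA toolkit (this package does not include the driver)
    sudo apt -y install cuda-toolkit-12-3
    # Verify that the driver is loaded and the GPU is visible
    nvidia-smi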
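
    If nvidia-smi still reports that it cannot communicate with the driver after these steps, the checks below may help narrow down the cause. This is a minimal diagnostic sketch, not part of the tested steps above. It assumes that lspci, lsmod, mokutil, and dkms may or may not be installed, so each one is skipped when missing. kernel_flavor is a hypothetical helper added here for illustration only.

    ```shell
    #!/usr/bin/env bash
    # Diagnostic sketch (an assumption, not part of the tested install steps).
    # kernel_flavor is a hypothetical helper introduced for illustration.

    # Classify a kernel release string: Azure-tuned kernels end in "-azure".
    kernel_flavor() {
      case "$1" in
        *-azure) echo "azure" ;;
        *)       echo "other" ;;
      esac
    }

    # Show the running kernel; the driver module must be built for this exact release.
    release="$(uname -r)"
    echo "kernel: ${release} ($(kernel_flavor "${release}"))"

    # Is the GPU visible on the PCI bus at all?
    if command -v lspci >/dev/null 2>&1; then
      lspci | grep -i nvidia || echo "lspci: no NVIDIA device listed"
    fi

    # Is the nvidia kernel module loaded? nvidia-smi fails without it.
    if command -v lsmod >/dev/null 2>&1; then
      if lsmod | grep -q '^nvidia'; then
        echo "module: nvidia is loaded"
      else
        echo "module: nvidia is NOT loaded"
      fi
    fi

    # Secure Boot can block unsigned kernel modules from loading.
    if command -v mokutil >/dev/null 2>&1; then
      mokutil --sb-state || true
    fi

    # If the driver was installed through DKMS, list the kernels it was built for.
    if command -v dkms >/dev/null 2>&1; then
      dkms status || true
    fi
    ```

    If the module is not loaded and dkms status shows no build for the running kernel release, a mismatch between the installed driver package and the kernel is a likely cause worth checking first.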
    

    If you still have questions, please let us know in the "comments" and we would be happy to help you. Comment is the fastest way of notifying the experts.

    If the answer has been helpful, we appreciate hearing from you and would love to help others who may have the same question. Accepting answers helps increase visibility of this question for other members of the Microsoft Q&A community.
