Install AMD GPU drivers on NVads V710-series Linux VMs

Applies to: ✔️ Linux VMs

Note

Azure currently provides installation instructions for Ubuntu 22.04 and Ubuntu 24.04. For other Linux distributions and the latest guide to setting up ROCm drivers, see AMD's page: Quick start installation guide - ROCm installation (Linux). For all other ROCm versions, see ROCm release history - ROCm Documentation.

NVads V710-series

To use the GPU capabilities of the new Azure NVads V710-series VMs running Linux, you need to install the AMD GPU drivers. The AMD GPU Driver Extension simplifies the installation of AMD GPU drivers on NVads V710-series VMs. You can manage this extension through the Azure portal, Azure PowerShell, or Azure Resource Manager (ARM) templates. For detailed information on supported operating systems and deployment steps, see the AMD GPU Driver Extension documentation.

This article outlines the supported operating systems and drivers, and provides installation and verification steps for Ubuntu.

ROCm

Here are the steps for installing the AMD Linux Driver to harness the capabilities of the AMD Radeon PRO V710 GPU on an NVv5-V710 GPU Linux instance provided by Microsoft Azure. Subsequent sections provide detailed Linux driver installation instructions for users who wish to perform inference using ROCm on the NVv5-V710 GPU Linux instance.

Step 1: Linux Driver Installation

  • Supported Linux Distros

Verify that the system is running a supported Linux version using $ cat /etc/*release. The output should be similar to:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=XX
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu"
PRETTY_NAME="Ubuntu LTS"

  • Supported Linux Kernel

Verify that the Linux OS is running a supported kernel version using $ uname -srmv. The output should be similar to:
Linux 5.XX.0-XX-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64
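
The distribution check above can also be scripted. The following is a minimal sketch, assuming the standard ID and VERSION_ID fields of /etc/os-release; the function name is_supported_release is illustrative and isn't part of any Azure or AMD tool.

```shell
# Succeeds (exit status 0) only for the Ubuntu releases covered by this article.
# $1: distribution ID, $2: VERSION_ID, both as found in /etc/os-release.
is_supported_release() {
    [ "$1" = "ubuntu" ] || return 1
    case "$2" in
        22.04|24.04) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a running system (left commented out so the block has no side effects):
# . /etc/os-release && is_supported_release "$ID" "$VERSION_ID" && echo "supported"
```

The function only compares strings, so it can be tried with literal values before you run it against a real VM.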

Step 2: Pre-configuration

Note

The disk size must be greater than 64GB to ensure optimal performance and compatibility.

  • Update the package list

    Refresh the package index so that apt has current information about package versions and their dependencies, using
    $ sudo apt update

  • Python setuptools and wheel

    Install the Python packages needed for building and distributing Python packages, using
    $ sudo apt install python3-setuptools python3-wheel

  • Group permissions

    Add the current user to the render and video groups, using
    $ sudo usermod -a -G render,video $LOGNAME

  • Kernel headers and development packages

    The driver package uses Dynamic Kernel Module Support (DKMS) to build the amdgpu-dkms module for installed kernels. This process requires the Linux kernel headers and modules for each kernel. These packages are installed automatically with the kernel. However, if you use multiple kernel versions or download kernel images without the meta-packages, you need to install them manually using
    $ sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"

  • Verify the GPU card

    Confirm that the GPU card is detected, using the following command. The second line shows the expected output.
    $ sudo lspci -d 1002:7461
    c3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 7461

Note

The Virtual Function Device ID 7461 confirms that the Virtual Machine is configured with the AMD Radeon PRO V710 GPU.

  • Virtual machine update

    Run the update on the NVv5-V710 GPU Linux instance running Ubuntu 22.04, using $ sudo apt update

  • Disable amdgpu driver

    Before installing the latest AMD Linux driver, you should disable or blocklist the default AMD GPU driver found in Linux distributions like Ubuntu or RHEL. This default driver is not certified for use with the AMD Radeon PRO V710 GPU on an NVv5-V710 GPU Linux instance. Instead, use the driver optimized for Azure NVv5-V710 GPU workloads.

  • Verify for driver disable

    Verify whether the amdgpu driver is already disabled, using $ grep amdgpu /etc/modprobe.d/* -rn. If the driver is blocklisted, you don't need to modify anything else. However, be cautious with entries that start with #blacklist amdgpu: such a line is commented out, which means that the driver isn't blocklisted.

  • Disable the amdgpu driver

    To install the latest driver, you need to blocklist the default amdgpu driver. Follow these steps:

    • Edit the /etc/modprobe.d/blacklist.conf file and add the line blacklist amdgpu
    • Apply the changes using $ sudo update-initramfs -uk all to ensure that the changes take effect and the driver is properly blocklisted.
  • Reboot

    After restarting the VM, the default amdgpu driver in Ubuntu Linux distributions should not load because it has been blocklisted. Confirm that the driver isn't loaded, using $ lsmod | grep amdgpu. If there is no output, the driver isn't loaded and you can proceed. If the driver is still loaded, return to the previous step and check that the amdgpu driver was correctly blocklisted.

4. AMD Driver Installation

4a Installation

The following steps demonstrate a single-version driver installation using the amdgpu-install package. To install the ROCm 6.3.3 driver, run the following commands in your terminal:

Ubuntu 22.04
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.3.3/ubuntu/jammy/amdgpu-install_6.3.60303-1_all.deb
sudo apt install ./amdgpu-install_6.3.60303-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm

Ubuntu 24.04
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.3.3/ubuntu/noble/amdgpu-install_6.3.60303-1_all.deb
sudo apt install ./amdgpu-install_6.3.60303-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
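
The two command lists above differ only in the Ubuntu codename that appears in the download URL. As a hedged sketch, the helper below selects the URL from the release number. It reuses the ROCm 6.3.3 package names from this article; the function name amdgpu_installer_url is illustrative only.

```shell
# Prints the amdgpu-install package URL for a given Ubuntu VERSION_ID,
# using the ROCm 6.3.3 package name listed in this article.
amdgpu_installer_url() {
    case "$1" in
        22.04) codename=jammy ;;
        24.04) codename=noble ;;
        *) echo "unsupported Ubuntu release: $1" >&2; return 1 ;;
    esac
    echo "https://repo.radeon.com/amdgpu-install/6.3.3/ubuntu/${codename}/amdgpu-install_6.3.60303-1_all.deb"
}

# Example use (commented out so that nothing is downloaded by this block):
# . /etc/os-release && wget "$(amdgpu_installer_url "$VERSION_ID")"
```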

Note

Azure currently supports Ubuntu 22.04 and Ubuntu 24.04. For all other Linux distributions, refer to AMD's documentation.

4b Load amdgpu driver

$ sudo modprobe amdgpu

Review the output of dmesg | grep amdgpu to confirm that the GPU driver is loaded and initialized successfully.

$ sudo dmesg | grep amdgpu 
[ 66.177373] [drm] amdgpu kernel modesetting enabled. 
[ 66.177379] [drm] amdgpu version: 6.7.0 
[ 66.177623] amdgpu: Virtual CRAT table created for CPU 
[ 66.177653] amdgpu: Topology: Add CPU node 
[ 66.184259] amdgpu 045b:00:00.0: enabling device (0000 -> 0002) 
[ 66.670226] [drm] add ip block number 5 <amdgpu_vkms> 
[ 66.685726] amdgpu 045b:00:00.0: amdgpu: Fetched VBIOS from VRAM BAR 
[ 66.685733] amdgpu: ATOM BIOS: 113-D7190300-104 
[ 66.689542] amdgpu 045b:00:00.0: amdgpu: CP RS64 enable
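
To check the driver version in a script rather than by eye, the version line can be extracted from the log. This is a minimal sketch that assumes the "amdgpu version:" message format shown in the sample output above; the function name amdgpu_version_from_dmesg is hypothetical.

```shell
# Reads kernel log text on stdin and prints the first amdgpu driver version found.
# Prints nothing if no "amdgpu version:" line is present.
amdgpu_version_from_dmesg() {
    sed -n 's/.*amdgpu version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example use (commented out because dmesg typically requires root):
# sudo dmesg | amdgpu_version_from_dmesg
```

An empty result suggests that the driver didn't log its version, which usually means it isn't loaded.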

4c Enable the driver

To load the amdgpu driver automatically on every reboot of the VM, remove any blocklist entry that prevents it from loading.

  • Search for any file that contains a blacklist amdgpu entry, using $ grep amdgpu /etc/modprobe.d/* -rn. The output should be similar to
    /etc/modprobe.d/blacklist.conf:10:blacklist amdgpu
  • Remove the blocklist entry from the listed file, using $ sudo nano /etc/modprobe.d/blacklist.conf and delete the line with blacklist amdgpu.
  • Update the initramfs to apply changes on the next boot, using $ sudo update-initramfs -uk all
  • Reboot the system to load the updated configuration using $ sudo reboot. After rebooting, ensure that the amdgpu driver isn't blocklisted and is available for use.
  • Run AMD-SMI to confirm the driver is loaded successfully using $ amd-smi monitor
GPU  POWER  GPU_TEMP  MEM_TEMP  GFX_UTIL  GFX_CLOCK  MEM_UTIL  MEM_CLOCK  ENC_UTIL  ENC_CLOCK  DEC_UTIL  DEC_CLOCK     THROTTLE  SINGLE_ECC  DOUBLE_ECC  PCIE_REPLAY  VRAM_USED  VRAM_TOTAL   PCIE_BW 

  0   11 W     43 °C     58 °C      84 %   1814 MHz       1 %     96 MHz       N/A    812 MHz       N/A    512 MHz  UNTHROTTLED           0           0            0     227 MB    25476 MB  N/A Mb/s

Graphics+ROCm

1. Installation Guide

1.1 Introduction

Here are the steps for installing the AMD Linux Driver to use the power of the AMD Radeon™ PRO V710 GPU on an NVv5-V710 GPU Linux instance offered by Microsoft Azure. The Linux Driver installation also includes installing the ROCm™ Libraries, graphic libraries, and Development Tools. Subsequent sections of the document thoroughly discuss the driver installation for the graphics use case.

2. Linux Driver Prerequisites

2.1 Supported Linux Distros

The AMD Linux Driver software supports the following Linux distributions:

Linux Distribution    Kernel Version    Supported
Ubuntu® 22.04         6.5               ✅ Yes

Confirm the system has a supported Linux version. To obtain the Linux distribution information, use the following command:

$ uname -a && cat /etc/*release

Output is similar to the following example:

Linux amd-Virtual-Machine 6.5#18~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb  7 11:40:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu 
DISTRIB_RELEASE=22.04 
DISTRIB_CODENAME=jammy 
DISTRIB_DESCRIPTION="Ubuntu 22.04" 
PRETTY_NAME="Ubuntu 22.04 LTS" 

Ensure your Linux distribution and kernel version are listed in the table above.

Note

Refer to the Troubleshooting section for instructions on how to set the 6.5 kernel as the default (at every boot) on the NVv5-V710 GPU instance.

Note

If you plan to run the graphics workload, use the Linux distribution with graphics enabled (e.g., Ubuntu-22.04-desktop-amd64.iso).

3. Troubleshooting

This section outlines troubleshooting techniques to address issues that may arise during the driver installation process. If you're using kernel 6.8, follow the steps below to downgrade to kernel 6.5.

Check Installed Kernels:

Run the following command to list the installed kernel packages

dpkg --list | grep -Ei --color 'linux-image|linux-headers|linux-modules' | awk '{ print $2 }'

Review the output to see which kernels are currently installed.

Install Kernel 6.5:

If kernel 6.5 isn't installed, install it using

sudo apt install linux-image-6.5.0-1025-azure

Purge Kernels Above 6.5:

Use the following command to purge kernels above version 6.5. Replace the version string with the one reported for your system by the list of installed kernels.

sudo apt purge linux-headers-6.8.0-1025-azure linux-image-6.8.0-1025-azure linux-modules-6.8.0-1025-azure

Verify Kernel Version:

Verify that only kernel 6.5 is present by running

dpkg --list | grep -Ei --color 'linux-image|linux-headers|linux-modules' | awk '{ print $2 }'

The output should be similar to the following example:

linux-image-6.5.0-1025-azure
linux-headers-6.5.0-1025-azure
linux-modules-6.5.0-1025-azure

Loading Kernel 6.5 by default on boot:

When the NVv5-V710 GPU Linux instance is launched, the OS boots to the 6.8.0-1015-azure kernel instead of the 6.5.0-1025-azure kernel. Modify the GRUB settings so that the instance boots into the 6.5.0-1025-azure kernel. To check the currently installed kernels, use the following command:

$ dpkg --list | grep -Ei --color 'linux-image' | awk '{ print $2 }'

Output is similar to the following example:

linux-image-6.5.0-1025-azure
linux-image-6.8.0-1015-azure 
linux-image-azure

Open the GRUB settings file (typically /etc/default/grub) and change GRUB_DEFAULT="0" to the following value:

GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.5.0-1025-azure"
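
The GRUB_DEFAULT change can also be made without an editor. The following is a minimal sketch, assuming that the settings file contains exactly one line that starts with GRUB_DEFAULT= and that the menu entry contains no | or & characters. The function name set_grub_default is illustrative, and the path /etc/default/grub is the usual Ubuntu location rather than something this article confirms.

```shell
# Rewrites the GRUB_DEFAULT line of a GRUB settings file.
# $1: path to the settings file, $2: menu entry to boot by default.
set_grub_default() {
    grub_file=$1
    entry=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then copy it back over the original.
    sed "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"${entry}\"|" "$grub_file" > "$tmp" &&
        cat "$tmp" > "$grub_file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example use on a copy of the file (commented out so nothing is modified by this block):
# cp /etc/default/grub /tmp/grub.test
# set_grub_default /tmp/grub.test "Advanced options for Ubuntu>Ubuntu, with Linux 6.5.0-1025-azure"
```

Trying the function on a copy first makes it easy to inspect the result before you change the real file from a root shell.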

Update GRUB and Reboot:

Update GRUB and reboot the system using

sudo update-grub
sudo reboot

Validate Kernel Version:

After rebooting, validate the kernel version using

uname -a

4. Prerequisites

Note

The disk size must be greater than 64GB to ensure optimal performance and compatibility.

4.1 Update the package list

Update the package list to ensure you have the latest information on the newest versions of packages and their dependencies:

sudo apt update

4.2 Install Python Setuptools and wheel

These packages are essential for building and distributing Python packages.

$ sudo apt install python3-setuptools python3-wheel

4.3 Setting Permissions for groups

Add yourself to the render and video groups using the following command:

$ sudo usermod -a -G render,video $LOGNAME
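
Group changes made with usermod take effect at the next login, so it can help to check membership afterwards. The following is a small sketch that inspects a group list such as the output of id -nG; the function name has_gpu_groups is hypothetical.

```shell
# Succeeds when the space-separated group list in $1 contains
# both "render" and "video" as whole words.
has_gpu_groups() {
    case " $1 " in *" render "*) ;; *) return 1 ;; esac
    case " $1 " in *" video "*) ;; *) return 1 ;; esac
}

# Example use after logging in again (commented out so the block has no side effects):
# has_gpu_groups "$(id -nG)" && echo "render and video membership is active"
```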

4.4 Kernel headers and development packages

The driver package uses Dynamic Kernel Module Support (DKMS) to build the amdgpu-dkms module for installed kernels. This requires the installation of Linux kernel headers and modules for each kernel. These packages are installed automatically with the kernel. However, if you use multiple kernel versions or download kernel images without the meta-packages, you might need to install them manually.

$ sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"

4.5 Verifying GPU Card in Linux®

The output should list the GPU card.

$ sudo lspci -d 1002:7461
c3:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 7461

Note

7461 is the Virtual Function Device ID. This confirmation indicates that the Virtual Machine is configured with the AMD Radeon™ PRO V710 GPU.

4.6 Virtual Machine Update

On an NVv5-V710 GPU Linux instance running Ubuntu 22.04 OS, run the update:

$ sudo apt update

4.7 Blacklist amdgpu Driver

Before installing the latest AMD Linux driver, it's important to blacklist the default amdgpu driver. The default driver, present in Linux distributions like Ubuntu or RHEL, isn't certified for use with the AMD Radeon™ PRO V710 GPU on an NVv5-V710 GPU Linux instance. The driver optimized for Azure NVv5-V710 GPU workloads should be used instead.

Check if the Driver is Already Blacklisted

To check if the amdgpu driver is already blacklisted, run the following command:

grep amdgpu /etc/modprobe.d/* -rn

If the driver is blacklisted, you don't need to modify anything else. Be careful with entries that start with #blacklist amdgpu: such a line is commented out, which means that the driver isn't blacklisted.
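
Because a plain grep also matches commented-out entries, a stricter pattern can tell the two cases apart. The following is a minimal sketch, assuming that the entry sits on a line of its own; the function name amdgpu_blacklisted_in is illustrative.

```shell
# Succeeds only if the given file contains an active (uncommented)
# "blacklist amdgpu" line. A line such as "#blacklist amdgpu" doesn't match.
amdgpu_blacklisted_in() {
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+amdgpu[[:space:]]*$' "$1"
}

# Example use (commented out; the path is the file named in this article):
# amdgpu_blacklisted_in /etc/modprobe.d/blacklist.conf && echo "amdgpu is blacklisted"
```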

Disable the amdgpu Driver

If the amdgpu driver isn't already blacklisted, follow these steps to blacklist it.

Open the /etc/modprobe.d/blacklist.conf file to edit:

sudo vim /etc/modprobe.d/blacklist.conf

Add the following line to blacklist the amdgpu driver:

blacklist amdgpu

After updating the blacklist.conf file, run the following command to apply the changes:

$ sudo update-initramfs -uk all

This command ensures the changes take effect and the driver is properly blacklisted.

4.8 Reboot

After you restart the virtual machine, the default amdgpu driver in Ubuntu Linux distributions shouldn't load because it was previously blacklisted. To confirm that the driver isn't loaded, use the following command. If there is no output, the driver isn't loaded and you can proceed:

lsmod | grep amdgpu
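
In a script, the same check can be expressed as an exit status instead of reading the output by eye. This sketch reads lsmod-style text and matches amdgpu only in the module-name column; the function name amdgpu_module_loaded is hypothetical.

```shell
# Reads lsmod output on stdin. Succeeds if a module named exactly
# "amdgpu" appears in the first column, and fails otherwise.
amdgpu_module_loaded() {
    awk '$1 == "amdgpu" { found = 1 } END { exit !found }'
}

# Example use (commented out so the block has no side effects):
# if lsmod | amdgpu_module_loaded; then echo "amdgpu is still loaded"; fi
```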

5. AMD Driver Installation

5.1 Installation

The following steps demonstrate the use of the amdgpu-install script for a single-version driver installation. These instructions install ROCm version 6.1.4 on Ubuntu 22.04 (Jammy).

# Upgrade the system
sudo apt upgrade

# Download amdgpu installer
wget -N -P /tmp/ https://repo.radeon.com/amdgpu-install/6.1.4/ubuntu/jammy/amdgpu-install_6.1.60104-1_all.deb

# If an AMDGPU driver was previously installed, uninstall it
sudo amdgpu-uninstall
sudo apt remove amdgpu-install --purge

# Install the installer package
sudo apt-get install /tmp/amdgpu-install_6.1.60104-1_all.deb

# Install the driver
sudo amdgpu-install --usecase=workstation,rocm,amf --opencl=rocr --vulkan=pro --no-32 --accept-eula

5.2 Load amdgpu driver

After installation, load the amdgpu driver:

$ sudo modprobe amdgpu

You can verify the driver is loaded and initialized successfully with

sudo dmesg | grep amdgpu

Example output:

[ 66.177373] [drm] amdgpu kernel modesetting enabled. 
[ 66.177379] [drm] amdgpu version: 6.7.0 
[ 66.177623] amdgpu: Virtual CRAT table created for CPU 
[ 66.177653] amdgpu: Topology: Add CPU node 
[ 66.184259] amdgpu 045b:00:00.0: enabling device (0000 -> 0002) 
[ 66.670226] [drm] add ip block number 5 <amdgpu_vkms> 
[ 66.685726] amdgpu 045b:00:00.0: amdgpu: Fetched VBIOS from VRAM BAR 
[ 66.685733] amdgpu: ATOM BIOS: 113-D7190300-104 
[ 66.689542] amdgpu 045b:00:00.0: amdgpu: CP RS64 enable

5.2.1 Enable the driver

To automatically load the amdgpu driver on every reboot of the VM, we need to remove any blacklist entry that is preventing it from loading automatically.

Search for the blacklist entry

Run the following command to find any file that contains blacklist amdgpu:

grep amdgpu /etc/modprobe.d/* -rn

If the driver is blacklisted, you see output similar to:

/etc/modprobe.d/blacklist.conf:10:blacklist amdgpu

Remove the blacklist line

Open the file listed in the output:

sudo nano /etc/modprobe.d/blacklist.conf

Delete the line that says:

blacklist amdgpu

Save and exit the file

Update initramfs

Update the initramfs so the changes are applied on the next boot:

sudo update-initramfs -uk all

Reboot the system

Reboot the machine to load the updated configuration:

sudo reboot

After rebooting, the amdgpu driver should no longer be blacklisted and will be available for use.

Run AMD-SMI to confirm the driver is loaded successfully

$ amd-smi monitor
GPU  POWER  GPU_TEMP  MEM_TEMP  GFX_UTIL  GFX_CLOCK  MEM_UTIL  MEM_CLOCK  ENC_UTIL  ENC_CLOCK  DEC_UTIL  DEC_CLOCK     THROTTLE  SINGLE_ECC  DOUBLE_ECC  PCIE_REPLAY  VRAM_USED  VRAM_TOTAL   PCIE_BW 

  0   11 W     43 °C     58 °C      84 %   1814 MHz       1 %     96 MHz       N/A    812 MHz       N/A    512 MHz  UNTHROTTLED           0           0            0     227 MB    25476 MB  N/A Mb/s

6. x11 Remote Server Configuration

After you install the AMD graphics Linux drivers, the default graphical interface (X server) doesn't use hardware acceleration. As a solution, create a virtual display with hardware acceleration enabled that can be used for remote access (x11vnc). The following steps walk through the virtual display setup:

6.1 Install Required Packages

Install x11vnc and net-tools

$ sudo apt install net-tools 
$ sudo apt install x11vnc 

6.2 Update GDM3 Custom Configuration

Edit the GDM3 configuration file to:

- Disable Wayland (which doesn't support x11vnc)

- Enable automatic login (so a graphical session is available at boot)

Open the configuration file with:

$ sudo vim /etc/gdm3/custom.conf

After modification, the file looks like this (replace amd with your own user name):

# GDM configuration storage 
 
[daemon] 
AutomaticLoginEnable=true 
AutomaticLogin=amd 
 
# Uncomment the line below to force the login screen to use Xorg 
WaylandEnable=false 
 
# Enabling automatic login 
 
# Enabling timed login 
#  TimedLoginEnable = true 
#  TimedLogin = user1 
#  TimedLoginDelay = 10 
 
[security] 
 
[xdmcp] 
 
[chooser] 
 
[debug] 
# Uncomment the line below to turn on debugging 
# More verbose logs 
# Additionally lets the X server dump core if it crashes 
#Enable=true
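
After editing, the two settings that matter for x11vnc can be checked with a short script. The following is a minimal sketch that only looks for the uncommented WaylandEnable and AutomaticLoginEnable keys shown in the file above; the function name gdm_conf_ready is illustrative.

```shell
# Succeeds if the given GDM configuration file has both
# WaylandEnable=false and AutomaticLoginEnable=true as active lines.
gdm_conf_ready() {
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false' "$1" &&
        grep -Eq '^[[:space:]]*AutomaticLoginEnable[[:space:]]*=[[:space:]]*true' "$1"
}

# Example use (commented out; the path is the file edited in this step):
# gdm_conf_ready /etc/gdm3/custom.conf && echo "GDM3 is configured for Xorg with automatic login"
```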

6.3 Reboot and Restart gdm3

After the reboot, restart gdm3 using the following command:

$ sudo systemctl restart gdm3

6.4 Modify X Configuration

6.4.1 Getting Bus ID

The BusID of the AMD Radeon™ PRO V710 GPU must be manually added to the X11 configuration file. To get the BusID, run the following command:

$ lspci -d 1002: | awk '{print $1}' 
3a9e:00:00.0

Note

Convert the BusID of the GPU from hexadecimal to decimal. For example, for "3a9e:00:00.0", convert hexadecimal "3a9e00" to decimal "3841536".
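
The hexadecimal-to-decimal conversion can be done with printf. The following is a minimal sketch, assuming that the first BusID field is the PCI domain and bus digits joined together, as in the example in this note. The function name pci_to_xorg_busid is illustrative and isn't part of any AMD or Xorg tool.

```shell
# Converts a PCI address of the form domain:bus:device.function (hexadecimal)
# into an Xorg BusID string with decimal fields.
pci_to_xorg_busid() {
    addr=$1
    domain=${addr%%:*}      # text before the first colon
    rest=${addr#*:}         # bus:device.function
    bus=${rest%%:*}
    devfn=${rest#*:}        # device.function
    dev=${devfn%%.*}
    fn=${devfn#*.}
    # printf interprets the 0x prefix as hexadecimal and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x${domain}${bus}" "0x${dev}" "0x${fn}"
}

pci_to_xorg_busid 3a9e:00:00.0   # prints PCI:3841536:0:0
```

The printed value has the same form as the BusID entry used in the X configuration file.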

6.4.2 Updating X Configuration to add Device and Screen

Add a "Device" section for the GPU, and modify the "Screen" section to reference this device.

To ensure the driver configuration is correct, modify /usr/share/X11/xorg.conf.d/00-amdgpu.conf to match the following content.

Note

Make sure to update the BusID to match your system configuration (as shown in the previous step).

Section "OutputClass" 
        Identifier "AMDgpu" 
        MatchDriver "amdgpu" 
        Driver "amdgpu" 
EndSection 
 
Section "Files" 
        ModulePath "/opt/amdgpu-pro/lib/xorg/modules" 
        ModulePath "/opt/amdgpu/lib/xorg/modules" 
        ModulePath "/usr/lib/xorg/modules" 
EndSection 
 
Section "Device" 
    Identifier  "Card0" 
    Driver      "amdgpu" 
    BusID  "PCI:3841536:0:0" 
EndSection 
 
Section "Screen" 
    Identifier "Screen0" 
    Device     "Card0" 
    Monitor    "Monitor0" 
    SubSection "Display" 
        Viewport   0 0 
        Depth     1 
    EndSubSection 
    SubSection "Display" 
        Viewport   0 0 
        Depth     4 
    EndSubSection 
    SubSection "Display" 
        Viewport   0 0 
        Depth     8 
    EndSubSection 
    SubSection "Display" 
        Viewport   0 0 
        Depth     15 
    EndSubSection 
    SubSection "Display" 
        Viewport   0 0 
        Depth     16 
    EndSubSection 
    SubSection "Display" 
        Viewport   0 0 
        Depth     24 
    EndSubSection
EndSection

Also modify /usr/share/X11/xorg.conf.d/10-amdgpu.conf to match the following section

Section "OutputClass"
        Identifier "Card0"
        MatchDriver "amdgpu"
        Driver "amdgpu"
        Option "PrimaryGPU" "yes"
EndSection

6.5 Reboot

After installation, reboot the virtual machine to apply changes:

sudo reboot

6.6 Load Driver

Once the system is back up, load the amdgpu driver using the following commands:

$ sudo systemctl stop gdm   
$ sudo modprobe amdgpu  
$ sudo systemctl start gdm 

These commands stop and then restart the GNOME Display Manager (gdm) so that the driver loads correctly. Save your work before running them.

6.7 Running x11vnc

To start the VNC server and automatically find the correct display and authentication, use the following command:

x11vnc --forever -find

This command searches for the active X display and user credentials (XAUTH) automatically.

Note

This setup is only compatible with the supported Ubuntu Desktop image. These instructions do not work for Ubuntu Server images.

Uninstallation Steps

If you need to uninstall the existing amdgpu driver, follow these steps:

Check DKMS status:

dkms status

Uninstall the amdgpu driver:

sudo amdgpu-install --uninstall
sudo amdgpu-uninstall

Remove the amdgpu installation package:

sudo apt autoremove --purge amdgpu-install

Reboot the system:

sudo reboot

Check DKMS status again to ensure the driver is uninstalled:

dkms status

If the output no longer lists an amdgpu module, the old amdgpu driver has been removed from the system and you can install the new driver.
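
The final check can be scripted as well. This is a hedged sketch that treats any dkms status line mentioning amdgpu as evidence that the module is still registered. The exact output format of dkms status varies between DKMS versions, so the match is deliberately loose, and the function name amdgpu_dkms_removed is hypothetical.

```shell
# Reads `dkms status` output on stdin. Succeeds if no line mentions amdgpu,
# and fails if an amdgpu module is still listed.
amdgpu_dkms_removed() {
    ! grep -q 'amdgpu'
}

# Example use (commented out so the block has no side effects):
# dkms status | amdgpu_dkms_removed && echo "amdgpu DKMS module is gone"
```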