Using ubuntu-hpc images with Standard_ND40rs_v2 VMs

Niyongabo, Patrick 26 Reputation points
2022-08-17T22:53:34.797+00:00

This NDv2 page [1] says that the 4.7-1.0.0.1 Mellanox OFED driver should be used when enabling InfiniBand on Standard_ND40rs_v2 VMs.

However, the ubuntu-hpc 18.04 images come pre-installed with v4.9-3.1.5.0 [2], and the 20.04 images come pre-installed with v5.6-1.0.3.3 [3].

Is there a way to safely downgrade the Mellanox OFED driver without corrupting (or causing mismatches with) existing drivers?
I have successfully uninstalled the pre-installed version, but when I try to install v4.7-1.0.0.1, I get the following error message (I have also attached the log file, 232050-mlnx.log):

sudo ./mlnxofedinstall --add-kernel-support --without-fw-update -vvv
Distro was not provided, trying to auto-detect the current distro...
Auto-detected ubuntu18.04 distro.
Note: This program will create MLNX_OFED_LINUX TGZ for ubuntu18.04 under /tmp/MLNX_OFED_LINUX-4.7-1.0.0.1-5.4.0-1043-azure directory.
See log file /tmp/MLNX_OFED_LINUX-4.7-1.0.0.1-5.4.0-1043-azure/mlnx_iso.27902_logs/mlnx_ofed_iso.27902.log

Checking if all needed packages are installed...
Building MLNX_OFED_LINUX DEBS . Please wait...

ERROR: Failed executing "MLNX_OFED_SRC-4.7-1.0.0.1/install.pl --tmpdir /tmp/MLNX_OFED_LINUX-4.7-1.0.0.1-5.4.0-1043-azure/mlnx_iso.27902_logs --kernel-only --kernel 5.4.0-1043-azure --kernel-sources /lib/modules/5.4.0-1043-azure/build --builddir /tmp/MLNX_OFED_LINUX-4.7-1.0.0.1-5.4.0-1043-azure/mlnx_iso.27902 --without-dkms --without-debug-symbols --build-only --distro ubuntu18.04"
ERROR: See /tmp/MLNX_OFED_LINUX-4.7-1.0.0.1-5.4.0-1043-azure/mlnx_iso.27902_logs/mlnx_ofed_iso.27902.log

[1] https://learn.microsoft.com/en-us/azure/virtual-machines/ndv2-series
[2] https://github.com/Azure/azhpc-images/blob/master/ubuntu/ubuntu-18.x/ubuntu-18.04-LTS-hpc/install_mellanoxofed.sh
[3] https://github.com/Azure/azhpc-images/blob/master/ubuntu/ubuntu-20.x/ubuntu-20.04-hpc/install_mellanoxofed.sh


Accepted answer
  1. vipullag-MSFT 24,106 Reputation points Microsoft Employee
    2022-08-18T14:19:17.083+00:00

    @Niyongabo, Patrick

    Welcome to Microsoft Q&A Platform, thanks for posting your query here.

    From the details and error shared, the mlnx.log shows a build error (Failed to build mlnx-ofed-kernel DEB).

    The NDv2 series supports the latest OFED version. Please use version 5.6-1.0.3.3 of MOFED.
    There are two Ubuntu HPC 18.04 images:

    1. An 18.04 image with the MOFED LTS version 4.9-3.1.5.0, suitable for CX3-Pro cards.
    2. An 18.04 image with MOFED 5.6-1.0.3.3 installed, suitable for NDv2/NDv4/NCv4.
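One way to tell which of the two images a VM is running is to read the installed MOFED version. The sketch below assumes that `ofed_info -s` prints a single line of the form `MLNX_OFED_LINUX-<version>:`. The helper names `parse_ofed_version` and `classify_ofed` are hypothetical and only illustrate the check.

```shell
#!/usr/bin/env bash
# Sketch: report which MOFED release is installed and which of the two
# Ubuntu HPC 18.04 image variants it corresponds to.
# Assumption: `ofed_info -s` prints a line like "MLNX_OFED_LINUX-5.6-1.0.3.3:".

# Strip the "MLNX_OFED_LINUX-" prefix and the trailing colon.
parse_ofed_version() {
    local line="$1"
    line="${line#MLNX_OFED_LINUX-}"
    line="${line%:}"
    printf '%s\n' "$line"
}

# Map a version string to the image variant described in the answer.
classify_ofed() {
    case "$1" in
        4.9-3.1.5.0) echo "LTS image (CX3-Pro)" ;;
        5.6-1.0.3.3) echo "image for NDv2/NDv4/NCv4" ;;
        *)           echo "other" ;;
    esac
}

if command -v ofed_info >/dev/null 2>&1; then
    version="$(parse_ofed_version "$(ofed_info -s)")"
    echo "MOFED ${version}: $(classify_ofed "$version")"
else
    echo "ofed_info not found; MOFED does not appear to be installed"
fi
```

If the script reports `other`, the VM was probably not created from either of the two images listed above.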

    Please refer to this link for installation steps. Alternatively, you can use the Ubuntu HPC 18.04 image directly (microsoft-dsvm:ubuntu-hpc:1804:latest) on an NDv2 SKU.
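As a rough illustration of using that image directly, the sketch below builds an `az vm create` call for one NDv2 VM. The resource group `my-hpc-rg`, VM name `ndv2-node-0`, and user `azureuser` are hypothetical placeholders, and the resource group is assumed to exist already. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: create one NDv2 VM from the Ubuntu HPC 18.04 image named above.
# Hypothetical names: my-hpc-rg, ndv2-node-0, azureuser.
# By default the command is only printed; set DRY_RUN=0 to execute it.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_ndv2_vm() {
    run az vm create \
        --resource-group my-hpc-rg \
        --name ndv2-node-0 \
        --size Standard_ND40rs_v2 \
        --image microsoft-dsvm:ubuntu-hpc:1804:latest \
        --admin-username azureuser \
        --generate-ssh-keys
}

create_ndv2_vm
```

Running it with `DRY_RUN=0` requires a logged-in Azure CLI and quota for the Standard_ND40rs_v2 size in the chosen region.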

    Hope this helps.
    If you need further help on this, tag me in a comment.
    If the suggested response helped you resolve your issue, please 'Accept as answer', so that it can help others in the community looking for help on similar topics.


1 additional answer

  1. Niyongabo, Patrick 26 Reputation points
    2022-08-19T19:48:38.897+00:00

    Please disregard the above follow-up question. I was able to figure out the issue in my setup.

    InfiniBand was not working because I forgot to create my instances in the same availability set and proximity placement group.

    This link helped - https://learn.microsoft.com/en-us/answers/questions/513090/azure-connectivity-problems-with-infinibandrdma-hb.html
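For anyone hitting the same problem, the placement fix described above might look roughly like this with the Azure CLI. All resource names (`my-hpc-rg`, `ib-ppg`, `ib-avset`, `ndv2-node-N`, `azureuser`) and the region are hypothetical placeholders, and the resource group is assumed to exist. The script prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: put two NDv2 VMs in the same availability set and proximity
# placement group. All names and the region are hypothetical placeholders.
# By default the commands are only printed; set DRY_RUN=0 to execute them.

DRY_RUN="${DRY_RUN:-1}"
RG="my-hpc-rg"
LOCATION="eastus"   # assumption: pick a region that offers Standard_ND40rs_v2
PPG="ib-ppg"
AVSET="ib-avset"

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_ib_group() {
    # 1. Proximity placement group.
    run az ppg create --resource-group "$RG" --name "$PPG" --location "$LOCATION"

    # 2. Availability set tied to that placement group.
    run az vm availability-set create --resource-group "$RG" --name "$AVSET" \
        --location "$LOCATION" --ppg "$PPG"

    # 3. VMs that join both.
    for i in 0 1; do
        run az vm create --resource-group "$RG" --name "ndv2-node-$i" \
            --location "$LOCATION" --size Standard_ND40rs_v2 \
            --image microsoft-dsvm:ubuntu-hpc:1804:latest \
            --availability-set "$AVSET" --ppg "$PPG" \
            --admin-username azureuser --generate-ssh-keys
    done
}

create_ib_group
```

The availability set and placement group have to be chosen when the VMs are created, which is why the placement resources are created first.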
