Linux N-Series VM - NVIDIA failing at boot

Nikhil Chand 1 Reputation point
2022-08-25T13:12:55.457+00:00

Size: Standard NV36ads A10 v5 (36 vCPUs, 440 GiB memory)
OS: Ubuntu 22_04-lts-gen2 (edited: the original post said Ubuntu 20.04, which was incorrect; the correct version is 22.04).

I added the NVIDIA GPU Driver Extension for Linux following: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux

  • The extension reports that it installed successfully.

Upon boot I see:

[FAILED] Failed to start LSB: Micro…ension for nVidia GPU Drivers.
See 'systemctl status nvidia-vmext-service.service' for details.

Any ideas what I can try?
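A minimal Python sketch for collecting the details the boot message points to. It only wraps `systemctl status` and `journalctl -u` for the unit named in that message; `diagnostic_commands` and `run_diagnostics` are hypothetical helper names, and reading the full journal may require running the script with sudo.

```python
from __future__ import annotations

import shutil
import subprocess

# Unit name copied from the boot message above.
UNIT = "nvidia-vmext-service.service"


def diagnostic_commands(unit: str) -> list[list[str]]:
    """Build argv lists for inspecting a systemd unit that failed at boot."""
    return [
        # Current state and the most recent log lines, without a pager.
        ["systemctl", "status", unit, "--no-pager", "--full"],
        # Everything the unit logged during the current boot (-b).
        ["journalctl", "-u", unit, "-b", "--no-pager"],
    ]


def run_diagnostics(unit: str = UNIT) -> None:
    for argv in diagnostic_commands(unit):
        if shutil.which(argv[0]) is None:
            print(f"skipping {argv[0]}: not found on PATH")
            continue
        # check=False because `systemctl status` exits non-zero for a failed
        # unit, which is exactly the case being investigated.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        print("$", " ".join(argv))
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    run_diagnostics()
```

Building the command lists separately from running them keeps the sketch easy to adapt, for example to add `-n 200` to `journalctl` or to point it at a different unit.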

Azure Virtual Machines

5 answers

  1. JimmySalian-2011 41,926 Reputation points
    2022-08-25T13:22:00.317+00:00

    Hi,

    Can you check the logs and provide more details? Also check the troubleshooting steps for extensions here: features-linux


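    A minimal Python sketch for pulling the tail of the relevant logs. The paths are assumptions: `/var/log/azure/nvidia-vmext-status` is where the extension's documentation said it logged its status at the time, and `/var/log/waagent.log` is the Azure Linux guest agent log; confirm both exist on your VM. `tail` and `show_logs` are hypothetical helper names.

```python
from __future__ import annotations

import os
from collections import deque

# Assumed log locations; verify them on your VM.
CANDIDATE_LOGS = (
    "/var/log/azure/nvidia-vmext-status",  # NVIDIA GPU driver extension status log
    "/var/log/waagent.log",                # Azure Linux guest agent log
)


def tail(path: str, n: int = 50) -> list[str]:
    """Return the last n lines of a text file without loading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def show_logs(paths: tuple[str, ...] = CANDIDATE_LOGS, n: int = 50) -> None:
    for path in paths:
        if not os.path.isfile(path):
            print(f"--- {path}: not found")
            continue
        try:
            lines = tail(path, n)
        except PermissionError:
            print(f"--- {path}: permission denied (try running with sudo)")
            continue
        print(f"--- last {n} lines of {path}")
        for line in lines:
            print(line)


if __name__ == "__main__":
    show_logs()
```

    `deque(fh, maxlen=n)` streams the file and keeps only the last `n` lines, so large logs are handled without reading them fully into memory.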

  2. Nikhil Chand 1 Reputation point
    2022-08-25T13:35:59.023+00:00

    Here is some possibly related information from the NVIDIA log file:

    [Screenshot: excerpt from the NVIDIA log file]


  3. Nikhil Chand 1 Reputation point
    2022-08-25T21:52:44.487+00:00

    My apologies - the OS is 22.04, not 20.04:

    22_04-lts-gen2

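    Since the release in use (22.04 vs 20.04) came up, here is a minimal Python sketch that reads the release directly from the VM instead of relying on the image name. It parses the standard `/etc/os-release` file; `parse_os_release` and `read_os_release` are hypothetical helper names. Whether a given Ubuntu release was supported by the extension at the time is not established here; compare the result against the extension's supported-distribution list.

```python
from __future__ import annotations

import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict, with quotes removed."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex handles the shell-style quoting used in os-release values.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def read_os_release(path: str = "/etc/os-release") -> dict[str, str]:
    with open(path, encoding="utf-8") as fh:
        return parse_os_release(fh.read())


if __name__ == "__main__":
    try:
        info = read_os_release()
        print(info.get("PRETTY_NAME"), "| VERSION_ID =", info.get("VERSION_ID"))
    except FileNotFoundError:
        print("/etc/os-release not found")
```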

  4. Nikhil Chand 1 Reputation point
    2022-08-26T14:20:14.1+00:00

    One thing I just discovered; I'm not sure if it is related:

    In this document, it seems to indicate that the typeHandlerVersion should be 1.3. https://learn.microsoft.com/en-us/azure/databox-online/azure-stack-edge-gpu-deploy-virtual-machine-install-gpu-extension?tabs=linux

    But when installing the extension via the Azure portal, the template file shows version 1.2. Is it possible that Azure is not publishing the correct/updated version of the NvidiaGpuDriverLinux extension package?
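    A minimal Python sketch of the comparison being made, assuming the portal template uses the usual VM extension properties (`publisher`, `type`, `typeHandlerVersion`, `autoUpgradeMinorVersion`); `extension_properties` and `parse_version` are hypothetical helpers. Two points may matter here: the linked page is the Azure Stack Edge guide rather than the Azure VM extension page, and when `autoUpgradeMinorVersion` is true, Azure can deploy a newer minor version than the one pinned in the template, so "1.2" in the template does not necessarily mean 1.2 was installed.

```python
from __future__ import annotations


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.3' into (1, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def extension_properties(type_handler_version: str) -> dict:
    """Hypothetical helper: the 'properties' block of a VM extension resource."""
    return {
        "publisher": "Microsoft.HpcCompute",
        "type": "NvidiaGpuDriverLinux",
        "typeHandlerVersion": type_handler_version,
        # When true, Azure may use a newer minor version than the one pinned
        # above at deployment time.
        "autoUpgradeMinorVersion": True,
    }


portal = extension_properties("1.2")
docs = extension_properties("1.3")
# Numeric comparison: the portal template pins an older minor version.
print(parse_version(portal["typeHandlerVersion"]) < parse_version(docs["typeHandlerVersion"]))
```

    Comparing tuples of integers avoids the string-comparison trap where "1.10" would sort before "1.9".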


  5. Nikhil Chand 1 Reputation point
    2022-08-26T15:43:38.43+00:00

    Another new development. I followed the article here, which Microsoft provides in the Diagnostics area of the Azure portal as the method for installing NVIDIA graphics drivers. Now the machine is stuck in a boot loop:

    [Screenshot: boot loop failure]

    After about 30 minutes, it finally reached a login prompt. I uninstalled the drivers as follows, and the VM boots again:

    [Screenshot: uninstall steps]

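    After uninstalling and rebooting (or before trying another install), it can help to confirm whether any NVIDIA kernel module is still loaded. This minimal Python sketch does roughly what `lsmod | grep nvidia` does, by reading `/proc/modules`, which is where `lsmod` gets its data. `loaded_modules` and `nvidia_modules` are hypothetical helper names, and this only reports module state; it does not diagnose the boot loop itself.

```python
from __future__ import annotations


def loaded_modules(text: str) -> set[str]:
    """Module names from /proc/modules content (the first field of each line)."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def nvidia_modules(path: str = "/proc/modules") -> set[str]:
    with open(path, encoding="utf-8") as fh:
        return {m for m in loaded_modules(fh.read()) if m.startswith("nvidia")}


if __name__ == "__main__":
    try:
        found = nvidia_modules()
        print("NVIDIA modules loaded:", ", ".join(sorted(found)) if found else "none")
    except FileNotFoundError:
        print("/proc/modules not found (not a Linux kernel?)")
```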