question

SimranjeetSingh-4690 asked SimranjeetSingh-4690 commented

Nvidia Graphics card disconnects from Ubuntu OS after restart

Hi,

I deployed a VM image from the Marketplace with PyTorch and CUDA preinstalled on Ubuntu. On first setup, everything works fine: torch detects the NVIDIA GPU and my code runs.
However, when I restart my VM, the connection between the OS and the GPU seems to break, and the VM can no longer detect the graphics card. The only solution so far is to redeploy the VM and start from scratch.
Has anyone faced a similar issue, or can anyone help me solve this problem?

Thanks

azure-virtual-machines azure-virtual-machines-images

Hello, @SimranjeetSingh-4690!

What marketplace image are you using and what VM SKU are you using?


Thanks @kobulloc-MSFT for getting back on this.

I was using the AISE PyTorch GPU Notebook image (Marketplace Link) with the VM SKU Standard_NC6s_v2.

I know the image ships with a somewhat old version of PyTorch by default, but I updated the versions and it seemed to work for me.

On first setup, everything works fine, but as soon as I shut down the VM and start it the next day, I get the errors shown in the attached screenshot. From what I found online, my understanding is that the OS can no longer communicate with the GPU. Restarting the system fixed this for some people but not for others. Updating the NVIDIA drivers didn't help either.

140245-image.png
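
To narrow down where the OS-to-GPU link breaks after a restart, it may help to check each layer separately. The following is a minimal sketch, not something taken from the image's documentation: it assumes a Linux VM, and the function names (`nvidia_module_loaded`, `nvidia_smi_ok`, `torch_sees_cuda`, `gpu_report`) are hypothetical helpers made up for illustration. It reports whether the `nvidia` kernel module is loaded, whether `nvidia-smi` runs successfully, and whether PyTorch can see CUDA.

```python
import shutil
import subprocess
from pathlib import Path


def nvidia_module_loaded(proc_modules="/proc/modules"):
    """Return True if a kernel module named 'nvidia' is listed as loaded."""
    path = Path(proc_modules)
    if not path.exists():
        return False
    for line in path.read_text().splitlines():
        fields = line.split()
        # The first column of /proc/modules is the module name.
        if fields and fields[0] == "nvidia":
            return True
    return False


def nvidia_smi_ok():
    """Return True if nvidia-smi is installed and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def torch_sees_cuda():
    """Return torch.cuda.is_available(), or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


def gpu_report():
    """Collect the three checks into one dictionary."""
    return {
        "kernel_module_loaded": nvidia_module_loaded(),
        "nvidia_smi_ok": nvidia_smi_ok(),
        "torch_sees_cuda": torch_sees_cuda(),
    }


if __name__ == "__main__":
    for name, value in gpu_report().items():
        print(f"{name}: {value}")
```

Running this once right after deployment and once after the restart shows which layer changed. If the kernel module is no longer loaded after the restart, the problem is likely at the driver level rather than in PyTorch.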



Hello again, @SimranjeetSingh-4690!

Unfortunately, I'm unable to use some third-party images like the Jetware image you linked, but it sounds like a configuration change happens on reboot (updates?). Are you able to get any information from the VM's logs about what might have happened after the reboot?
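
One possible way to pull GPU-related entries out of the VM's logs is sketched below. It assumes a systemd-based Ubuntu VM where `journalctl -k -b` returns the kernel messages for the current boot. The keyword list and the helper names (`filter_gpu_lines`, `kernel_log_current_boot`) are assumptions chosen for illustration; `NVRM` is the prefix the NVIDIA kernel driver uses in its kernel messages.

```python
import shutil
import subprocess

# Keywords are an assumption: NVRM/nvidia for the proprietary driver,
# nouveau for the open-source driver that can conflict with it.
KEYWORDS = ("nvrm", "nvidia", "nouveau")


def filter_gpu_lines(log_text, keywords=KEYWORDS):
    """Return the lines of log_text that mention any keyword (case-insensitive)."""
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in keywords)
    ]


def kernel_log_current_boot():
    """Return kernel messages for the current boot, or '' if journalctl is unavailable."""
    exe = shutil.which("journalctl")
    if exe is None:
        return ""
    try:
        result = subprocess.run(
            [exe, "-k", "-b", "--no-pager"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in filter_gpu_lines(kernel_log_current_boot()):
        print(line)
```

Comparing this output between a fresh deployment and a restarted VM may show whether the driver failed to load or was replaced after the reboot.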


I'm attaching the output of the lspci command. The NVIDIA entry shows "access denied".

140224-image.png
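
When `lspci -v` is run without root privileges, it prints `Capabilities: <access denied>` for devices, so if that is the line in the screenshot, it may not indicate a GPU fault by itself. As a cross-check that does not need root, the sketch below reads the PCI vendor IDs from sysfs and lists devices whose vendor is NVIDIA (`0x10de`). The helper name `find_nvidia_pci_devices` is hypothetical, and the default path assumes a standard Linux sysfs layout.

```python
from pathlib import Path

# PCI vendor ID assigned to NVIDIA.
NVIDIA_VENDOR_ID = "0x10de"


def find_nvidia_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses (directory names) of devices with NVIDIA's vendor ID."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
        except OSError:
            # Skip entries without a readable vendor file.
            continue
        if vendor == NVIDIA_VENDOR_ID:
            found.append(dev.name)
    return found


if __name__ == "__main__":
    devices = find_nvidia_pci_devices()
    if devices:
        print("NVIDIA PCI devices:", ", ".join(devices))
    else:
        print("No NVIDIA PCI device is visible to the OS.")
```

If the device is listed here but `nvidia-smi` fails, the hardware is still attached to the VM and the driver is the more likely culprit.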



1 Answer

kobulloc-MSFT answered kobulloc-MSFT edited

Hello, @SimranjeetSingh-4690!

I haven't been able to reproduce the issue you are describing. This is the setup I used:

  • Image: NVIDIA GPU-Optimized PyTorch Image - v21.06.0 - Gen2 (Azure Marketplace link)

  • VM SKU: Standard_NV12s_v3

After restarting the VM, I'm still able to see the GPU:

140021-image.png

I would try using the same image and see if you still encounter this issue.

