Install CUDA toolkit and drivers in VM

Luis Vivas 20 Reputation points
2023-09-29T09:22:27.1533333+00:00

I have been trying to install the CUDA toolkit on an N-series VM. I am able to install the drivers, but when I run nvcc --version, it does not work.

Azure Virtual Machines

2 answers

  1. Luis Vivas 20 Reputation points
    2023-09-29T09:51:58.1533333+00:00

    I am answering my own question after working with Microsoft Azure Support, who helped me set up the whole machine.

    I want to mention that I need this machine to create and test AI models, in NLP and other deep learning work (PyTorch mainly). I had been trying this for weeks before I finally made it work.

    • Choose a VM in the N series. In Azure, for now, the N series are the sizes that have a GPU. Remember that CUDA is an NVIDIA technology, so make sure the size you choose has an NVIDIA GPU and not an AMD one. I used Standard NC8as T4 v3. The C in the name means compute; a V means the size is intended for visualization workloads. If you don't find it in the list, it could mean that you do not have quota. Azure and the other cloud providers are very restrictive with GPUs, I assume due to the chip shortage.
    • Select an OS. I used Ubuntu 22.04. It is important to set Security type to Standard. Azure recently made Trusted launch the default, and it can cause problems when you add extensions such as the NVIDIA one, or with the driver you are going to install.
    • Once the VM is running, install GCC: sudo apt-get install gcc
    • Then install make: sudo apt-get install make
    • Then go to the CUDA toolkit download page: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local
    • I recommend choosing the local runfile installer. Other forums recommend the network installer, but due to an update, the network option was not working well. Note that this is CUDA 12; by the time you read this there might be a newer version, which should work as well (hopefully).
    • Run the commands shown on that page, in this case:
    • wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
    • sudo sh cuda_12.2.2_535.104.05_linux.run
    • The installer should ask you to accept the terms and then to install. You do not need to change anything; with the defaults you are installing driver 535.104.05 and toolkit 12.2.2.
    • Now you should be able to see the driver. Remember that the driver and the toolkit are different things, but the toolkit installer you are running also installs the driver.
    • If you type nvidia-smi, you should see the driver version and the CUDA version it supports. If you run anything on the GPU, the tasks and their memory usage will also appear there.
    • Now if you type nvcc --version, you should see the toolkit version. I had a problem at this point and had to change something.
    • The CUDA toolkit was installed but could not be found. If you go to the folder /usr/local/ you will find the folder cuda-12.2 (the installer normally also creates a /usr/local/cuda symlink that points to it), so we need to edit the system-wide bash file in /etc: run sudo nano /etc/bash.bashrc and, as the last line, add: export PATH=$PATH:/usr/local/cuda/bin
    • Save the file, reboot, and reconnect over SSH.
    • If you run echo $PATH, the output should now include /usr/local/cuda/bin. What we did was tell Ubuntu to search in that folder.
    • Now if we type which nvcc we should get /usr/local/cuda/bin/nvcc
    • Now nvcc --version should return something like this:

    Copyright (c) 2005-2023 NVIDIA Corporation
    Built on Tue_Aug_15_22:02:13_PDT_2023
    Cuda compilation tools, release 12.2, V12.2.140
    Build cuda_12.2.r12.2/compiler.33191640_0
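
The download and install steps listed above can be collected into one script. What follows is only a rough sketch, not something tested on the VM: the `run` helper and the `DRY_RUN` switch are hypothetical names added for illustration, and the script defaults to printing each command instead of executing it. The version numbers and URL are the ones quoted in the answer; the `-y` flag and the `apt-get update` step are additions so that the package installs do not stop for confirmation.

```shell
# Sketch of the install steps above. Defaults to a dry run that only prints
# each command; set DRY_RUN=0 to actually execute them on the VM.
DRY_RUN="${DRY_RUN:-1}"

# Version strings copied from the answer; newer releases will differ.
CUDA_VERSION="12.2.2"
DRIVER_VERSION="535.104.05"
RUNFILE="cuda_${CUDA_VERSION}_${DRIVER_VERSION}_linux.run"
RUNFILE_URL="https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/${RUNFILE}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run sudo apt-get update
run sudo apt-get install -y gcc make   # compiler and make, as in the answer
run wget "$RUNFILE_URL"                # download the local runfile installer
run sudo sh "$RUNFILE"                 # interactive: accept the terms, keep the defaults
```

Running it as written only echoes the four commands prefixed with `+`, which makes it easy to check the file name and URL before committing to a multi-gigabyte download.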

    I hope this saves you time when working with Azure and GPUs; it took me 2 weeks to solve. I hope newer versions do not break this solution, and that in the future they provide an image that has everything at once so we data scientists do not have to deal with this.
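
The PATH change described in the answer can also be written so that it is safe to run more than once. This is a minimal sketch under two assumptions: that `/usr/local/cuda` exists as a symlink to the versioned `cuda-12.2` folder, and that the helper name `add_to_path` and the `CUDA_BIN` variable (both hypothetical, not from the answer) are acceptable in your shell startup file.

```shell
# Location used in the answer; override CUDA_BIN if your install differs.
CUDA_BIN="${CUDA_BIN:-/usr/local/cuda/bin}"

# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, do nothing
        *) PATH="$PATH:$1" ;;     # append, so system tools keep priority
    esac
    export PATH
}

add_to_path "$CUDA_BIN"

# Report whether nvcc is now reachable, without failing if it is not.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at $(command -v nvcc)"
else
    echo "nvcc not found; check that $CUDA_BIN exists"
fi
```

Placed at the end of /etc/bash.bashrc (as in the answer) or in ~/.bashrc for a single user, this avoids stacking duplicate entries in PATH each time the file is sourced.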


  2. kobulloc-MSFT 26,801 Reputation points Microsoft Employee Moderator
    2023-10-09T17:51:46.1633333+00:00

    Hello, @Luis Vivas!

    Thank you very much for following up with the process to install the CUDA toolkit on an N series VM. As you mentioned, this is a popular subject for data scientists and I know others will find this write-up quite valuable.

    I've upvoted your post, but since there is currently a limitation in Microsoft Q&A that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer for additional visibility.

