Nvidia driver too old error when loading bart model onto CUDA, works on other models

matsuo_basho 30 Reputation points
2023-10-12T15:08:52.7033333+00:00

I'm getting an error loading a Hugging Face model on an AzureML GPU compute instance (using AzureML notebooks). Loading other models works, such as the first one in the example below (the code input here is really buggy; I gave up trying to format it as a code block after 10 minutes. And this is the company revolutionizing our world with AI, lol):

```python
from transformers import AutoModelForCausalLM

device = "cuda"

# this works!!
checkpoint1 = "Salesforce/codegen-350M-mono"
codegen = AutoModelForCausalLM.from_pretrained(checkpoint1, trust_remote_code=True).to(device)

# this doesn't
checkpoint2 = "facebook/bart-large"
bart = AutoModelForCausalLM.from_pretrained(checkpoint2, trust_remote_code=True).to(device)
```

Error:

```
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
```

I understand the driver and PyTorch versions aren't compatible, but why does it work for the codegen model? Is there something about this particular BART model? It seems like I shouldn't have to reinstall CUDA drivers to get this to work.
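For reference, the integer in the error encodes the driver's supported CUDA version: 11040 is CUDA 11.4, while the default torch 2.1.0 wheel is built against CUDA 12.1 (visible in the `-cu12` package names below). A minimal sketch of the decoding and the compatibility rule (the helper names are my own illustration, not a PyTorch API, and the rule assumes CUDA's minor-version compatibility within a major release):

```python
# Decode the driver version integer from the PyTorch error message and
# check whether a torch wheel's CUDA build can load on that driver.
# (Illustrative helpers, not a PyTorch API.)

def decode_driver_version(v: int) -> tuple:
    """11040 -> (11, 4): CUDA major/minor supported by the driver."""
    return v // 1000, (v % 1000) // 10

def wheel_can_load(driver_version: int, wheel_cuda: str) -> bool:
    """A wheel built for CUDA x.y loads if the driver's major version is >= x
    (minor-version compatibility covers differing minors within a major)."""
    driver_major, _ = decode_driver_version(driver_version)
    wheel_major = int(wheel_cuda.split(".")[0])
    return wheel_major <= driver_major

print(decode_driver_version(11040))   # (11, 4)
print(wheel_can_load(11040, "12.1"))  # False: the default torch 2.1.0 wheel
print(wheel_can_load(11040, "11.8"))  # True: the cu118 wheel
```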

Relevant libraries:

```
transformers              4.34.0
torch                     2.1.0
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.18.1
nvidia-nvjitlink-cu12     12.2.140
nvidia-nvtx-cu12          12.1.105
```
1 answer

  1. Ramr-msft 17,826 Reputation points
    2023-10-26T04:08:50.96+00:00

    Thanks for the details. Installing PyTorch through the transformers extras is probably not the best way to get a compatible torch build into your environment. Based on the CUDA driver in your base image, try installing torch the recommended way from https://pytorch.org/, choosing the wheel that matches your driver's CUDA version. That should work fine with transformers, which only pins a lower bound on torch. Alternatively, you can use a more recent CUDA base image from nvcr.io if you have the option to specify one.
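    As a concrete sketch of that suggestion: with a driver that reports CUDA 11.4 (the 11040 in the error), the default torch 2.1.0 wheel targets CUDA 12.1 and will not load, but the CUDA 11.8 wheel should, via CUDA's minor-version compatibility. The exact command is worth verifying against the selector on pytorch.org for your setup:

```shell
# Replace the default (CUDA 12.1) build of torch with the CUDA 11.8
# build, which an 11.4-era driver can load.
pip uninstall -y torch
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
```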

