Thanks for the details. Installing PyTorch through the transformers extras is probably not the best way to get a compatible torch build into your environment. Based on the CUDA driver in your base image, you can try installing torch the recommended way, following https://pytorch.org/. That should be fine with transformers, which only pins a lower bound on torch. Alternatively, you can try a more recent CUDA image from nvcr.io if you have the option to specify one.
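To make the mismatch concrete, here is a minimal sketch of how the driver number in the error (11040) compares with the CUDA runtime that the cu12 wheels ship (12.1). The function names are hypothetical helpers, not part of PyTorch. The decoding assumes the usual CUDA convention of 1000 * major + 10 * minor, and the compatibility rule is deliberately simplified: it treats "same major version, 11 or later" as good enough and ignores minimum driver builds and PTX caveats.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode a CUDA version integer such as 11040 into (major, minor).

    Assumes the standard encoding 1000 * major + 10 * minor.
    """
    return encoded // 1000, (encoded % 1000) // 10


def parse_runtime_version(version: str) -> Tuple[int, int]:
    """Parse a runtime version string such as '12.1' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def driver_can_run(driver_encoded: int, runtime_version: str) -> bool:
    """Simplified check (an assumption, not an official rule set).

    True if the driver is at least as new as the runtime, or if both share
    the same major version from CUDA 11 onward (minor-version compatibility).
    """
    driver = decode_cuda_version(driver_encoded)
    runtime = parse_runtime_version(runtime_version)
    if driver >= runtime:
        return True
    return driver[0] == runtime[0] and runtime[0] >= 11


if __name__ == "__main__":
    # 11040 is the value reported in the RuntimeError; 12.1 matches the
    # nvidia-*-cu12 packages that were installed alongside torch 2.1.0.
    print(decode_cuda_version(11040))
    print(driver_can_run(11040, "12.1"))
```

On a machine where torch imports cleanly, `torch.version.cuda` gives the runtime string to pass as the second argument. Under this sketch, an 11.4 driver fails against a 12.1 build but passes against an 11.x build, which is why picking the wheel from https://pytorch.org/ for your driver is the suggested route.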
Nvidia driver too old error when loading bart model onto CUDA, works on other models
I'm getting an error loading a HuggingFace model on an AzureML GPU compute (using AzureML notebooks). Loading other models works, such as the first one in the example below (code input is really buggy, gave up trying to format it in codeblock properly after 10 minutes. And this is the company revolutionizing our world with AI, lol):
from transformers import AutoModelForCausalLM
device = "cuda"
checkpoint1 = "Salesforce/codegen-350M-mono"
# this works!!
codegen = AutoModelForCausalLM.from_pretrained(checkpoint1, trust_remote_code=True).to(device)
checkpoint2 = "facebook/bart-large"
# this doesn't
bart = AutoModelForCausalLM.from_pretrained(checkpoint2, trust_remote_code=True).to(device)
Error
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
I understand the driver and PyTorch aren't compatible, but why does it work for the codegen model? Is there something about this particular BART model? It seems like I shouldn't have to reinstall CUDA drivers to get this to work.
Relevant libraries:
transformers 4.34.0
torch 2.1.0
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.2.140
nvidia-nvtx-cu12 12.1.105
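For anyone who wants to produce the same kind of list from inside a notebook, here is a small sketch using only the standard library. The function name and its default filters are hypothetical choices for this example; it simply reports whatever matching packages are installed in the current environment.

```python
from importlib import metadata


def cuda_related_versions(prefixes=("nvidia-",), exact=("torch", "transformers")):
    """Return {package name: version} for installed packages that match.

    A package matches if its lowercased name is in `exact` or starts with
    one of `prefixes`. Both defaults are illustrative, not exhaustive.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            # Skip distributions with broken or missing metadata.
            continue
        lowered = name.lower()
        if lowered in exact or lowered.startswith(tuple(prefixes)):
            found[lowered] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for package, version in cuda_related_versions().items():
        print(package, version)
```

The `-cu12` suffix on the nvidia packages is the quickest hint about which CUDA major version the installed torch wheel was built against.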