Hi AG,
Based on your query, your VM's GPU is currently running in vGPU virtualization mode rather than as a dedicated passthrough GPU. In this mode the GPU is likely shared among multiple VMs, which restricts CUDA functionality, so you cannot fully use CUDA-dependent tools such as vLLM. Switching to GPU passthrough (a dedicated GPU) should resolve this by providing full CUDA support.
Consider choosing an NC-series or ND-series VM with V100 or A100 GPUs on Azure. These sizes offer full, dedicated GPU passthrough (not vGPU), which gives maximum performance and compatibility for CUDA and other GPU-accelerated compute-intensive and graphics-intensive workloads: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/overview?tabs=breakdownseries%2Cgeneralsizelist%2Ccomputesizelist%2Cmemorysizelist%2Cstoragesizelist%2Cgpusizelist%2Cfpgasizelist%2Chpcsizelist#gpu-accelerated
To check the virtualization mode of an NVIDIA GPU, run:
nvidia-smi -q | grep "Virtualization Mode"
The desired output is Virtualization Mode: Passthrough (nvidia-smi may print the value as Pass-Through).
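If you want to script this check, here is a minimal sketch. It assumes the output of nvidia-smi -q contains a line of the form "Virtualization Mode : <value>"; the exact spelling of the value can differ between driver versions, so both "Passthrough" and "Pass-Through" are accepted. The function names get_virt_mode and classify_virt_mode are hypothetical helpers for illustration, not part of any NVIDIA or Azure tool.

```shell
#!/bin/sh
# Read `nvidia-smi -q` text on stdin and print the reported virtualization mode.
# The colon is required in the match so that a section header such as
# "GPU Virtualization Mode" (which has no value) is skipped.
get_virt_mode() {
    grep -i 'Virtualization Mode[[:space:]]*:' \
        | head -n 1 \
        | sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Map the reported value to a coarse label (assumed labels, for illustration).
classify_virt_mode() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        pass-through|passthrough) echo "dedicated" ;;
        *vgpu*)                   echo "shared" ;;
        *)                        echo "unknown" ;;
    esac
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    mode="$(nvidia-smi -q | get_virt_mode)"
    echo "Virtualization mode: ${mode:-not reported} ($(classify_virt_mode "$mode"))"
fi
```

A result of "dedicated" corresponds to the passthrough case described above, while "shared" indicates a vGPU configuration.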
For more information, please see the workaround described here:
https://learn.microsoft.com/en-us/answers/questions/1377984/install-cuda-toolkit-and-drivers-in-vm
If the information is helpful, please click on "Accept Answer" and "Upvote".
If you have any further questions, please let us know and we will help you.