Unable to access GPU from Azure ML Component on an Azure ML Compute
I have had a draft pipeline in my Azure Machine Learning Studio for quite a while, which contains 4 components. This pipeline was linked to an Azure ML compute that had a GPU; I believe it was a Tesla K80, but I am not entirely sure. In September, the virtual machine family that the compute belonged to was deprecated, so I provisioned a new compute instance, the Standard_NC4as_T4_v3, which has a Tesla T4.
The issue is that the model training component cannot detect the GPU on the machine. The environment for this component has not changed since it ran on the previous compute, where it was able to detect the GPU and ran as expected. I have also verified that the NVIDIA GPU drivers are installed on the machine by running:
torch.cuda.is_available()
in a Jupyter notebook on the machine. I am using PyTorch to train the model and, as far as I can tell from my research, the versions I am using are compatible with the drivers and CUDA toolkit on the machine.
The PyTorch packages are installed through conda; the relevant package versions are below:
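For reference, the notebook check could be extended into a slightly fuller report and run both in the notebook and inside the training component, so the two results can be compared. This is only a minimal sketch: the function name `gpu_report` is hypothetical, and it assumes nothing beyond a standard PyTorch install.

```python
import importlib.util


def gpu_report():
    """Collect what PyTorch can see; safe to call even if torch is missing."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None` inside the component, the environment may have resolved to a CPU-only build, which would be a different problem from a driver mismatch.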
- pytorch::pytorch==2.0.1
- pytorch::torchaudio==2.0.2
- pytorch::torchvision==0.15.2
- pytorch::torchtext==0.15.2
- pytorch::pytorch-cuda==11.8
Below are the details of the GPU on the machine, obtained by running nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                    0 |
| N/A   34C    P0    27W /  70W |      0MiB / 15109MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
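The output above was captured on the compute itself. To see whether the same driver is visible from the environment the component actually runs in, a check like the following could be added to the component. It is a sketch under assumptions: the helper name `query_nvidia_smi` is hypothetical, and it relies only on the standard `--query-gpu` and `--format` options of nvidia-smi.

```python
import shutil
import subprocess


def query_nvidia_smi():
    """Return (name, driver_version) rows, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None  # the tool is not visible from this environment
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not talk to the driver
    return [
        tuple(part.strip() for part in line.split(","))
        for line in result.stdout.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    print(query_nvidia_smi())
```

A result of `None` or an empty list from inside the component, while the notebook shows the Tesla T4, would suggest the GPU is not being exposed to the job rather than a PyTorch version problem.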
Any ideas about what the issue might be would be much appreciated.
Thank you