Issue with T4 GPU Recognition for Custom Fine-tuning on Azure

GenApe 0 Reputation points
2024-01-04T02:36:49.4466667+00:00

I am currently using Azure Machine Learning for custom fine-tuning on a T4 GPU-enabled VM. After setting up the VM and installing CUDA on it, I found that the system could not detect the T4 GPU. After consulting the documentation, I learned that for custom training, the recommended approach is to submit the fine-tuning task as a Python script using Azure's Python SDK.

However, I couldn't find anything in the Azure documentation about installing the packages listed in my requirements.txt file for custom training scripts. Could someone point me to where the documentation explains how to specify or attach a requirements.txt file for custom training?

Additionally, if there's a simple way to make sure my VM recognizes and uses the T4 GPU without problems, I would greatly appreciate any guidance.

Azure Machine Learning

2 answers

  1. Ramr-msft 17,826 Reputation points
    2024-01-05T11:25:09.9033333+00:00

    Thanks for the question. For your first question, you can specify the required packages for your custom training script by creating a requirements.txt file alongside your Python script. You then need to set the AZUREML_EXTRA_REQUIREMENTS_TXT environment variable in your Azure Machine Learning environment to the location of that requirements.txt file. You can also define your required packages in a requirements.txt file within a folder structure like this: image_build/requirements.txt.

    As for your second question, to make sure your VM recognizes and uses the T4 GPU, you need to install the appropriate NVIDIA GPU drivers. The Azure NVIDIA GPU Driver Extension installs the appropriate NVIDIA CUDA or GRID drivers on an N-series VM. For the NCasT4_v3-series VMs, which are powered by NVIDIA Tesla T4 GPUs, you must install NVIDIA GPU drivers. If you're still facing issues on a Windows VM, you might try switching the GPU from TCC mode (the default) to WDDM mode from the command prompt.
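
To confirm whether the driver installation worked, one quick check is to ask `nvidia-smi` (installed together with the NVIDIA driver) to list the GPUs it can see. The sketch below assumes a Linux VM or job container with `nvidia-smi` on the PATH; the `GpuInfo`, `parse_gpu_csv`, and `query_gpus` names are hypothetical. An empty result means no GPU is visible to the driver, which often points to a missing or unloaded driver rather than a problem in the training script.

```python
"""Sketch: check whether NVIDIA GPUs are visible on this VM or job node."""
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass

# Ask for one CSV line per GPU, without a header row or unit suffixes.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader,nounits"]


@dataclass
class GpuInfo:
    name: str
    driver_version: str
    memory_total_mib: int  # memory.total is reported in MiB


def parse_gpu_csv(text: str) -> list[GpuInfo]:
    """Parse nvidia-smi CSV output into GpuInfo records."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, mem = (field.strip() for field in line.split(","))
        gpus.append(GpuInfo(name, driver, int(mem)))
    return gpus


def query_gpus() -> list[GpuInfo]:
    """Return visible GPUs, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver utilities not installed, or not on PATH
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []  # e.g. nvidia-smi cannot communicate with the driver
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    found = query_gpus()
    if not found:
        print("No NVIDIA GPU visible: check the driver installation.")
    for gpu in found:
        print(f"{gpu.name}: driver {gpu.driver_version}, {gpu.memory_total_mib} MiB")
```

Running this as the first step of a training job helps separate driver problems from framework problems; inside a PyTorch script, `torch.cuda.is_available()` gives a similar framework-level check.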

  2. YutongTie-MSFT 53,966 Reputation points Moderator
    2024-01-23T17:36:07.7+00:00

    Hello @GenApe

    Thanks for reaching out to us again. I hope you have resolved your issue; please check whether the answer above helps. If you have any further concerns, please let us know. If you found Ram's answer helpful, please consider accepting it to support the community. Thanks a lot.

    Regards, Yutong
