How to enable nvidia drivers in Environment / Docker container?

Bautista Puebla 0 Reputation points
2024-08-01T12:56:06.3066667+00:00

Hello,

I am trying to set up a custom environment with the Azure ML Python SDK for GPU training with TensorFlow.

The Dockerfile is the following:

FROM
WORKDIR
ENV
ENV
ENV
# Create conda environment
COPY
RUN
rm
conda
conda
RUN
RUN
# This is needed for MPI to locate libpython
ENV

However, when running


from azure.ai.ml.entities import Environment, BuildContext

env = Environment(
    name="GPUenv",
    build=BuildContext(path="./context"),
    description="Environment created from a Docker context.",
)

ml_client.environments.create_or_update(env)


and


import tensorflow as tf

def main():
    print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

if __name__ == "__main__":
    main()

with a GPU cluster as the compute target, I get no errors, but the job prints 0 (no GPU detected).
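For reference, this is the kind of extra check I can run inside the job to see whether the NVIDIA container runtime exposed the driver at all (standard-library only; the `nvidia-smi` and `/dev/nvidia0` probes are heuristics I chose, not Azure ML APIs):

```python
import os
import shutil

def gpu_runtime_visible() -> bool:
    """Heuristic: the NVIDIA container runtime usually puts nvidia-smi
    on PATH and mounts /dev/nvidia0 when GPUs are passed through."""
    return shutil.which("nvidia-smi") is not None or os.path.exists("/dev/nvidia0")

if __name__ == "__main__":
    print("NVIDIA runtime visible:", gpu_runtime_visible())
```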

On the other hand, if I open a terminal on a compute instance and manually build and run the container with the --gpus all flag, it does detect the GPU.
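For context, the manual steps that do detect the GPU look roughly like this (the image tag is a placeholder I picked):

```shell
# Build the image from the same Docker context used for the environment
docker build -t gpuenv ./context

# Run with GPU passthrough; without --gpus all, TF reports 0 GPUs
docker run --rm --gpus all gpuenv python -c \
  "import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')))"
```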

Is there any way to pass that flag to the container when the job runs? How do the curated environments do it?

Azure Machine Learning
An Azure machine learning service for building and deploying models.