How to run TensorFlow 2.12 in Azure ML studio on a GPU compute

Peter Tribout 25 Reputation points
2023-06-01T19:30:02.77+00:00

When I call model.fit(...) with TensorFlow 2.12 on a GPU compute in Azure ML studio, I get errors related to the NVIDIA CUDA drivers.

The provided kernel "Python 38 Tensorflow Pytorch" has:

  • TensorFlow 2.12 -> OK
  • CUDA driver 11.4 (per nvidia-smi) -> not OK; this needs to be 12.0

When I run the same notebook on Colab, everything works fine.

On Stack Overflow the advice is either to downgrade to TF 2.4 (not acceptable, it is too old) or to upgrade the CUDA drivers (I tried but did not succeed).

Are there other GPU compute architectures that support TF 2.12, or is there any other advice?

Thanks, Peter

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. Sina Salam 7,206 Reputation points
    2023-06-01T20:01:05.3333333+00:00

    @Peter Tribout

    Welcome to Q&A and thank you for posting your questions here.

    You were asking how to run TensorFlow 2.12 in Azure ML studio on a GPU compute, and whether there are other GPU compute architectures that support TF 2.12, or any other advice.

    To answer your question: to run TensorFlow 2.12 on a GPU compute in Azure ML studio, you need to make sure that the CUDA driver version is 12.0. The provided kernel “Python 38 Tensorflow Pytorch” ships TensorFlow 2.12 together with CUDA driver 11.4, and that combination is not compatible.
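Before changing drivers, it may help to confirm which versions are actually in play on the compute instance. The following is a minimal sketch, not an official Azure ML procedure: it assumes `nvidia-smi` is on the PATH and that TensorFlow is installed in the active kernel. The helper names `driver_cuda_version` and `tensorflow_build_info` are hypothetical and used only for illustration.

```python
import re
import shutil
import subprocess


def driver_cuda_version(smi_output):
    """Return the 'CUDA Version' field from nvidia-smi's header text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def tensorflow_build_info():
    """Return the CUDA/cuDNN versions TensorFlow was built against.

    Returns None when TensorFlow is not installed in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = tf.sysconfig.get_build_info()  # CPU-only builds lack the CUDA keys
    return {
        "tensorflow": tf.__version__,
        "cuda": info.get("cuda_version"),
        "cudnn": info.get("cudnn_version"),
    }


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        )
        # The header reports the highest CUDA version the driver supports.
        print("Driver supports CUDA up to:", driver_cuda_version(result.stdout))
    else:
        print("nvidia-smi not found on PATH")
    print("TensorFlow build info:", tensorflow_build_info())
```

Comparing the two printed values shows whether the installed driver and the CUDA version TensorFlow was built against are the source of the mismatch.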

    You can follow the instructions provided in this Microsoft Learn article to update the CUDA driver version to 12.0.

    You can also learn how to train and deploy a TensorFlow model using Azure Machine Learning Python SDK v2 in this Microsoft Learn article.

    If you want to learn more about distributed training with Azure Machine Learning SDK (v2) supported frameworks such as TensorFlow, you can check out this Microsoft Learn article.

    As for other GPUs, I am not sure which other GPU compute architectures support TensorFlow 2.12. However, TensorFlow 2.12 runs on a single GPU with no code changes required.

    You can use tf.config.list_physical_devices('GPU') to confirm that TensorFlow can see the GPU.
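As a minimal sketch of that check, the snippet below lists the GPUs visible to TensorFlow and prints a short summary. The helper `describe_gpus` is a hypothetical name introduced here for illustration; the import is guarded so the snippet also runs, without output, in an environment where TensorFlow is missing.

```python
def describe_gpus(devices):
    """Summarize a list of TensorFlow PhysicalDevice objects as one line of text."""
    if not devices:
        return "No GPU visible to TensorFlow; training would fall back to the CPU."
    names = ", ".join(device.name for device in devices)
    return f"{len(devices)} GPU(s) visible to TensorFlow: {names}"


try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed in this environment

if tf is not None:
    # An empty list usually points to a driver/CUDA mismatch or a CPU-only build.
    print(describe_gpus(tf.config.list_physical_devices("GPU")))
```

An empty list on a GPU compute is a strong hint that the CUDA libraries could not be loaded, which is worth checking before calling model.fit(...).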

    The simplest way to run on multiple GPUs, on one or many machines, is to use Distribution Strategies (the tf.distribute API). You can read more at the links below:

    https://www.tensorflow.org/guide/gpu

    https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel-22-12.html
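The multi-GPU approach mentioned above can be sketched with `tf.distribute.MirroredStrategy`, which replicates the model across all GPUs on one machine. This is only an illustration under stated assumptions: the tiny model, the random data, and the helper names `global_batch_size` and `train_on_all_gpus` are made up for the example and are not part of the original notebook.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Keras splits the global batch evenly across replicas, so scale it up."""
    return per_replica_batch * num_replicas


def train_on_all_gpus(per_replica_batch=32):
    """Train a toy model on every visible GPU and return the replica count."""
    import tensorflow as tf

    # Uses all visible GPUs; with none available it runs as a single replica.
    strategy = tf.distribute.MirroredStrategy()
    batch = global_batch_size(per_replica_batch, strategy.num_replicas_in_sync)

    # Variables must be created inside the strategy scope to be mirrored.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Random placeholder data standing in for a real training set.
    x = tf.random.normal((256, 8))
    y = tf.random.normal((256, 1))
    model.fit(x, y, batch_size=batch, epochs=1, verbose=0)
    return strategy.num_replicas_in_sync


if __name__ == "__main__":
    try:
        print("Replicas in sync:", train_on_all_gpus())
    except ImportError:
        print("TensorFlow is not installed in this environment")
```

Only model creation and compilation need to sit inside `strategy.scope()`; the call to `fit` stays the same as in single-GPU code.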

    I hope that helps! Let me know if you have any other questions.

    If this answer solves your issue, please vote for it so other community members know that this is a quality answer.

    Regards,

    Sina

