Does Azure support distributed GPU training?

nam 41 Reputation points

Is there any plan? Any date we can expect?

Azure Machine Learning

Accepted answer
  1. YutongTie-MSFT 27,756 Reputation points

    Hello @nam

    I hope you are doing well. Azure Machine Learning SDK v1 supports several options for distributed GPU training:
    Message Passing Interface (MPI)
    Environment variables from Open MPI
    Process group initialization
    Launch options
    DistributedDataParallel (per-process-launch)
    Using torch.distributed.launch (per-node-launch)
    PyTorch Lightning
    Hugging Face Transformers
    Environment variables for TensorFlow (TF_CONFIG)
    Accelerate GPU training with InfiniBand
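    To illustrate two of the items above, here is a minimal, standard-library-only sketch. It is not Azure ML's own code. The first function maps the Open MPI variables (OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_LOCAL_RANK, OMPI_COMM_WORLD_SIZE) to the names that PyTorch's env:// process-group initialization reads. The second builds a TF_CONFIG JSON string for TensorFlow multi-worker training. The function names and worker addresses are hypothetical. The sketch also assumes MASTER_ADDR and MASTER_PORT are set elsewhere, for example by the platform.

```python
from __future__ import annotations

import json
import os
from typing import Mapping


def ompi_to_torch_env(env: Mapping[str, str]) -> dict[str, str]:
    """Map Open MPI rank variables to the names torch.distributed's env:// init uses.

    Hypothetical helper. Assumes the job was launched with Open MPI, so each
    process sees OMPI_COMM_WORLD_* variables. MASTER_ADDR/MASTER_PORT are
    assumed to be provided separately and are not handled here.
    """
    return {
        "RANK": env["OMPI_COMM_WORLD_RANK"],              # global rank across all nodes
        "LOCAL_RANK": env["OMPI_COMM_WORLD_LOCAL_RANK"],  # rank within this node, often used as GPU index
        "WORLD_SIZE": env["OMPI_COMM_WORLD_SIZE"],        # total number of processes in the job
    }


def build_tf_config(workers: list[str], task_index: int) -> str:
    """Build a TF_CONFIG JSON string describing the worker cluster and this task.

    Hypothetical helper. `workers` is a list of "host:port" strings, one per
    worker, and `task_index` is this process's position in that list.
    """
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": task_index},
    })


if __name__ == "__main__":
    # Simulated environment for process 3 of 8 (e.g. two nodes with four GPUs each).
    fake_env = {
        "OMPI_COMM_WORLD_RANK": "3",
        "OMPI_COMM_WORLD_LOCAL_RANK": "3",
        "OMPI_COMM_WORLD_SIZE": "8",
    }
    os.environ.update(ompi_to_torch_env(fake_env))
    print(os.environ["RANK"], os.environ["LOCAL_RANK"], os.environ["WORLD_SIZE"])
    print(build_tf_config(["10.0.0.4:12345", "10.0.0.5:12345"], task_index=1))
```

    In a real PyTorch job, the mapped variables would be exported before calling torch.distributed.init_process_group with init_method="env://". In a TensorFlow job, the TF_CONFIG string would be placed in the environment of each worker before the distribution strategy is created.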

    For SDK v2, there should be significant changes. Please let us know if you run into any problems. Thanks.


1 additional answer

  1. JimmySalian-2011 30,066 Reputation points

    Hi Nam,

    Do you mean distributed GPU training for ML in general, or specifically on Azure? Here is the Azure documentation page on distributed GPU training: how-to-train-distributed-gpu

    If it is something else, please reply so I can have a look.

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.