Does Azure support distributed GPU training?

nam 41 Reputation points

Is there any plan? Any date we can expect?

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. YutongTie-MSFT 36,976 Reputation points

    Hello @nam

    I hope you are doing well. Azure Machine Learning offers several options for distributed GPU training with SDK v1, as listed below:
    Message Passing Interface (MPI)
    Environment variables from Open MPI
    Process group initialization
    Launch options
    DistributedDataParallel (per-process-launch)
    Using torch.distributed.launch (per-node-launch)
    PyTorch Lightning
    Hugging Face Transformers
    Environment variables for TensorFlow (TF_CONFIG)
    Accelerate GPU training with InfiniBand
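
    To make two of the items above concrete (the Open MPI environment variables and TF_CONFIG), here is a minimal sketch using only the Python standard library. It is an illustration, not the Azure Machine Learning API: the function names `mpi_env_to_process_group_env` and `build_tf_config`, the host names, and the port are hypothetical. The `OMPI_COMM_WORLD_*` names are the variables Open MPI sets for each launched process; `RANK` and `WORLD_SIZE` are the names PyTorch's `env://` initialization reads, and `LOCAL_RANK` is the usual convention for picking a GPU on the node. `MASTER_ADDR` and `MASTER_PORT` would still have to be set separately.

```python
import json


def mpi_env_to_process_group_env(env):
    """Translate Open MPI launcher variables into process-group variables.

    `env` is any mapping of environment variables, for example os.environ.
    Returns a new dict; it does not modify the environment itself.
    """
    mapping = {
        "OMPI_COMM_WORLD_RANK": "RANK",              # global rank of this process
        "OMPI_COMM_WORLD_SIZE": "WORLD_SIZE",        # total number of processes
        "OMPI_COMM_WORLD_LOCAL_RANK": "LOCAL_RANK",  # rank within this node
    }
    missing = [name for name in mapping if name not in env]
    if missing:
        raise KeyError(f"Open MPI variables not set: {missing}")
    return {target: env[source] for source, target in mapping.items()}


def build_tf_config(worker_hosts, task_index):
    """Build the JSON string TensorFlow expects in the TF_CONFIG variable.

    Assumes a cluster made up of workers only (no parameter servers).
    """
    config = {
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Hypothetical values, as if this were process 3 of 8 on a two-node job.
    example_env = {
        "OMPI_COMM_WORLD_RANK": "3",
        "OMPI_COMM_WORLD_SIZE": "8",
        "OMPI_COMM_WORLD_LOCAL_RANK": "1",
    }
    print(mpi_env_to_process_group_env(example_env))
    print(build_tf_config(["host0:2222", "host1:2222"], task_index=1))
```

    In a real training script the returned values would be written into `os.environ` before the framework initializes its process group or distribution strategy.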

    For SDK v2, expect significant changes to how this is set up. Please feel free to let us know if you run into any problems. Thanks.



1 additional answer

  1. JimmySalian-2011 36,681 Reputation points

    Hi Nam,

    Do you mean distributed GPU training for ML in general, or something specific to Azure? Here is the link on distributed GPU training in Azure: how-to-train-distributed-gpu

    If it is something else, please reply so I can have a look.

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.
