Does Azure support distributed GPU?

nam 41 Reputation points
2022-08-31T20:19:54.51+00:00

Is there a plan for this? Is there a date we can expect?

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. YutongTie-MSFT 46,091 Reputation points
    2022-09-05T02:34:31.897+00:00

    Hello @nam

    I hope you are doing well. For SDK v1, Azure Machine Learning supports several options for distributed GPU training:

    - Message Passing Interface (MPI)
      - Horovod
      - DeepSpeed
      - Environment variables from Open MPI
    - PyTorch
      - Process group initialization
      - Launch options
      - DistributedDataParallel (per-process-launch)
      - Using torch.distributed.launch (per-node-launch)
      - PyTorch Lightning
      - Hugging Face Transformers
    - TensorFlow
      - Environment variables for TensorFlow (TF_CONFIG)
    - Accelerate GPU training with InfiniBand
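Two of the items above, "Environment variables from Open MPI" and "Environment variables for TensorFlow (TF_CONFIG)", come down to giving each process its rank and cluster layout. The sketch below is a minimal, stdlib-only illustration, not Azure ML's own code. It assumes the job is launched with Open MPI, which sets `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE` and `OMPI_COMM_WORLD_LOCAL_RANK` per process, and maps those onto the `RANK`/`WORLD_SIZE`/`MASTER_ADDR`/`MASTER_PORT` variables that PyTorch's `env://` initialization reads. It also builds a `TF_CONFIG` JSON string for one TensorFlow worker. The function names, the master address `10.0.0.4`, the port `29500` and the host list are hypothetical placeholders. How the head node's address is discovered on a real cluster is left to the caller.

```python
import json
from typing import Dict, List, Mapping


def torch_env_from_openmpi(
    ompi_env: Mapping[str, str],
    master_addr: str,
    master_port: int = 29500,  # assumption: a commonly used default port
) -> Dict[str, str]:
    """Translate Open MPI per-process variables into the variables that
    torch.distributed's env:// initialization reads (hypothetical helper)."""
    return {
        # Address/port of the rank-0 process; supplied by the caller.
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        # Global rank and total number of processes across all nodes.
        "RANK": ompi_env["OMPI_COMM_WORLD_RANK"],
        "WORLD_SIZE": ompi_env["OMPI_COMM_WORLD_SIZE"],
        # Rank within this node; conventionally used to pick the GPU index.
        "LOCAL_RANK": ompi_env["OMPI_COMM_WORLD_LOCAL_RANK"],
    }


def make_tf_config(worker_hosts: List[str], index: int) -> str:
    """Build the TF_CONFIG JSON for worker `index` (hypothetical helper)."""
    if not 0 <= index < len(worker_hosts):
        raise ValueError(f"index {index} out of range for {len(worker_hosts)} workers")
    return json.dumps({
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": index},
    })


if __name__ == "__main__":
    # Simulated environment of the second process in a 2-node x 2-GPU job.
    fake_ompi = {
        "OMPI_COMM_WORLD_RANK": "1",
        "OMPI_COMM_WORLD_SIZE": "4",
        "OMPI_COMM_WORLD_LOCAL_RANK": "1",
    }
    # In a real job you would apply these with os.environ.update(...) and then
    # call torch.distributed.init_process_group(backend="nccl", init_method="env://").
    print(torch_env_from_openmpi(fake_ompi, master_addr="10.0.0.4"))
    # Each TensorFlow worker gets the same cluster list but its own index.
    print(make_tf_config(["10.0.0.4:2222", "10.0.0.5:2222"], index=1))
```

Keeping the translation as a pure function over a mapping (rather than reading `os.environ` directly) makes it easy to check without an MPI launcher.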

    For SDK v2, expect significant changes. Please feel free to let us know if you run into any problems. Thanks.

    Regards,
    Yutong


1 additional answer

  1. JimmySalian-2011 41,916 Reputation points
    2022-08-31T20:32:46.857+00:00

    Hi Nam,

    Do you mean distributed GPU for ML in general, or specifically on Azure? Here is the Azure documentation on distributed GPU training: how-to-train-distributed-gpu

    If it is something else, please reply so I can have a look.

    ==
    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.
