ACI Networking Questions

Runsheng Guo 1 Reputation point
2021-05-07T05:26:47.157+00:00

Hi, I had a couple of questions about networking in ACI:

  1. What kind of bandwidth can I expect in ACI? I ran a speed test and got roughly 1300/1100 Mbps download/upload for a CPU instance and 900/500 Mbps for a GPU (K80) instance.
  2. Is this in line with what I should be expecting, and if so, why is the bandwidth lower for the GPU instance? Is there any way to increase the bandwidth?
  3. For GPU instances, is Mellanox InfiniBand or NVLink supported for inter- and intra-container-group communication?

Thanks!

Azure Container Instances

2 answers

  1. vipullag-MSFT 26,487 Reputation points Moderator
    2021-05-07T10:46:38.6+00:00

    @Runsheng Guo

    The network throughput of an ACI container group depends on the network throughput of the node VM on which it is scheduled.
    However, since the underlying infrastructure is abstracted (for non-GPU-enabled container groups), the exact VM SKU should be considered non-deterministic. That said, ACI currently runs on sets of Azure VMs of various SKUs, primarily from the F and D series, and we expect this to change in the future as we continue to develop and optimize the service. Refer to this FAQ document for more details.
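    To sanity-check the throughput observed inside a container group, you can run a client-side test from within the container itself. Below is a minimal sketch, assuming iperf3 is installed in your image and that you have an iperf3 server you control (the server host name here is hypothetical); Python is used only as a thin wrapper around the tool:

    ```python
    import json
    import subprocess

    # Hypothetical iperf3 server endpoint; use a server you control, ideally in the
    # same region, so you measure the node VM's NIC rather than Internet egress.
    IPERF_SERVER = "iperf-server.example.com"

    def measure_throughput_mbps(server: str, seconds: int = 10) -> float:
        """Run a single iperf3 client test and return the received throughput in Mbps."""
        result = subprocess.run(
            ["iperf3", "-c", server, "-t", str(seconds), "-J"],  # -J = JSON output
            capture_output=True, text=True, check=True,
        )
        report = json.loads(result.stdout)
        return report["end"]["sum_received"]["bits_per_second"] / 1e6

    if __name__ == "__main__":
        print(f"Measured ~{measure_throughput_mbps(IPERF_SERVER):.0f} Mbps")
    ```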

    For GPU-enabled container groups, this document will help you narrow down the expected network performance based on the VM SKU.
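    For reference, this is roughly how a GPU SKU is requested when the container group is created, which in turn pins the underlying VM family (and hence its network performance). It is only a minimal sketch, assuming the azure-mgmt-containerinstance (7.x) and azure-identity Python packages; the subscription, resource group, and image names are placeholders:

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.containerinstance import ContainerInstanceManagementClient
    from azure.mgmt.containerinstance.models import (
        Container, ContainerGroup, GpuResource,
        ResourceRequests, ResourceRequirements,
    )

    # Placeholder values; substitute your own subscription, resource group and image.
    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "my-aci-rg"
    IMAGE = "mcr.microsoft.com/azuredocs/aci-helloworld"

    client = ContainerInstanceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    container = Container(
        name="gpu-test",
        image=IMAGE,
        resources=ResourceRequirements(
            requests=ResourceRequests(
                cpu=2.0,
                memory_in_gb=8.0,
                gpu=GpuResource(count=1, sku="K80"),  # the GPU SKU determines the VM family
            )
        ),
    )

    group = ContainerGroup(
        location="eastus",
        os_type="Linux",
        containers=[container],
    )

    client.container_groups.begin_create_or_update(
        RESOURCE_GROUP, "gpu-test-group", group
    ).result()
    ```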

    N-series VMs communicate over the low-latency, high-bandwidth InfiniBand network; please refer to this.

    NVLink interconnect is currently not supported; please refer to the links below:
    https://learn.microsoft.com/en-us/azure/virtual-machines/nc-series
    https://learn.microsoft.com/en-us/azure/virtual-machines/ncv2-series
    https://learn.microsoft.com/en-us/azure/virtual-machines/ncv3-series

    Hope this helps.

    Please 'Accept as answer' if the provided information is helpful, so that it can help others in the community looking for help on similar topics.

    1 person found this answer helpful.

  2. vipullag-MSFT 26,487 Reputation points Moderator
    2021-05-11T13:56:35.007+00:00

    On InfiniBand (IB) enabled VMs, the appropriate drivers are required to enable RDMA.

    • The CentOS-HPC VM images in the Marketplace come pre-configured with the appropriate IB drivers.
    • The Ubuntu-HPC VM images in the Marketplace come pre-configured with the appropriate IB drivers and GPU drivers.

    These VM images (VMIs) are based on the corresponding base CentOS and Ubuntu Marketplace VM images. The scripts used to create these VM images from their base Marketplace images are available in the azhpc-images repo.
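    One way to confirm that the IB drivers are actually active on such an image (or visible inside a container running on it) is to check for the InfiniBand device nodes and query them with the standard verbs tooling. A rough sketch, assuming the rdma-core utilities that ship with the HPC images:

    ```python
    import pathlib
    import subprocess

    def infiniband_devices_present() -> bool:
        """Return True if InfiniBand verbs device nodes are visible in this environment."""
        return any(pathlib.Path("/dev/infiniband").glob("uverbs*"))

    if __name__ == "__main__":
        if infiniband_devices_present():
            # ibv_devinfo (from rdma-core) lists RDMA-capable adapters and their port state.
            subprocess.run(["ibv_devinfo"], check=False)
        else:
            print("No /dev/infiniband/uverbs* device nodes visible; "
                  "IB drivers are not loaded or not exposed here.")
    ```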

    On GPU-enabled N-series VMs, the appropriate GPU drivers are additionally required. These can be made available by any of the following methods (a quick verification sketch follows the list):

    • Use the Ubuntu-HPC VM images, which come pre-configured with the Nvidia GPU drivers and GPU compute software stack (CUDA, NCCL).
    • Add the GPU drivers through the VM extensions.
    • Install the GPU drivers manually.
    • Some other VM images on the Marketplace also come pre-installed with the Nvidia GPU drivers, including some VM images from Nvidia.
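    Whichever method you choose, a quick way to verify the driver is installed and working is to query nvidia-smi, for example:

    ```python
    import shutil
    import subprocess

    def check_gpu_driver() -> None:
        """Print GPU name and driver version if the NVIDIA driver is installed."""
        if shutil.which("nvidia-smi") is None:
            print("nvidia-smi not found; the NVIDIA driver is not installed on this VM.")
            return
        # Standard nvidia-smi query flags for machine-readable output.
        subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            check=True,
        )

    if __name__ == "__main__":
        check_gpu_driver()
    ```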

    However, since the container group is an isolated group of processes, it will not have permissions over other guest OS processes such as waagent. Because Azure Container Instances is a Container-as-a-Service offering, the guest OS of the underlying virtual machine is abstracted away, and the Windows Azure Agent process cannot be restarted from the guest OS level either. One suggestion would be to run Docker containers on GPU-enabled Azure VMs instead: enable IPoIB as mentioned here, and create the container as detailed here, with the necessary use-case changes to the base image Dockerfile (a rough sketch of the resulting docker run invocation follows).
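    As a sketch of that workaround, assuming the NVIDIA container toolkit is installed on the GPU VM, IPoIB has been enabled, and the image name below is a placeholder built from your adjusted Dockerfile, the container could be started with the GPU and IB devices passed through (Python is again only assembling the docker command):

    ```python
    import subprocess

    # Placeholder image built from your own Dockerfile (adjusted for your use case).
    IMAGE = "myregistry.azurecr.io/my-ib-gpu-workload:latest"

    docker_cmd = [
        "docker", "run", "--rm",
        "--gpus", "all",                        # requires the NVIDIA container toolkit
        "--device", "/dev/infiniband/uverbs0",  # expose the IB verbs device for RDMA
        "--net", "host",                        # share the host network so IPoIB is usable
        IMAGE,
    ]

    subprocess.run(docker_cmd, check=True)
    ```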

    Hope this helps.

