ND MI300X v5-series

Applies to: ✔️ Linux VMs ✔️ Flexible scale sets ✔️ Uniform scale sets

The ND MI300X v5 series virtual machine (VM) is a new flagship addition to the Azure GPU family. It is designed for high-end deep learning training and for tightly coupled scale-up and scale-out generative AI and HPC workloads.

The ND MI300X v5 series VM starts with eight AMD Instinct MI300X GPUs and two 4th Gen Intel Xeon Scalable processors for a total of 96 physical cores. The GPUs within the VM are connected to one another via 4th-Gen AMD Infinity Fabric links with 128 GB/s bandwidth per GPU and 896 GB/s aggregate bandwidth.
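The two bandwidth figures above are consistent if each GPU has one 128 GB/s link to each of the other seven GPUs in the VM. That reading is an assumption about the topology rather than something this page states; the sketch below only checks the arithmetic, and the function and constant names are illustrative.

```python
# Assumption: fully connected topology, one link from each GPU to each peer.
GPUS_PER_VM = 8       # from the text
LINK_GB_PER_S = 128   # GB/s figure quoted in the text, read here as per link


def aggregate_fabric_bandwidth(gpus: int, link_gb_per_s: int) -> int:
    """Aggregate Infinity Fabric bandwidth per GPU, in GB/s."""
    peers = gpus - 1  # a GPU links to every GPU except itself
    return peers * link_gb_per_s


print(aggregate_fabric_bandwidth(GPUS_PER_VM, LINK_GB_PER_S))  # 896
```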

ND MI300X v5-based deployments can scale up to thousands of GPUs with 3.2 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections are automatically configured between VMs occupying the same virtual machine scale set, and support GPUDirect RDMA.
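The 3.2 Tb/s per-VM figure follows from eight 400 Gb/s InfiniBand links, one per GPU. A minimal sketch of that arithmetic, with a conversion to bytes per second, is shown below. These are line rates; achievable throughput depends on protocol overhead and workload, and the helper name is hypothetical.

```python
IB_LINKS_PER_VM = 8        # one InfiniBand connection per GPU (from the text)
IB_LINK_GBIT_PER_S = 400   # Gb/s per connection (from the text)


def vm_interconnect_bandwidth(links: int, gbit_per_link: int) -> dict:
    """Return the per-VM aggregate line rate in several units."""
    total_gbit = links * gbit_per_link
    return {
        "Gb/s": total_gbit,
        "Tb/s": total_gbit / 1000,  # decimal prefixes: 1 Tb = 1000 Gb
        "GB/s": total_gbit / 8,     # 8 bits per byte
    }


print(vm_interconnect_bandwidth(IB_LINKS_PER_VM, IB_LINK_GBIT_PER_S))
# {'Gb/s': 3200, 'Tb/s': 3.2, 'GB/s': 400.0}
```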

These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration out of the box, such as TensorFlow, PyTorch, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on AMD's ROCm Communication Collectives Library (RCCL) for seamless clustering of GPUs.
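As a quick sanity check after driver setup, a framework can be asked how many GPUs it sees. The sketch below assumes a ROCm build of PyTorch, which exposes AMD GPUs through the `torch.cuda` namespace and sets `torch.version.hip`. The helper names are hypothetical, and the functions fall back gracefully when PyTorch is not installed.

```python
def visible_gpu_count() -> int:
    """Return the number of GPUs PyTorch can see, or 0 if none or no PyTorch."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


def is_rocm_build() -> bool:
    """Return True when the installed PyTorch was built against ROCm/HIP."""
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None) is not None


if __name__ == "__main__":
    print(f"ROCm build: {is_rocm_build()}, visible GPUs: {visible_gpu_count()}")
```

On a correctly configured ND MI300X v5 VM, one would expect this to report eight visible GPUs; any other count suggests checking the driver and ROCm installation.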

Host specifications

Part          | Quantity (count, units) | Specs (SKU ID, performance units, etc.)
------------- | ----------------------- | ---------------------------------------
Processor     | 96 vCores               | Intel® Xeon® Scalable (Sapphire Rapids)
Memory        | 1850 GiB                |
Local Storage | 1 Disk                  | 1000 GiB
Remote Disks  | 32 Disks                | 40800 IOPS, 612 MBps
Network       | 8 NICs                  | 80000 Mbps
Accelerators  | 8 GPUs                  | AMD MI300X 192 GiB, 1535 GiB per VM

Feature support

Premium Storage: Supported
Premium Storage caching: Supported
Ultra disk: Supported (learn more about availability, usage, and performance)
Live Migration: Not Supported
Memory Preserving Updates: Not Supported
VM Generation Support: Generation 2
Accelerated Networking: Supported
Ephemeral OS Disks: Supported
InfiniBand: Supported, GPUDirect RDMA, 8x400 Gigabit NDR
Nested Virtualization: Not Supported

Important

To get started with ND MI300X v5 VMs, refer to HPC Workload Configuration and Optimization for steps including driver and network configuration. Because of the increased GPU memory I/O footprint, the ND MI300X v5 requires Generation 2 VMs and marketplace images.

Sizes in series

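Size | vCPU | Memory (GiB) | Temp storage, SSD (GiB) | GPU | GPU memory (GiB) | Max data disks | Max uncached disk throughput (IOPS/MBps) | Max network bandwidth | Max NICs
---- | ---- | ------------ | ----------------------- | --- | ---------------- | -------------- | ---------------------------------------- | --------------------- | --------
Standard_ND96isr_MI300X_v5 | 96 | 1850 | 1000 | 8 MI300X | 192 | 32 | 40800/612 | 80,000 Mbps | 8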
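When planning one worker process per GPU, it can help to know how the VM's host resources divide across the eight GPUs. The sketch below does that division using the values from the size row above. The `VmSize` class and its field names are hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    """Illustrative container for the size-table values (not an Azure API)."""
    name: str
    vcpus: int
    memory_gib: int
    gpus: int
    gpu_memory_gib: int  # per GPU

    def per_gpu_share(self) -> dict:
        """Split host resources evenly across the GPUs."""
        return {
            "vcpus": self.vcpus // self.gpus,
            "host_memory_gib": self.memory_gib / self.gpus,
            "gpu_memory_gib": self.gpu_memory_gib,
        }


nd_mi300x_v5 = VmSize("Standard_ND96isr_MI300X_v5", 96, 1850, 8, 192)
print(nd_mi300x_v5.per_gpu_share())
# {'vcpus': 12, 'host_memory_gib': 231.25, 'gpu_memory_gib': 192}
```

An even split is only a starting point; the operating system and data-loading processes also need a share of CPU and memory.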

Size table definitions

  • Storage capacity is shown in units of GiB, or 1024^3 bytes. When you compare disks measured in GB (1000^3 bytes) to disks measured in GiB (1024^3 bytes), remember that capacity numbers given in GiB may appear smaller. For example, 1023 GiB = 1098.4 GB.

  • Disk throughput is measured in input/output operations per second (IOPS) and MBps where MBps = 10^6 bytes/sec.

  • Data disks can operate in cached or uncached modes. For cached data disk operation, the host cache mode is set to ReadOnly or ReadWrite. For uncached data disk operation, the host cache mode is set to None.

  • To learn how to get the best storage performance for your VMs, see Virtual machine and disk performance.

  • Expected network bandwidth is the maximum aggregated bandwidth allocated per VM type across all NICs, for all destinations. For more information, see Virtual machine network bandwidth.

    Upper limits aren't guaranteed. Limits offer guidance for selecting the right VM type for the intended application. Actual network performance will depend on several factors including network congestion, application loads, and network settings. For information on optimizing network throughput, see Optimize network throughput for Azure virtual machines. To achieve the expected network performance on Linux or Windows, you may need to select a specific version or optimize your VM. For more information, see Bandwidth/Throughput testing (NTTTCP).
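The unit definitions above can be checked with a few lines of arithmetic. The sketch below reproduces the 1023 GiB example and converts the listed network bandwidth to bytes per second, assuming Mbps means 10^6 bits per second. The helper names are illustrative.

```python
GIB = 1024**3  # bytes in one GiB
GB = 1000**3   # bytes in one GB


def gib_to_gb(gib: float) -> float:
    """Convert a capacity in GiB to GB."""
    return gib * GIB / GB


def mbps_to_mbytes_per_s(megabits_per_s: float) -> float:
    """Convert megabits per second to megabytes (10^6 bytes) per second."""
    return megabits_per_s / 8


print(round(gib_to_gb(1023), 1))    # 1098.4
print(round(gib_to_gb(1000), 1))    # 1073.7  (the 1000 GiB temp disk)
print(mbps_to_mbytes_per_s(80000))  # 10000.0 (80,000 Mbps network limit)
```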

Other size information

List of all available sizes: Sizes

Pricing Calculator: Pricing Calculator

Information on Disk Types: Disk Types

Next steps

Learn more about how Azure compute units (ACU) can help you compare compute performance across Azure SKUs.

Check out Azure Dedicated Hosts for physical servers able to host one or more virtual machines assigned to one Azure subscription.

Learn how to Monitor Azure virtual machines.