'ND' sub-family GPU-accelerated virtual machine size series
Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets
The 'ND' family of VM size series is one of Azure's GPU-accelerated VM instances. They're designed for deep learning, AI research, and high-performance computing tasks that benefit from powerful GPU acceleration. Equipped with NVIDIA GPUs, ND-series VMs offer specialized capabilities for training and inference of complex machine learning models, facilitating faster computations and efficient handling of large datasets. This makes them particularly well-suited for academic and commercial applications in AI development and simulation, where cutting-edge GPU technology is crucial for achieving rapid and accurate results in neural network processing and other computationally intensive tasks.
Workloads and use cases
AI and Deep Learning: ND-family VMs are ideal for training and deploying complex deep learning models. Equipped with powerful NVIDIA GPUs, they provide the computational power necessary for handling extensive neural network training with large datasets, significantly reducing training times (see the device-check sketch after this list).
High-Performance Computing (HPC): ND-family VMs are suitable for HPC applications that require GPU acceleration. Fields such as scientific research, engineering simulations (e.g., computational fluid dynamics), and genomic processing can benefit from the high-throughput computing capabilities of ND-series VMs.
Graphics Rendering: The powerful GPUs in ND-family VMs make them a great choice for graphics-intensive tasks, including real-time rendering for animation and video production, as well as high-fidelity simulations for virtual reality environments.
Remote Visualization: ND-family VMs can be used for remote visualization of data-intensive tasks, where high-end GPU capabilities are necessary to process and render complex visualizations over the cloud, facilitating access from less powerful client machines.
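Before scheduling any of these workloads, it can help to confirm what the framework actually sees on the VM. The following is a minimal sketch, assuming a CUDA-enabled PyTorch build; any framework with device introspection works similarly.

```python
# Minimal sketch, assuming a CUDA-enabled PyTorch build: enumerate the GPUs
# the framework can see on the VM and report their memory sizes.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA-capable GPU detected")
```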
Series in family
ND-series V1
The ND-series virtual machines are a new addition to the GPU family, designed for AI and deep learning workloads. They offer excellent performance for training and inference. ND instances are powered by NVIDIA Tesla P40 GPUs and Intel Xeon E5-2690 v4 (Broadwell) CPUs. These instances provide excellent performance for single-precision floating-point operations and for AI workloads utilizing Microsoft Cognitive Toolkit, TensorFlow, Caffe, and other frameworks. The ND-series also offers a much larger GPU memory size (24 GB), enabling you to fit much larger neural network models. Like the NC-series, the ND-series offers a configuration with a secondary low-latency, high-throughput network through RDMA and InfiniBand connectivity, so you can run large-scale training jobs spanning many GPUs.
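To see why the 24 GB of GPU memory matters, a back-of-the-envelope estimate of a model's parameter footprint is useful. The sketch below is illustrative arithmetic only; real training needs additional memory for gradients, optimizer state, and activations.

```python
# Rough check of whether a model's parameters fit in the P40's 24 GB of GPU
# memory. Training typically needs several times the parameter footprint
# (gradients, optimizer state, activations), so treat this as a lower bound.

def param_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory for the parameters alone, in GiB (FP32 by default)."""
    return num_params * bytes_per_param / 1024**3

# Example: a 1-billion-parameter model in FP32
print(f"{param_memory_gib(1_000_000_000):.1f} GiB for parameters")  # ~3.7 GiB
# With Adam, params + grads + two optimizer moments is roughly 4x:
print(f"{4 * param_memory_gib(1_000_000_000):.1f} GiB rough training footprint")
```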
| Part | Quantity (count / units) | Specs (SKU ID, performance units, etc.) |
|---|---|---|
| Processor | 6 - 24 vCPUs | Intel Xeon E5-2690 v4 (Broadwell) [x86-64] |
| Memory | 112 - 448 GiB | |
| Local Storage | 1 Disk | 736 - 2948 GiB |
| Remote Storage | 12 - 32 Disks | 20000 - 80000 IOPS / 200 - 800 MBps |
| Network | 4 - 8 NICs | |
| Accelerators | 1 - 4 GPUs | Nvidia Tesla P40 GPU (24 GB) |
NDv2-series
The NDv2-series virtual machine is a new addition to the GPU family designed for the needs of the most demanding GPU-accelerated AI, machine learning, simulation, and HPC workloads.
NDv2 is powered by 8 NVLink-connected NVIDIA Tesla V100 GPUs, each with 32 GB of GPU memory. Each NDv2 VM also has 40 non-HyperThreaded Intel Xeon Platinum 8168 (Skylake) cores and 672 GiB of system memory.
NDv2 instances provide excellent performance for HPC and AI workloads utilizing CUDA GPU-optimized computation kernels, and the many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks.
Critically, the NDv2 is built for both computationally intense scale-up (harnessing 8 GPUs per VM) and scale-out (harnessing multiple VMs working together) workloads. The NDv2 series now supports 100-Gigabit InfiniBand EDR backend networking, similar to that available on the HB-series HPC VMs, to allow high-performance clustering for parallel scenarios including distributed training for AI and ML. This backend network supports all major InfiniBand protocols, including those employed by NVIDIA's NCCL2 libraries, allowing for seamless clustering of GPUs.
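For example, distributed data-parallel training in PyTorch uses the NCCL backend, which transparently selects NVLink within a VM and InfiniBand across VMs. A minimal sketch, launched with `torchrun --nproc_per_node=8 train.py` (the linear layer is a stand-in for a real model):

```python
# Minimal sketch of multi-GPU data-parallel training with PyTorch's NCCL
# backend, the path that exercises the NVLink/InfiniBand fabric above.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")   # NCCL routes over NVLink/IB
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(32, 1024, device=local_rank)
    loss = model(x).sum()
    loss.backward()                            # gradients all-reduced via NCCL
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```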
View the full NDv2-series page.
| Part | Quantity (count / units) | Specs (SKU ID, performance units, etc.) |
|---|---|---|
| Processor | 40 vCPUs | Intel Xeon Platinum 8168 (Skylake) [x86-64] |
| Memory | 672 GiB | |
| Local Storage | 1 Disk | 2948 GiB |
| Remote Storage | 32 Disks | 80000 IOPS / 800 MBps |
| Network | 8 NICs | 24000 Mbps |
| Accelerators | 8 GPUs | Nvidia Tesla V100 GPU (32 GB) |
ND_A100_v4-series
The ND A100 v4 series virtual machine (VM) is a new flagship addition to the Azure GPU family. These sizes are designed for high-end Deep Learning training and tightly coupled scale-up and scale-out HPC workloads.
The ND A100 v4 series starts with a single VM and eight NVIDIA Ampere A100 40GB Tensor Core GPUs. ND A100 v4-based deployments can scale up to thousands of GPUs with 1.6 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 200 Gb/s NVIDIA Mellanox HDR InfiniBand connection. These connections are automatically configured between VMs occupying the same Azure Virtual Machine Scale Set, and support GPUDirect RDMA.
Each GPU features NVLink 3.0 connectivity for communication within the VM, backed by 96 physical 2nd-generation AMD EPYC™ 7V12 (Rome) CPU cores.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on NVIDIA's NCCL2 communication libraries for seamless clustering of GPUs.
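When tuning NCCL jobs on these sizes, a few standard NCCL environment variables control how the InfiniBand transport is used. The sketch below shows common starting points rather than a definitive Azure-validated configuration, and the interface name "eth0" is an assumption about the front-end NIC:

```python
# Illustrative NCCL settings for the InfiniBand fabric; run under a launcher
# such as torchrun. The variables are standard NCCL knobs, but the values
# are common starting points, not an Azure-validated configuration.
import os

os.environ["NCCL_DEBUG"] = "INFO"                 # log which transport NCCL picks
os.environ["NCCL_IB_PCI_RELAXED_ORDERING"] = "1"  # often recommended on Azure IB
os.environ["NCCL_SOCKET_IFNAME"] = "eth0"         # assumed NIC for bootstrap traffic

import torch.distributed as dist

dist.init_process_group(backend="nccl")           # NCCL reads the settings at init
```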
View the full ND_A100_v4-series page.
| Part | Quantity (count / units) | Specs (SKU ID, performance units, etc.) |
|---|---|---|
| Processor | 96 vCPUs | AMD EPYC 7V12 (Rome) [x86-64] |
| Memory | 900 GiB | |
| Local Storage | 1 Disk | 6000 GiB |
| Remote Storage | 32 Disks | 80000 IOPS / 800 MBps |
| Network | 8 NICs | 24000 Mbps |
| Accelerators | 8 GPUs | Nvidia A100 GPU (40 GB) |
NDm_A100_v4-series
The NDm A100 v4 series virtual machine (VM) is a new flagship addition to the Azure GPU family. These sizes are designed for high-end Deep Learning training and tightly coupled scale-up and scale-out HPC workloads.
The NDm A100 v4 series starts with a single VM and eight NVIDIA Ampere A100 80GB Tensor Core GPUs. NDm A100 v4-based deployments can scale up to thousands of GPUs with 1.6 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 200 Gb/s NVIDIA Mellanox HDR InfiniBand connection. These connections are automatically configured between VMs occupying the same Azure Virtual Machine Scale Set, and support GPUDirect RDMA.
Each GPU features NVLink 3.0 connectivity for communication within the VM, backed by 96 physical 2nd-generation AMD EPYC™ 7V12 (Rome) CPU cores.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on NVIDIA's NCCL2 communication libraries for seamless clustering of GPUs.
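To confirm you landed on the 80 GB variant rather than the 40 GB ND A100 v4, you can query per-device memory at runtime; a minimal PyTorch sketch:

```python
# Minimal sketch: report per-GPU memory to distinguish the 80 GB NDm A100 v4
# from the 40 GB ND A100 v4. torch.cuda.mem_get_info returns (free, total)
# in bytes for the given device.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {total / 1024**3:.0f} GiB total, {free / 1024**3:.0f} GiB free")
```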
View the full NDm_A100_v4-series page.
| Part | Quantity (count / units) | Specs (SKU ID, performance units, etc.) |
|---|---|---|
| Processor | 96 vCPUs | AMD EPYC 7V12 (Rome) [x86-64] |
| Memory | 1900 GiB | |
| Local Storage | 1 Disk | 6400 GiB |
| Remote Storage | 32 Disks | 80000 IOPS / 800 MBps |
| Network | 8 NICs | 24000 Mbps |
| Accelerators | 8 GPUs | Nvidia A100 GPU (80 GB) |
ND_H100_v5-series
The ND H100 v5 series virtual machine (VM) is a new flagship addition to the Azure GPU family. This series is designed for high-end Deep Learning training and tightly coupled scale-up and scale-out Generative AI and HPC workloads.
The ND H100 v5 series starts with a single VM and eight NVIDIA H100 Tensor Core GPUs. ND H100 v5-based deployments can scale up to thousands of GPUs with 3.2 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections are automatically configured between VMs occupying the same virtual machine scale set, and support GPUDirect RDMA.
Each GPU features NVLink 4.0 connectivity for communication within the VM, and the instance is backed by 96 physical 4th-generation Intel Xeon Scalable processor cores.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on NVIDIA's NCCL communication libraries for seamless clustering of GPUs.
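A rough way to exercise these interconnects is to time a repeated all-reduce. The sketch below reports an effective throughput figure only; the sizing and iteration counts are arbitrary, and purpose-built tools such as nccl-tests should be used for rigorous measurements:

```python
# Rough probe of all-reduce throughput across the NVLink/InfiniBand paths
# described above; launch with torchrun across the ranks under test.
import os
import time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

x = torch.randn(64 * 1024 * 1024, device="cuda")  # 64 Mi FP32 elements = 256 MiB

for _ in range(5):                 # warm-up so timings exclude setup cost
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 20
start = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

gib = x.numel() * x.element_size() / 1024**3
if dist.get_rank() == 0:
    print(f"~{gib * iters / elapsed:.1f} GiB/s effective all-reduce throughput")
dist.destroy_process_group()
```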
View the full ND_H100_v5-series page.
| Part | Quantity (count / units) | Specs (SKU ID, performance units, etc.) |
|---|---|---|
| Processor | 96 vCPUs | Intel Xeon (Sapphire Rapids) [x86-64] |
| Memory | 1900 GiB | |
| Local Storage | 1 Disk | 28000 GiB |
| Remote Storage | 32 Disks | |
| Network | 8 NICs | |
| Accelerators | 8 GPUs | Nvidia H100 GPU (80 GB) |
ND_MI300X_v5-series
The ND MI300X v5 series virtual machine (VM) is a new flagship addition to the Azure GPU family. It was designed for high-end Deep Learning training and tightly coupled scale-up and scale-out Generative AI and HPC workloads.
The ND MI300X v5 series VM starts with eight AMD Instinct MI300X GPUs and two 4th-generation Intel Xeon Scalable processors, for a total of 96 physical cores. The GPUs within the VM are connected to one another via 4th-Gen AMD Infinity Fabric links, with 128 GB/s of bandwidth per GPU and 896 GB/s of aggregate bandwidth.
ND MI300X v5-based deployments can scale up to thousands of GPUs with 3.2 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections are automatically configured between VMs occupying the same virtual machine scale set, and support GPUDirect RDMA.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration "out-of-the-box," such as TensorFlow, PyTorch, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on AMD's ROCm Communication Collectives Library (RCCL) for seamless clustering of GPUs.
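Because ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda APIs and back the "nccl" process-group backend with RCCL, code written for the NVIDIA sizes above generally runs unchanged. A minimal sketch (run under a launcher such as torchrun so the process group can initialize):

```python
# Minimal sketch: on ROCm builds of PyTorch, the CUDA-style APIs map to HIP
# and the "nccl" backend dispatches to RCCL, so the NVIDIA-oriented examples
# above generally run unchanged on MI300X.
import torch
import torch.distributed as dist

print(torch.cuda.is_available())   # True on ROCm builds as well
print(torch.version.hip)           # non-None identifies a ROCm (HIP) build
dist.init_process_group(backend="nccl")  # backed by RCCL on ROCm
dist.destroy_process_group()
```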
View the full ND_MI300X_v5-series page.
| Part | Quantity (count / units) | Specs (SKU ID, performance units, etc.) |
|---|---|---|
| Processor | 96 vCPUs | Intel Xeon (Sapphire Rapids) [x86-64] |
| Memory | 1850 GiB | |
| Local Storage | 1 Temp Disk, 8 NVMe Disks | 1000 GiB Temp Disk, 28000 GiB NVMe Disks |
| Remote Storage | 32 Disks | 80000 IOPS / 1200 MBps |
| Network | 8 NICs | |
| Accelerators | 8 GPUs | AMD Instinct MI300X GPU (192 GB) |
Previous-generation ND family series
For older sizes, see previous generation sizes.
Other size information
List of all available sizes: Sizes
Pricing Calculator: Pricing Calculator
Information on Disk Types: Disk Types
Next steps
Learn more about how Azure compute units (ACU) can help you compare compute performance across Azure SKUs.
Check out Azure Dedicated Hosts for physical servers able to host one or more virtual machines assigned to one Azure subscription.
Learn how to Monitor Azure virtual machines.