'ND' sub-family GPU-accelerated virtual machine size series
Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets
The 'ND' family of VM size series is one of Azure's GPU-accelerated VM families. It is designed for deep learning, AI research, and high-performance computing tasks that benefit from powerful GPU acceleration. Equipped with NVIDIA or AMD GPUs, ND-series VMs offer specialized capabilities for training and inference of complex machine learning models, with faster computation and efficient handling of large datasets. This makes them well suited to academic and commercial AI development and simulation, where recent GPU technology is needed to process neural networks and other computationally intensive tasks quickly.
Workloads and use cases
AI and Deep Learning: ND-family VMs are ideal for training and deploying complex deep learning models. Equipped with powerful GPUs, they provide the computational power needed to train extensive neural networks on large datasets, significantly reducing training times.
High-Performance Computing (HPC): ND-family VMs are suitable for HPC applications that require GPU acceleration. Fields such as scientific research, engineering simulations (for example, computational fluid dynamics), and genomic processing can benefit from the high-throughput computing capabilities of ND-series VMs.
Series in family
ND-series V1
The ND-series virtual machines are a new addition to the GPU family designed for AI and deep learning workloads. They offer excellent performance for training and inference. ND instances are powered by NVIDIA Tesla P40 GPUs and Intel Xeon E5-2690 v4 (Broadwell) CPUs. These instances provide excellent single-precision floating-point performance for AI workloads that use Microsoft Cognitive Toolkit, TensorFlow, Caffe, and other frameworks. The ND-series also offers a much larger GPU memory size (24 GB), which makes it possible to fit much larger neural network models. Like the NC-series, the ND-series offers a configuration with a secondary low-latency, high-throughput network through RDMA and InfiniBand connectivity, so you can run large-scale training jobs spanning many GPUs.
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 6 - 24 vCPUs | Intel Xeon E5-2690 v4 (Broadwell) [x86-64] |
Memory | 112 - 448 GiB | |
Local Storage | 1 Disk | 736 - 2948 GiB |
Remote Storage | 12 - 32 Disks | 20000 - 80000 IOPS 200 - 800 MBps |
Network | 4 - 8 NICs | |
Accelerators | 1 - 4 GPUs | Nvidia Tesla P40 GPU (24GB) |
NDv2-series
The NDv2-series virtual machine is a new addition to the GPU family designed for the needs of the most demanding GPU-accelerated AI, machine learning, simulation, and HPC workloads.
NDv2 is powered by 8 NVIDIA Tesla V100 NVLINK-connected GPUs, each with 32 GB of GPU memory. Each NDv2 VM also has 40 non-HyperThreaded Intel Xeon Platinum 8168 (Skylake) cores and 672 GiB of system memory.
NDv2 instances provide excellent performance for HPC and AI workloads that use CUDA GPU-optimized computation kernels, and for the many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks.
Critically, the NDv2 is built for both computationally intense scale-up (harnessing 8 GPUs per VM) and scale-out (harnessing multiple VMs working together) workloads. The NDv2 series now supports 100-Gigabit InfiniBand EDR backend networking, similar to that available on the HB series of HPC VMs, to allow high-performance clustering for parallel scenarios including distributed training for AI and ML. This backend network supports all major InfiniBand protocols, including those employed by NVIDIA's NCCL2 libraries, allowing for seamless clustering of GPUs.
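The scale-up and scale-out distinction can be illustrated with a minimal sketch of how distributed training jobs commonly number their workers: one process per GPU, with a global rank derived from the VM index and the local GPU index. The function names below are hypothetical and the rank layout is a common convention, not something NDv2 or NCCL mandates.

```python
# Illustrative sketch only: a common rank-numbering convention for
# multi-VM GPU jobs. Names are hypothetical, not part of any Azure API.

GPUS_PER_VM = 8  # from the text: each NDv2 VM exposes 8 V100 GPUs


def world_size(vm_count: int, gpus_per_vm: int = GPUS_PER_VM) -> int:
    """Total number of worker processes when running one process per GPU."""
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    return vm_count * gpus_per_vm


def global_rank(vm_index: int, local_gpu: int, gpus_per_vm: int = GPUS_PER_VM) -> int:
    """Global rank of the worker bound to GPU `local_gpu` on VM `vm_index`."""
    if vm_index < 0:
        raise ValueError("vm_index must be non-negative")
    if not 0 <= local_gpu < gpus_per_vm:
        raise ValueError("local_gpu out of range")
    return vm_index * gpus_per_vm + local_gpu


if __name__ == "__main__":
    print(world_size(1))      # scale-up only: a single VM
    print(world_size(4))      # scale-out: four VMs working together
    print(global_rank(3, 7))  # last GPU on the fourth VM
```

With one VM the job uses 8 workers; with four VMs it uses 32, and traffic between workers on different VMs is what the InfiniBand backend network carries.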
View the full NDv2-series page
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 40 vCPUs | Intel Xeon Platinum 8168 (Skylake) [x86-64] |
Memory | 672 GiB | |
Local Storage | 1 Disk | 2948 GiB |
Remote Storage | 32 Disks | 80000 IOPS 800 MBps |
Network | 8 NICs | 24000 Mbps |
Accelerators | 8 GPUs | Nvidia Tesla V100 GPU (32GB) |
ND_A100_v4-series
The ND A100 v4 series virtual machine (VM) is a new flagship addition to the Azure GPU family. These sizes are designed for high-end Deep Learning training and tightly coupled scale-up and scale-out HPC workloads.
The ND A100 v4 series starts with a single VM and eight NVIDIA Ampere A100 40GB Tensor Core GPUs. ND A100 v4-based deployments can scale up to thousands of GPUs with 1.6 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 200 Gb/s NVIDIA Mellanox HDR InfiniBand connection. These connections are automatically configured between VMs occupying the same Azure Virtual Machine Scale Set, and support GPUDirect RDMA.
Each GPU features NVLINK 3.0 connectivity for communication within the VM, and the instance has 96 physical 2nd-generation AMD Epyc™ 7V12 (Rome) CPU cores.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on NVIDIA's NCCL2 communication libraries for seamless clustering of GPUs.
View the full ND_A100_v4-series page.
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 96 vCPUs | AMD EPYC 7V12 (Rome) [x86-64] |
Memory | 900 GiB | |
Local Storage | 1 Disk | 6000 GiB |
Remote Storage | 32 Disks | 80000 IOPS 800 MBps |
Network | 8 NICs | 24000 Mbps |
Accelerators | 8 GPUs | Nvidia A100 GPU (40GB) |
NDm_A100_v4-series
The NDm A100 v4 series virtual machine (VM) is a new flagship addition to the Azure GPU family. These sizes are designed for high-end Deep Learning training and tightly coupled scale-up and scale-out HPC workloads.
The NDm A100 v4 series starts with a single VM and eight NVIDIA Ampere A100 80GB Tensor Core GPUs. NDm A100 v4-based deployments can scale up to thousands of GPUs with 1.6 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 200 Gb/s NVIDIA Mellanox HDR InfiniBand connection. These connections are automatically configured between VMs occupying the same Azure Virtual Machine Scale Set, and support GPUDirect RDMA.
Each GPU features NVLINK 3.0 connectivity for communication within the VM, and the instance has 96 physical 2nd-generation AMD Epyc™ 7V12 (Rome) CPU cores.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration 'out-of-the-box,' such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on NVIDIA's NCCL2 communication libraries for seamless clustering of GPUs.
View the full NDm_A100_v4-series page.
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 96 vCPUs | AMD EPYC 7V12 (Rome) [x86-64] |
Memory | 1900 GiB | |
Local Storage | 1 Disk | 6400 GiB |
Remote Storage | 32 Disks | 80000 IOPS 800 MBps |
Network | 8 NICs | 24000 Mbps |
Accelerators | 8 GPUs | Nvidia A100 GPU (80GB) |
ND_H100_v5-series
The ND H100 v5 series virtual machine (VM) is a new flagship addition to the Azure GPU family. This series is designed for high-end Deep Learning training and tightly coupled scale-up and scale-out Generative AI and HPC workloads.
The ND H100 v5 series starts with a single VM and eight NVIDIA H100 Tensor Core GPUs. ND H100 v5-based deployments can scale up to thousands of GPUs with 3.2 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections are automatically configured between VMs occupying the same virtual machine scale set, and support GPUDirect RDMA.
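The per-VM interconnect figure follows from the per-GPU links: eight GPUs, each with a dedicated 400 Gb/s connection, give 3.2 Tb/s in total. The short sketch below reproduces that arithmetic and converts it to gigabytes per second, since bit and byte units are easy to confuse. The function names are hypothetical helpers, and the result is a nominal link rate rather than measured throughput.

```python
# Illustrative arithmetic only; helper names are hypothetical.

def per_vm_interconnect_gbps(gpus_per_vm: int, link_gbps: float) -> float:
    """Nominal per-VM bandwidth in gigabits per second (Gb/s)."""
    return gpus_per_vm * link_gbps


def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


if __name__ == "__main__":
    total_gbps = per_vm_interconnect_gbps(gpus_per_vm=8, link_gbps=400)
    print(total_gbps)                                # 3200 Gb/s, that is 3.2 Tb/s
    print(gbps_to_gigabytes_per_second(total_gbps))  # 400.0 GB/s
```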
Each GPU features NVLINK 4.0 connectivity for communication within the VM, and the instance has 96 physical fourth Gen Intel Xeon Scalable processor cores.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration ‘out-of-the-box,’ such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on NVIDIA’s NCCL communication libraries for seamless clustering of GPUs.
View the full ND_H100_v5-series page.
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 96 vCPUs | Intel Xeon (Sapphire Rapids) [x86-64] |
Memory | 1900 GiB | |
Local Storage | 1 Disk | 28000 GiB |
Remote Storage | 32 Disks | |
Network | 8 NICs | |
Accelerators | 8 GPUs | Nvidia H100 GPU (80GB) |
ND_MI300X_v5-series
The ND MI300X v5 series virtual machine (VM) is a new flagship addition to the Azure GPU family. It was designed for high-end Deep Learning training and tightly coupled scale-up and scale-out Generative AI and HPC workloads.
The ND MI300X v5 series VM starts with eight AMD Instinct MI300X GPUs and two fourth Gen Intel Xeon Scalable processors for a total of 96 physical cores. The GPUs within the VM are connected to one another via 4th-Gen AMD Infinity Fabric links with 128 GB/s bandwidth per GPU and 896 GB/s aggregate bandwidth.
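The two Infinity Fabric figures are consistent with one simple reading, which is an assumption rather than something the text states: each GPU has a direct 128 GB/s link to each of the other seven GPUs in the VM, so the aggregate per GPU is 7 × 128 = 896 GB/s. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
# Illustrative arithmetic under an assumption: a fully connected topology
# in which every GPU has one direct link to each of its peers.

def aggregate_peer_bandwidth(gpu_count: int, per_link_gb_per_s: int) -> int:
    """Sum of link bandwidth from one GPU to all of its peers, in GB/s."""
    if gpu_count < 2:
        raise ValueError("need at least two GPUs to have a peer link")
    peers = gpu_count - 1
    return peers * per_link_gb_per_s


if __name__ == "__main__":
    print(aggregate_peer_bandwidth(gpu_count=8, per_link_gb_per_s=128))  # 896
```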
ND MI300X v5-based deployments can scale up to thousands of GPUs with 3.2 Tb/s of interconnect bandwidth per VM. Each GPU within the VM is provided with its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections are automatically configured between VMs occupying the same virtual machine scale set, and support GPUDirect RDMA.
These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration "out-of-the-box," such as TensorFlow, PyTorch, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on AMD’s ROCm Communication Collectives Library (RCCL) for seamless clustering of GPUs.
View the full ND_MI300X_v5-series page.
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 96 vCPUs | Intel Xeon (Sapphire Rapids) [x86-64] |
Memory | 1850 GiB | |
Local Storage | 1 Temp Disk, 8 NVMe Disks | 1000 GiB Temp Disk, 28000 GiB NVMe Disks |
Remote Storage | 32 Disks | 80000 IOPS 1200 MBps |
Network | 8 NICs | |
Accelerators | 8 GPUs | AMD Instinct MI300X GPU (192GB) |
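Taken together, the accelerator figures for each series can be compared by aggregate GPU memory per VM. The sketch below encodes the maximum GPU count and per-GPU memory listed for each series (the NDv2 values come from its description: 8 V100 GPUs with 32 GB each) and filters by a minimum total. The class and function names are hypothetical, and GPU memory is only one of several sizing criteria, so treat this as a rough first filter.

```python
# Illustrative sketch: compare ND-family series by aggregate GPU memory.
# Figures are copied from the series descriptions and tables on this page;
# the names below are hypothetical, not part of any Azure SDK.
from typing import List, NamedTuple


class NDSeries(NamedTuple):
    name: str
    max_gpus_per_vm: int
    memory_per_gpu_gb: int

    @property
    def total_gpu_memory_gb(self) -> int:
        """Aggregate GPU memory of the largest size in the series."""
        return self.max_gpus_per_vm * self.memory_per_gpu_gb


ND_FAMILY = [
    NDSeries("ND-series V1", 4, 24),
    NDSeries("NDv2-series", 8, 32),
    NDSeries("ND_A100_v4-series", 8, 40),
    NDSeries("NDm_A100_v4-series", 8, 80),
    NDSeries("ND_H100_v5-series", 8, 80),
    NDSeries("ND_MI300X_v5-series", 8, 192),
]


def series_with_gpu_memory(min_total_gb: int) -> List[str]:
    """Names of series whose largest size has at least `min_total_gb` of
    GPU memory, ordered from smallest to largest aggregate memory."""
    candidates = [s for s in ND_FAMILY if s.total_gpu_memory_gb >= min_total_gb]
    candidates.sort(key=lambda s: s.total_gpu_memory_gb)
    return [s.name for s in candidates]


if __name__ == "__main__":
    for series in ND_FAMILY:
        print(series.name, series.total_gpu_memory_gb, "GB")
    print(series_with_gpu_memory(600))
```

For example, a model that needs roughly 600 GB of GPU memory on a single VM narrows the list to the NDm_A100_v4, ND_H100_v5, and ND_MI300X_v5 series.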
Previous-generation ND family series
For older sizes, see previous generation sizes.
Other size information
List of all available sizes: Sizes
Pricing Calculator: Pricing Calculator
Information on Disk Types: Disk Types
Next steps
Learn more about how Azure compute units (ACU) can help you compare compute performance across Azure SKUs.
Check out Azure Dedicated Hosts for physical servers able to host one or more virtual machines assigned to one Azure subscription.
Learn how to Monitor Azure virtual machines.