ND-H100-v5 sizes series

The ND H100 v5 series virtual machine (VM) is a flagship addition to the Azure GPU family. This series is designed for high-end deep learning training and for tightly coupled scale-up and scale-out generative AI and HPC workloads.

The ND H100 v5 series starts with a single VM and eight NVIDIA H100 Tensor Core GPUs. ND H100 v5-based deployments can scale up to thousands of GPUs with 3.2 Tb/s of interconnect bandwidth per VM. Each GPU within the VM has its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections are automatically configured between VMs in the same virtual machine scale set, and they support GPUDirect RDMA.
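The per-VM figure follows from the per-GPU figure: eight GPUs, each with a 400 Gb/s link. A minimal sketch of that arithmetic, where the constant and function names are illustrative and not part of any Azure API:

```python
# Aggregate InfiniBand bandwidth, using only the figures quoted above.
GPUS_PER_VM = 8         # eight H100 GPUs per VM
IB_GBPS_PER_GPU = 400   # one dedicated 400 Gb/s InfiniBand link per GPU


def aggregate_ib_bandwidth_gbps(vm_count: int = 1) -> int:
    """Return the total InfiniBand bandwidth in Gb/s across vm_count VMs."""
    return vm_count * GPUS_PER_VM * IB_GBPS_PER_GPU


per_vm_tbps = aggregate_ib_bandwidth_gbps(1) / 1000
print(f"{per_vm_tbps} Tb/s per VM")  # 3.2 Tb/s per VM
```

This is nominal link bandwidth only; achieved throughput depends on the workload and the communication library.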

Each GPU features NVLink 4.0 connectivity for communication within the VM, and the instance has 96 physical 4th Gen Intel Xeon Scalable processor cores.

These instances provide excellent performance for many AI, ML, and analytics tools that support GPU acceleration out of the box, such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks. Additionally, the scale-out InfiniBand interconnect supports a large set of existing AI and HPC tools that are built on the NVIDIA Collective Communications Library (NCCL) for seamless clustering of GPUs.

Host specifications

Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.)
Processor | 96 vCPUs | Intel Xeon (Sapphire Rapids) [x86-64]
Memory | 1900 GiB |
Local Storage | 1 Disk | 28000 GiB
Remote Storage | 32 Disks |
Network | 8 NICs |
Accelerators | 8 GPUs | NVIDIA H100 GPU (80 GB)
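One way to confirm that a running VM exposes the eight GPUs listed above is to count the entries printed by `nvidia-smi -L`. The following is a minimal sketch that only parses text; the helper name `count_gpus` and the sample string are illustrative, and the sample was not captured from a real VM.

```python
import re

EXPECTED_GPUS = 8  # accelerator count from the host specifications above


def count_gpus(nvidia_smi_list_output: str) -> int:
    """Count lines of the form 'GPU <index>: <name> (UUID: ...)'."""
    return len(re.findall(r"^GPU \d+:", nvidia_smi_list_output, flags=re.MULTILINE))


# Illustrative sample text in the general shape of `nvidia-smi -L` output.
sample = "\n".join(
    f"GPU {i}: NVIDIA H100 80GB HBM3 (UUID: GPU-example-{i})" for i in range(8)
)
print(count_gpus(sample) == EXPECTED_GPUS)  # True
```

On a VM with the NVIDIA driver installed, the input could come from `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`.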

Feature support

Premium Storage: Supported
Premium Storage caching: Supported
Live Migration: Not Supported
Memory Preserving Updates: Not Supported
Generation 2 VMs: Supported
Generation 1 VMs: Not Supported
Accelerated Networking: Supported
Ephemeral OS Disk: Supported
Nested Virtualization: Not Supported
InfiniBand: Supported

Important

To get started with ND H100 v5 VMs, refer to HPC Workload Configuration and Optimization for steps including driver and network configuration. Due to increased GPU memory I/O footprint, ND H100 v5 requires the use of Generation 2 VMs and marketplace images.

Azure supports Ubuntu 20.04/22.04, RHEL 7.9/8.7/9.3, AlmaLinux 8.8/9.2, and SLES 15 for ND H100 v5 VMs. Currently, Ubuntu-HPC 20.04/22.04 and AlmaLinux-HPC 8.6/8.7 VM images are supported.

Optimized, pre-configured Linux VM images for HPC/AI workloads are available with a variety of HPC tools and libraries already installed. Using these images is highly recommended.

To download an image, go to Azure Marketplace.
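The image constraints in this note can be expressed as a simple pre-flight check. The sketch below assumes a hypothetical helper, `image_problems`, and encodes only the general OS list and the Generation 2 requirement stated above; it does not query Azure and does not cover the separate HPC image versions.

```python
from typing import List

# Distributions and versions listed as supported in the note above.
SUPPORTED_OS = {
    "ubuntu": {"20.04", "22.04"},
    "rhel": {"7.9", "8.7", "9.3"},
    "almalinux": {"8.8", "9.2"},
    "sles": {"15"},
}


def image_problems(distro: str, version: str, vm_generation: int) -> List[str]:
    """Return a list of reasons the image does not match the stated requirements."""
    problems = []
    if vm_generation != 2:
        problems.append("ND H100 v5 requires a Generation 2 VM image")
    if version not in SUPPORTED_OS.get(distro.lower(), set()):
        problems.append(f"{distro} {version} is not in the supported OS list")
    return problems


print(image_problems("Ubuntu", "22.04", 2))  # []
print(image_problems("Ubuntu", "18.04", 1))  # two problems reported
```

An empty list means the image matches the constraints stated here. It does not guarantee that drivers or InfiniBand are configured.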

Sizes in series

vCPUs (Qty.) and Memory for each size

Size Name | vCPUs (Qty.) | Memory (GB)
Standard_ND96isr_H100_v5 | 96 | 1900
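Because InfiniBand is configured automatically only between VMs in the same virtual machine scale set, a deployment of this size would typically be created as a scale set. The sketch below only assembles an Azure CLI `az vmss create` command line; the resource group, scale set name, and image value are placeholders, and a real deployment needs further options (authentication, networking, quota) that are not shown.

```python
import shlex
from typing import List

VM_SIZE = "Standard_ND96isr_H100_v5"  # the only size in this series


def vmss_create_command(
    resource_group: str, name: str, image: str, instance_count: int
) -> List[str]:
    """Build the argument list for `az vmss create` using the ND H100 v5 size."""
    return [
        "az", "vmss", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--image", image,  # placeholder: a Generation 2 marketplace image
        "--vm-sku", VM_SIZE,
        "--instance-count", str(instance_count),
    ]


# All argument values below are hypothetical placeholders.
cmd = vmss_create_command("my-rg", "my-nd-h100", "<marketplace-image-urn>", 2)
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.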

VM Basics resources

Other size information

List of all available sizes: Sizes

Pricing Calculator: Pricing Calculator

Information on Disk Types: Disk Types

Next steps

Learn more about how Azure compute units (ACU) can help you compare compute performance across Azure SKUs.

Check out Azure Dedicated Hosts for physical servers able to host one or more virtual machines assigned to one Azure subscription.

Learn how to Monitor Azure virtual machines.