# HX-series virtual machine performance

**Applies to:** ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets

Performance expectations using common HPC microbenchmarks are as follows:

| Workload | HX |
|----------|----|
| STREAM Triad | 750-780 GB/s of DDR5, up to 5.7 TB/s of 3D V-Cache bandwidth |
| High-Performance Linpack (HPL) | Up to 7.6 TF (Rpeak, FP64) for the 144-core VM size |
| RDMA latency & bandwidth | < 2 microseconds (1 byte), 400 Gb/s (one-way) |
| FIO on local NVMe SSDs (RAID0) | 12 GB/s reads, 7 GB/s writes; 186k IOPS reads, 201k IOPS writes |
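The FIO numbers in the table can be measured with a command sequence along the following lines. This is a minimal sketch, assuming two local NVMe devices named `/dev/nvme0n1` and `/dev/nvme1n1` striped into a RAID 0 array with `mdadm`; the device names, device count, and job parameters are illustrative and vary by VM size.

```bash
# Assumption: local NVMe devices are /dev/nvme0n1 and /dev/nvme1n1 (verify with lsblk).
# Assemble a RAID 0 array across the local NVMe SSDs.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Sequential read bandwidth test against the raw RAID 0 device.
sudo fio --name=seqread --filename=/dev/md0 --rw=read --bs=1M --iodepth=64 \
    --ioengine=libaio --direct=1 --numjobs=8 --group_reporting --time_based --runtime=60

# Random read IOPS test (4 KiB blocks).
sudo fio --name=randread --filename=/dev/md0 --rw=randread --bs=4k --iodepth=64 \
    --ioengine=libaio --direct=1 --numjobs=8 --group_reporting --time_based --runtime=60
```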

## Memory bandwidth test

The STREAM memory test can be run using the scripts in the woc-benchmarking GitHub repository:

```bash
git clone https://github.com/Azure/woc-benchmarking
cd woc-benchmarking/apps/hpc/stream/
sh build_stream.sh
sh stream_run_script.sh $PWD "hbrs_v4"
```
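To run STREAM by hand instead of through the helper scripts, a minimal sketch is shown below. The compiler flags, array size, and 144-thread count are illustrative assumptions; tune them to the VM size in use.

```bash
# Assumption: gcc with OpenMP support is available; the array size below is illustrative.
wget https://www.cs.virginia.edu/stream/FTP/Code/stream.c
gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=400000000 -DNTIMES=20 stream.c -o stream

# Pin one OpenMP thread per core and spread threads across the NUMA domains.
export OMP_NUM_THREADS=144   # adjust to the core count of the VM size in use
export OMP_PROC_BIND=spread
export OMP_PLACES=cores
./stream
```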

## Compute performance test

The HPL benchmark can be run using the scripts in the woc-benchmarking GitHub repository:

```bash
git clone https://github.com/Azure/woc-benchmarking
cd woc-benchmarking/apps/hpc/hpl
sh hpl_build_script.sh
sh hpl_run_scr_hbv4.sh $PWD
```
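If you run a prebuilt `xhpl` binary directly rather than through the helper script, the launch under HPC-X might look like the following sketch. The binary path, rank count, and process layout are assumptions; the problem size in `HPL.dat` must be tuned to the VM's memory.

```bash
# Assumption: a tuned HPL.dat and the xhpl binary are in the current directory.
module load mpi/hpcx
# One MPI rank per core on a single 144-core VM; P x Q in HPL.dat must equal 144.
mpirun -np 144 --map-by core --bind-to core ./xhpl
```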

## MPI latency

The MPI latency test from the OSU microbenchmark suite can be executed as shown. Sample scripts are on GitHub.

```bash
module load mpi/hpcx
mpirun -np 2 --host $src,$dst --map-by node -x LD_LIBRARY_PATH $HPCX_OSU_DIR/osu_latency
```

## MPI bandwidth

The MPI bandwidth test from the OSU microbenchmark suite can be executed as shown. Sample scripts are on GitHub.

```bash
module load mpi/hpcx
mpirun -np 2 --host $src,$dst --map-by node -x LD_LIBRARY_PATH $HPCX_OSU_DIR/osu_bw
```

> [!NOTE]
> Define the source (`$src`) and destination (`$dst`) host names before running the latency and bandwidth commands.
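For example, with two hypothetical node host names:

```bash
# Hypothetical host names; substitute the actual nodes under test.
src=hx-node-001
dst=hx-node-002
```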

## Mellanox Perftest

The Mellanox Perftest package includes many InfiniBand tests, such as latency (`ib_send_lat`) and bandwidth (`ib_send_bw`). An example command is shown below.

```bash
numactl --physcpubind=[INSERT CORE #] ib_send_lat -a
```

> [!NOTE]
> The NUMA node affinity for the InfiniBand NIC is NUMA node 0.
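The perftest latency and bandwidth tests run as a server/client pair across two VMs. A minimal sketch is shown below, pinning to core 0 (assumed to reside in NUMA node 0) and using a hypothetical server host name:

```bash
# On the server VM (hypothetical host name hx-node-001):
numactl --physcpubind=0 ib_send_lat -a

# On the client VM, point the test at the server:
numactl --physcpubind=0 ib_send_lat -a hx-node-001
```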

## Next steps