This article presents the performance results of running TUFLOW HPC (heavily parallelized compute) models on an Azure virtual machine (VM).
TUFLOW HPC is an explicit solver for the 2D shallow-water equations (SWEs), including the sub-grid scale eddy viscosity model. It builds on the strength and accuracy of the TUFLOW Classic model. The TUFLOW HPC model provides finite volume TVD shock capturing, adaptive timestepping stability, and GPU acceleration that makes simulations run 10 to 400 times faster than the TUFLOW Classic model.
Architecture
Download a Visio file of this architecture.
Components
Azure Virtual Machines is used to create Windows VMs. For information about deploying a VM and installing drivers, see Windows VMs on Azure.
The TUFLOW HPC application is installed on the operating system disk that's attached to the VM.
Azure Virtual Network is used to create a private network infrastructure in the cloud.
Network security groups are used to restrict access to the VM.
A public IP address connects the internet to the VM.
A premium solid-state drive (SSD) is used as an operating system disk for storage.
Scenario details
TUFLOW HPC is the latest explicit finite volume engine. It can be used to distribute hydrodynamic calculations across multiple cores, specifically GPUs. The accuracy, stability, and speed of TUFLOW HPC combined with features like a quadtree mesh structure and sub-grid sampling make it a powerful 1D/2D hydrodynamic computational engine.
Deploy TUFLOW HPC on Azure to get benefits like:
- Modern and diverse compute options to meet your workload's needs.
- The flexibility of virtualization without the need to buy and maintain physical hardware.
- Rapid provisioning.
- The ability to run the models on multiple GPU cards to increase modeling speeds.
VM and driver requirements
VM size | vCPU | Memory (GiB) | Temp disk (GiB) | GPU | GPU memory (GiB) | Max data disks |
---|---|---|---|---|---|---|
Standard_NC24ads_A100_v4 | 24 | 220 | 64 | 1 | 80 | 12 |
Standard_NC48ads_A100_v4 | 48 | 440 | 128 | 2 | 160 | 24 |
Standard_NC96ads_A100_v4 | 96 | 880 | 256 | 4 | 320 | 32 |
To run TUFLOW HPC benchmarks, you need to:
- Deploy a VM and connect to it.
- Install NVIDIA GPU drivers to take advantage of the GPU capabilities of NC_A100_v4-series VMs.
For information about deploying a VM and installing drivers, see Run a Windows VM on Azure or Run a Linux VM on Azure.
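After the drivers are installed, it can be useful to confirm that the VM sees the expected number of GPUs before starting a benchmark. The following is a minimal sketch, assuming the NVIDIA driver has placed the `nvidia-smi` utility on the PATH. The helper names `parse_gpu_list` and `list_gpus` are illustrative and are not part of TUFLOW or Azure.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'name, memory' lines from nvidia-smi CSV output into tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma in the GPU name is preserved.
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, memory))
    return gpus


def list_gpus():
    """Return the visible GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        # nvidia-smi exists but could not talk to the driver.
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU detected; check the driver installation.")
    for index, (name, memory) in enumerate(gpus):
        print(f"GPU {index}: {name} ({memory})")
```

On the VM sizes listed in the table above, the script should report one, two, or four GPUs, depending on the size that you deployed.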
Install TUFLOW HPC and run benchmarks
The standard TUFLOW HPC benchmarking dataset is used in the following benchmarking tests. For more information, see Hardware benchmarking.
Note
The TUFLOW HPC benchmarking dataset is license-free, so you can use it to assess performance on any machine.
To install and run the TUFLOW HPC models:
Download the TUFLOW HPC benchmarking models zip file.
Extract the zip file on a local drive of the computer that you want to test.
Go to the TUFLOW\runs\ folder, and run the Run_Benchmark.bat file.
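The extraction and run steps above can also be scripted. The following is a rough sketch rather than an official installer: it assumes the zip file has already been downloaded, and it searches the extracted files for `Run_Benchmark.bat` instead of relying on a fixed folder layout. The function names are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_benchmark(zip_path, target_dir):
    """Extract the benchmark archive and return the path to Run_Benchmark.bat."""
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Search rather than hard-code the layout, in case the archive nests folders.
    matches = sorted(target.rglob("Run_Benchmark.bat"))
    if not matches:
        raise FileNotFoundError("Run_Benchmark.bat not found in the archive")
    return matches[0]


def run_benchmark(batch_file):
    """Run the batch file from its own folder (Windows only)."""
    subprocess.run(["cmd", "/c", batch_file.name], cwd=batch_file.parent, check=True)
```

For example, `run_benchmark(extract_benchmark(zip_path, target_dir))` extracts the archive and starts the benchmark, where `zip_path` and `target_dir` are the paths that you chose for the download and for the local drive.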
TUFLOW HPC performance results
For the following performance analysis, the 2018-03-AA version of TUFLOW HPC was run on Windows NC_A100_v4-series VMs. Note that the benchmarking tests only cover the TUFLOW HPC engine, which can distribute simulations across multiple CPU cores and GPU cards. The tests don't cover TUFLOW Classic models, which are limited to a single CPU or 1D components.
The operating system that was used for testing is Windows 10 Pro x64 version 22H2 G2.
The following table shows the details for each model that was used for testing:
Model | Cell size (m) | Number of cells |
---|---|---|
Model 1 | 20 | 181,981 |
Model 2 | 10 | 727,865 |
Model 3 | 5 | 2,911,472 |
Model 4 | 2.5 | 11,645,341 |
Model 1
Model 1 has a cell size of 20 m and contains 181,981 2D cells. TUFLOW HPC runs on both CPU and GPU hardware. The following table shows the performance results of running TUFLOW HPC on an NC_A100_v4-series VM compared to running the application on an EPYC 9V33X processor.
Processor/VM series | CPU/GPU | Runtime (secs) | Relative speed increase |
---|---|---|---|
EPYC 9V33X | 8 CPUs | 5,973 | 1.00 |
EPYC 9V33X | 16 CPUs | 3,209 | 1.86 |
Standard_NC24ads_A100_v4 | 1 GPU | 152 | 39.30 |
For a performance analysis, the simulation runtime is a key parameter. To calculate the relative speed increase, the 8-vCPU (core) runtime is used as the baseline.
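As a worked example of this calculation, the following sketch divides the baseline runtime by each configuration's runtime, using the Model 1 runtimes from the table above. The function name `relative_speed_increase` is illustrative only.

```python
def relative_speed_increase(baseline_secs, runtime_secs):
    """Return how many times faster a run is than the baseline run."""
    return baseline_secs / runtime_secs


# Model 1 runtimes in seconds, taken from the table above.
runtimes = {
    "EPYC 9V33X, 8 CPUs": 5973,
    "EPYC 9V33X, 16 CPUs": 3209,
    "NC24ads_A100_v4, 1 GPU": 152,
}
baseline = runtimes["EPYC 9V33X, 8 CPUs"]

for config, secs in runtimes.items():
    print(f"{config}: {relative_speed_increase(baseline, secs):.2f}")
```

The printed values match the "Relative speed increase" column: 1.00, 1.86, and 39.30.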
The following graph shows how the relative speed increase improves.
Model 2
Model 2 has a cell size of 10 m and contains 727,865 2D cells. The following table shows the performance results of running TUFLOW HPC on an NC_A100_v4-series VM compared to running the application on an EPYC 9V33X processor.
Processor/VM series | CPU/GPU | Runtime (secs) | Relative speed increase |
---|---|---|---|
EPYC 9V33X | 8 CPUs | 43,082 | 1.00 |
EPYC 9V33X | 16 CPUs | 25,071 | 1.72 |
Standard_NC24ads_A100_v4 | 1 GPU | 808 | 53.32 |
Model 3
Model 3 has a cell size of 5 m and contains 2,911,472 2D cells. This larger model is run on GPU hardware for high-end GPU benchmarking. To calculate the relative speed increase, the 1-GPU runtime is used as the baseline.
VM configuration | GPUs | Runtime (secs) | Relative speed increase |
---|---|---|---|
Standard_NC24ads_A100_v4 | 1 | 6,638 | 1.00 |
Standard_NC48ads_A100_v4 | 2 | 4,860 | 1.37 |
Standard_NC96ads_A100_v4 | 4 | 4,155 | 1.60 |
Model 4
Model 4 has a cell size of 2.5 m and contains 11,645,341 2D cells. This model is also for high-end GPU benchmarking. As with Model 3, the 1-GPU runtime is used as the baseline.
VM configuration | GPUs | Runtime (secs) | Relative speed increase |
---|---|---|---|
Standard_NC24ads_A100_v4 | 1 | 51,797 | 1.00 |
Standard_NC48ads_A100_v4 | 2 | 47,496 | 1.09 |
Standard_NC96ads_A100_v4 | 4 | 27,469 | 1.89 |
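One common way to read multi-GPU results like these is parallel efficiency: the speedup over a single GPU divided by the number of GPUs. This metric isn't part of the original benchmark report. The sketch below applies it to the Model 3 and Model 4 runtimes as an illustration, and the function name is hypothetical.

```python
def parallel_efficiency(single_gpu_secs, multi_gpu_secs, gpu_count):
    """Return the speedup over one GPU divided by the number of GPUs used."""
    speedup = single_gpu_secs / multi_gpu_secs
    return speedup / gpu_count


# Runtimes in seconds, copied from the Model 3 and Model 4 tables: {GPUs: runtime}.
results = {
    "Model 3": {1: 6638, 2: 4860, 4: 4155},
    "Model 4": {1: 51797, 2: 47496, 4: 27469},
}

for model, runs in results.items():
    for gpus, secs in runs.items():
        eff = parallel_efficiency(runs[1], secs, gpus)
        print(f"{model}, {gpus} GPU(s): efficiency {eff:.0%}")
```

By this measure, four GPUs run at roughly 40% efficiency for Model 3 and 47% for Model 4, which suggests that the larger model makes somewhat better use of the additional GPUs in these tests.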
Azure cost
Based on the TUFLOW HPC test results, GPU-based SKUs, like NC_A100_v4-series VMs, are more cost-effective than CPU-based SKUs. You can use the Azure pricing calculator to estimate costs for your configuration.
To compute the total VM cost for your analysis, multiply the total runtime of the VM by the Azure VM hourly cost. For more information, see Windows VM pricing or Linux VM pricing.
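The cost calculation can be sketched as follows. The hourly rate in this example is a placeholder and not an actual Azure price; substitute the rate for your VM size and region from the pricing pages. The function name `estimate_vm_cost` is illustrative.

```python
def estimate_vm_cost(runtime_secs, hourly_rate):
    """Return the VM cost for one run: runtime in hours times the hourly rate."""
    return runtime_secs / 3600 * hourly_rate


# Placeholder rate for illustration only; this is NOT an actual Azure price.
HYPOTHETICAL_HOURLY_RATE = 4.00

# Model 2 ran for 808 seconds on one GPU in the tests above.
cost = estimate_vm_cost(808, HYPOTHETICAL_HOURLY_RATE)
print(f"Estimated cost for the run: {cost:.2f} (in the currency of the rate)")
```

Because the runtimes in this article are reported in seconds, the sketch converts them to hours before multiplying.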
Summary
The four TUFLOW HPC benchmark models were successfully tested on NC_A100_v4-series VMs on Azure.
The TUFLOW HPC models were tested with a single-precision version. This version requires less time and memory to calculate field data compared to a double-precision version. The memory requirement for a single-precision version is almost 50% less than that of a double-precision version.
We recommend the single-precision version of TUFLOW HPC. Compared to a double-precision version, it's faster and it enables you to run larger models with the available CPU/GPU memory.
TUFLOW HPC scales better with GPUs than with CPUs. For the 10 m model (Model 2), a single GPU completed the simulation about 53 times faster than 8 CPUs, whereas doubling the CPU count from 8 to 16 made the run only 1.72 times faster.
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal authors:
- Hari Bagudu | Senior Manager
- Gauhar Junnarkar | Principal Program Manager
- Pavankumar Navalli | HPC Performance Engineer
- Vinod Pamulapati | HPC Performance Engineer
Other contributors:
- Guy Bursell | Director Business Strategy
- Duncan Kitts | TUFLOW UK/Europe Software Lead
- Sachin Rastogi | Manager
- Jaap van der Velde | TUFLOW Associate Principal Software Architect & ICT Consultant
Next steps
- GPU-optimized VM sizes
- Windows Virtual Machines in Azure
- Linux VMs on Azure
- Virtual networks and VMs on Azure
- Learning path: Run high-performance computing (HPC) applications on Azure