This article briefly describes the steps for running Altair ultraFluidX on a virtual machine (VM) that's deployed on Azure. It also presents the performance results of running ultraFluidX on Azure.
Altair ultraFluidX is a simulation tool for predicting the aerodynamic properties of passenger and heavy-duty vehicles, and for the evaluation of building and environmental aerodynamics. Altair ultraFluidX:
- Is based on Lattice Boltzmann methods (LBM).
- Is optimized for GPUs and supports CUDA-aware MPI for multi-GPU usage.
- Provides an LBM-consistent Smagorinsky LES turbulence model, TBLE-based wall modeling, and a porous media model (pressure drop) for simulation of multiple heat exchangers.
- Handles rotating geometries via wall-velocity boundary conditions, a Moving Reference Frame (MRF) model, and truly rotating overset grids (OSM).
- Provides automated volume mesh generation with low surface mesh requirements, local grid refinement, and support for intersecting/baffle parts.
Altair ultraFluidX is used in the automotive, building, facilities, energy, and environmental industries.
Why deploy ultraFluidX on Azure?
- Modern and diverse compute options to align with your workload's needs
- The flexibility of virtualization without the need to buy and maintain physical hardware
- Rapid provisioning
- Complex problems solved within a few hours
Architecture
Components
- Azure Virtual Machines is used to create a Linux VM.
- For information about deploying the VM and installing the drivers, see Linux VMs on Azure.
- Azure Virtual Network is used to create a private network infrastructure in the cloud.
- Network security groups are used to restrict access to the VM.
- A public IP address connects the internet to the VM.
- A physical solid-state drive (SSD) is used for storage.
Compute sizing and drivers
Performance tests of ultraFluidX on Azure used ND A100 v4 series VMs running Linux. The following table provides the configuration details.
VM size | vCPU | Memory, in GiB | SSD, in GiB | GPUs | GPU memory, in GiB | Maximum data disks |
---|---|---|---|---|---|---|
Standard_ND96asr_v4 | 96 | 900 | 6,000 | 8 A100 | 40 | 32 |
The Standard_ND96asr_v4 VM runs NVIDIA Ampere A100 Tensor Core GPUs and is supported by 96 AMD processor cores.
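The per-GPU memory in the table determines how many GPUs a model needs. The following Python sketch illustrates that sizing logic. It assumes that a model's memory is spread evenly across GPUs and it ignores the difference between GB and GiB. The function name `min_gpus_for_model` is hypothetical, and the 100 figure is the lower bound that this article reports for the CX1 model.

```python
GPU_MEMORY_GIB = 40               # per-GPU memory of Standard_ND96asr_v4 (from the table)
TESTED_GPU_COUNTS = (1, 2, 4, 8)  # GPU counts used in this article's tests


def min_gpus_for_model(required_memory, per_gpu_memory=GPU_MEMORY_GIB,
                       gpu_counts=TESTED_GPU_COUNTS):
    """Return the smallest GPU count whose combined memory exceeds the requirement.

    Returns None if no listed GPU count provides enough memory.
    Assumption: memory is partitioned evenly across GPUs (a simplification).
    """
    for count in sorted(gpu_counts):
        if count * per_gpu_memory > required_memory:
            return count
    return None


# A model that needs more than 100 units of GPU memory does not fit on
# one GPU (40) or two GPUs (80), but fits on four GPUs (160).
print(min_gpus_for_model(100))  # 4
```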
Required drivers
To use ultraFluidX on Standard_ND96asr_v4 VMs as described in this article, you need to install NVIDIA and AMD drivers.
ultraFluidX installation
Before you install ultraFluidX, you need to deploy and connect to a Linux VM and install the required NVIDIA and AMD drivers.
Important
NVIDIA Fabric Manager installation is required for VMs that use NVLink or NVSwitch. Standard_ND96asr_v4 uses NVLink.
For information about deploying the VM and installing the drivers, see Run a Linux VM on Azure.
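After the drivers are installed, it can be useful to confirm that the VM reports all of its GPUs and the expected driver version. The following Python sketch is one possible check, not part of the ultraFluidX installation procedure. It calls `nvidia-smi` with its `--query-gpu` option and parses the CSV output. The helper names `parse_gpu_rows` and `list_gpus` are hypothetical.

```python
import shutil
import subprocess

# Query one CSV row per GPU: index, name, total memory, driver version.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text):
    """Parse nvidia-smi CSV output into a list of dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, name, memory, driver = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "name": name,
            "memory_total": memory,
            "driver_version": driver,
        })
    return rows


def list_gpus():
    """Return GPU details, or an empty list if nvidia-smi isn't installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"GPUs found: {len(gpus)}")
    for gpu in gpus:
        print(gpu)
```

On a Standard_ND96asr_v4 VM, a check like this would be expected to list eight GPUs. An empty list suggests that the NVIDIA driver isn't installed or isn't on the path.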
Altair ultraFluidX only runs on Linux. You can download ultraFluidX from Altair One Marketplace. You also need to install Altair License Manager and activate your license via Altair Units Licensing. For more information, see the Altair Units Licensing document on Altair One Marketplace.
ultraFluidX performance results
The Roadster and CX1 models were used as test cases. This image shows the Roadster model:
This image shows the CX1 model:
The tests measured the time required to complete each simulation on one, two, four, and eight GPUs. The tests ran on Linux, with an Azure Marketplace CentOS 8.1 HPC Gen2 image. The following table provides details about the operating system and NVIDIA drivers.
Operating system version | OS architecture | GPU driver version | CUDA version |
---|---|---|---|
CentOS Linux release 8.1.1911 (Core) | x86-64 | 470.57.02 | 11.4 |
GPU-based fluid dynamics simulations were run to test ultraFluidX. The simulations were shortened test cases, not full production-level runs. The tables in this section also include projected wall-clock times and computation times for a full production run of the CX1 model. Because the workload per time step is constant, the production computation time can be obtained from the computation time of the short run via linear extrapolation.
The total simulation consists of two phases: a mostly CPU-based pre-processing phase (independent of the physical simulation time) and the GPU-based computation phase. The purpose of the simulation is to test the performance of the GPU phase on the chosen VM: Standard_ND96asr_v4.
The following table shows the wall-clock times, in seconds.
Model | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
---|---|---|---|---|
Roadster | 1,571 | 1,097 | 731 | 539 |
CX1 (short run) | NA* | NA* | 6,679 | 4,743 |
CX1 (production run) | NA* | NA* | 39,115 | 23,518 |
This graph provides the same information for the Roadster model and the short run of the CX1 model:
The following table shows the pre-processing times, in seconds.
Model | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
---|---|---|---|---|
Roadster | 679 | 607 | 446 | 350 |
CX1 | NA* | NA* | 4,926 | 3,728 |
The following table shows the computation times, in seconds.
Model | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
---|---|---|---|---|
Roadster | 782 | 433 | 257 | 174 |
CX1 (short run) | NA* | NA* | 1,560 | 903 |
CX1 (production run) | NA* | NA* | 33,996 | 19,678 |
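The linear extrapolation behind the CX1 production-run figures can be checked with a few lines of Python. The sketch below is illustrative only. The scale factor isn't stated in this article, so the code infers it from the published short-run and production-run computation times. It also assumes that the projected wall-clock time is the short-run wall-clock time plus the additional computation time, which matches the published wall-clock values. All function names are hypothetical.

```python
# CX1 computation times and short-run wall-clock times, in seconds (from the tables).
CX1_COMPUTE_S = {
    4: {"short": 1560, "production": 33996},
    8: {"short": 903, "production": 19678},
}
CX1_SHORT_WALL_S = {4: 6679, 8: 4743}


def implied_scale_factor(short_compute_s, production_compute_s):
    """Ratio of production to short-run computation time (inferred, not published)."""
    return production_compute_s / short_compute_s


def extrapolate_compute(short_compute_s, scale_factor):
    """Linear extrapolation: constant workload per time step."""
    return short_compute_s * scale_factor


def project_wall_clock(short_wall_s, short_compute_s, production_compute_s):
    """Pre-processing and other overhead stay fixed; only computation time grows."""
    return short_wall_s + (production_compute_s - short_compute_s)


for gpus, times in CX1_COMPUTE_S.items():
    factor = implied_scale_factor(times["short"], times["production"])
    wall = project_wall_clock(CX1_SHORT_WALL_S[gpus], times["short"], times["production"])
    print(f"{gpus} GPUs: scale factor {factor:.2f}, projected wall-clock {wall} s")
# 4 GPUs: scale factor 21.79, projected wall-clock 39115 s
# 8 GPUs: scale factor 21.79, projected wall-clock 23518 s
```

The inferred factor is nearly identical for four and eight GPUs, which is consistent with a single linear scaling of the short run.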
Finally, the following table shows the relative speed increase as the number of GPUs increases. The speed increases are calculated from the computation time (the phase that uses the GPUs) to isolate GPU performance. Each value is relative to the smallest GPU count on which the model runs: one GPU for Roadster and four GPUs for CX1.
Model | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
---|---|---|---|---|
Roadster | 1.00 | 1.81 | 3.04 | 4.49 |
CX1 | NA* | NA* | 1.00 | 1.73 |
* NA indicates that the model requires more than 100 GB of GPU memory, so the simulation can't run with only one or two GPUs.
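The relative speed increases can be reproduced from the computation times. The following Python sketch divides the baseline computation time by the time at each GPU count. It also derives parallel efficiency (speed increase divided by the increase in GPU count), a metric that this article doesn't report and that's included here only as a supplementary illustration. The function names are hypothetical.

```python
# Computation times in seconds, keyed by GPU count (from the computation-time table).
COMPUTE_S = {
    "Roadster": {1: 782, 2: 433, 4: 257, 8: 174},
    "CX1": {4: 1560, 8: 903},
}


def speedups(times_by_gpus):
    """Speed increase relative to the smallest GPU count that ran the model."""
    base_time = times_by_gpus[min(times_by_gpus)]
    return {gpus: round(base_time / t, 2) for gpus, t in sorted(times_by_gpus.items())}


def parallel_efficiency(times_by_gpus):
    """Speed increase divided by the factor by which the GPU count grew."""
    base_gpus = min(times_by_gpus)
    base_time = times_by_gpus[base_gpus]
    return {
        gpus: round((base_time / t) / (gpus / base_gpus), 2)
        for gpus, t in sorted(times_by_gpus.items())
    }


for model, times in COMPUTE_S.items():
    print(model, speedups(times), parallel_efficiency(times))
```

For the Roadster model, this calculation returns speed increases of 1.0, 1.81, 3.04, and 4.49, which match the table.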
Here's that information in graphical form:
Azure cost
The following table presents wall-clock times that you can use, together with the Azure hourly rates for ND A100 v4-series VMs, to calculate Azure costs. For the current hourly costs, see Linux Virtual Machines Pricing.
Only wall-clock time is considered for these cost calculations. Application installation time isn't considered.
You can use the Azure pricing calculator to estimate the costs for your configuration.
Model | Number of GPUs* | Wall-clock time, in seconds |
---|---|---|
Roadster | 1 | 1,571 |
Roadster | 2 | 1,097 |
Roadster | 4 | 731 |
Roadster | 8 | 539 |
CX1 (short run) | 4 | 6,679 |
CX1 (short run) | 8 | 4,743 |
CX1 (production run) | 4 | 39,115 |
CX1 (production run) | 8 | 23,518 |
* The CX1 model requires more than 100 GB of GPU memory, so the simulation can't run with only one or two GPUs.
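As a rough illustration of the cost calculation, the following Python sketch converts a wall-clock time to hours and multiplies it by an hourly rate. The rate in the example is a placeholder, not an actual Azure price, so substitute the current rate for your region. The sketch also assumes that the whole VM is billed for the duration of the run, regardless of how many of its GPUs the simulation uses. The function names are hypothetical.

```python
# Wall-clock times in seconds, keyed by (model, number of GPUs), from the table above.
WALL_CLOCK_S = {
    ("Roadster", 1): 1571,
    ("Roadster", 2): 1097,
    ("Roadster", 4): 731,
    ("Roadster", 8): 539,
    ("CX1 (short run)", 4): 6679,
    ("CX1 (short run)", 8): 4743,
    ("CX1 (production run)", 4): 39115,
    ("CX1 (production run)", 8): 23518,
}


def billable_hours(wall_clock_s):
    """Convert wall-clock seconds to hours."""
    return wall_clock_s / 3600


def estimate_cost(wall_clock_s, hourly_rate):
    """Estimated cost = hours of wall-clock time x hourly VM rate."""
    return billable_hours(wall_clock_s) * hourly_rate


# Placeholder rate of 1.0 per hour, so the printed cost equals the hours used.
# Replace it with the current hourly rate for Standard_ND96asr_v4.
PLACEHOLDER_RATE_PER_HOUR = 1.0

for (model, gpus), seconds in WALL_CLOCK_S.items():
    cost = estimate_cost(seconds, PLACEHOLDER_RATE_PER_HOUR)
    print(f"{model}, {gpus} GPUs: {billable_hours(seconds):.2f} h, cost {cost:.2f}")
```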
Summary
- Altair ultraFluidX was successfully tested on ND A100 v4-series VMs on Azure.
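- Complex problems can be solved within a few hours on ND A100 v4 VMs. For example, the projected CX1 production run completes in about 6.5 hours of wall-clock time on eight GPUs.
- Increasing the number of GPUs improves performance. On eight GPUs, the computation phase of the Roadster model runs 4.49 times faster than on one GPU.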
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal authors:
- Hari Bagudu | Senior Manager
- Gauhar Junnarkar | Principal Program Manager
- Vinod Pamulapati | HPC Performance Engineer
Other contributors:
- Mick Alberts | Technical Writer
- Guy Bursell | Director Business Strategy
- Sachin Rastogi | Manager
Next steps
- GPU-optimized virtual machine sizes
- Linux virtual machines on Azure
- Virtual networks and virtual machines on Azure
- Learning path: Run high-performance computing (HPC) applications on Azure