Deploy Engys ELEMENTS on an Azure virtual machine

Caution

This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and plan accordingly. For more information, see the CentOS End Of Life guidance.

This article describes the steps for running Engys ELEMENTS on a virtual machine (VM) and on an HPC cluster on Azure. It also presents the performance results of running ELEMENTS on single-node and multi-node VM configurations.

ELEMENTS is a computational fluid dynamics (CFD) and optimization solution for vehicle design applications. The simulation engine that's provided with ELEMENTS is powered by HELYX. The resulting solution combines automotive engineering design practices with open-source CFD and optimization methods developed by Engys.

ELEMENTS is used to solve flow-related problems that are encountered in automotive design, including external vehicle aerodynamics, underhood thermal management (UHTM), HVAC and cabin comfort, aeroacoustics, powertrain, internal combustion engines (ICE), water management, and soiling. ELEMENTS is also used to analyze the aerodynamics of other vehicles, like high-speed trains, motorcycles, and competition bicycles.

Why deploy ELEMENTS on Azure?

  • Modern and diverse compute options to meet your workload's needs
  • The flexibility of virtualization without the need to buy and maintain physical hardware
  • Rapid provisioning
  • Complex problems solved within a few hours

Architecture

Multi-node configuration:

Diagram that shows an architecture for running ELEMENTS in a multi-node configuration.

Single-node configuration:

Diagram that shows an architecture for running ELEMENTS in a single-node configuration.

Download a Visio file of all diagrams in this article.

Components

  • Azure Virtual Machines
  • Azure Virtual Network
  • Azure CycleCloud

Compute sizing and drivers

Performance tests of ELEMENTS on Azure used HBv3 AMD EPYC 7V73X (Milan-X) VMs running CentOS Linux. The following table provides details about HBv3-series VMs.

VM size vCPU Memory (GiB) Memory bandwidth (GBps) Base CPU frequency (GHz) All-cores frequency (GHz, peak) Single-core frequency (GHz, peak) RDMA performance (Gbps) Maximum data disks
Standard_HB120rs_v3 120 448 350 1.9 3.0 3.5 200 32
Standard_HB120-96rs_v3 96 448 350 1.9 3.0 3.5 200 32
Standard_HB120-64rs_v3 64 448 350 1.9 3.0 3.5 200 32
Standard_HB120-32rs_v3 32 448 350 1.9 3.0 3.5 200 32
Standard_HB120-16rs_v3 16 448 350 1.9 3.0 3.5 200 32

Required drivers

To use AMD CPUs on HBv3 VMs, you need to install AMD drivers.

To use InfiniBand, you need to enable InfiniBand drivers.
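
Before launching a multi-node run, it can be useful to confirm that the InfiniBand interface is actually visible to the operating system. The following Python sketch is one hedged way to do that; it assumes the standard Linux sysfs location for InfiniBand devices (/sys/class/infiniband) and isn't an official Azure or Engys validation step.

```python
# Minimal sketch: check that an InfiniBand device is visible on an HBv3 node
# before launching a multi-node ELEMENTS run. The /sys/class/infiniband path is
# the standard Linux location for registered InfiniBand devices; this is a
# convenience check, not an official Azure or Engys validation step.
import os

IB_SYSFS_PATH = "/sys/class/infiniband"

def infiniband_devices():
    """Return the names of InfiniBand devices registered with the kernel."""
    if not os.path.isdir(IB_SYSFS_PATH):
        return []
    return sorted(os.listdir(IB_SYSFS_PATH))

if __name__ == "__main__":
    devices = infiniband_devices()
    if devices:
        print("InfiniBand devices found:", ", ".join(devices))
    else:
        print("No InfiniBand device visible. Verify that the InfiniBand drivers are enabled.")
```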

Install ELEMENTS 3.5.0 on a VM or HPC cluster

You need to buy ELEMENTS from Engys or one of its local authorized distributors or agents to get the installation files and technical support. For information about buying ELEMENTS, contact Engys.

Before you install ELEMENTS, you need to deploy and connect a VM or HPC cluster.

For information about deploying the VM and installing the drivers, see Run a Linux VM on Azure.

For information about deploying Azure CycleCloud and an HPC cluster, see the Azure CycleCloud documentation.

ELEMENTS 3.5.0 performance results

Two vehicle models were used to test the parallel scalability performance of ELEMENTS 3.5.0 on Azure:

  • The DrivAer sedan model (mid-size computational grid) for external vehicle aerodynamics
  • The Generic Truck Utility (GTU) model (large computational grid) for external vehicle aerodynamics

The hex-dominant meshing utility that's provided with ELEMENTS was used to create all computational grids. The grids were generated in parallel as part of the execution process.

The details of each test model are provided in the following sections.

Model 1: Automotive_DESdrivAer

Screenshot that shows the DrivAer model.

The following table provides details about the model.

Mesh size Solver Transient
17,000,000 cells DES (helyxAero) 400 time steps

Model 2: Automotive_GTU-0001

Screenshot that shows the GTU model.

The following table provides details about the model.

Mesh size Solver Transient
116,000,000 cells DES (helyxAero) 3,583 time steps

ELEMENTS 3.5.0 performance results on single-node VMs

This section provides the performance results of running ELEMENTS in parallel on single-node Azure HBv3 AMD EPYC 7V73X (Milan-X) VMs. You can use these results as a baseline for comparison with multi-node runs. Only the DrivAer model was tested in single-node configurations.

Model 1: Automotive_DESdrivAer

Number of cores Solver running time (seconds) Relative speed increase
16 3,102.28 1.00
32 1,938.16 1.60
64 1,395.36 2.22
96 1,337.25 2.32
120 1,318.55 2.35

The following graph shows the relative speed increases as the number of CPUs increases:

Graph that shows the relative speed increases on a single-node configuration.

Notes about single-node tests

For all single-node tests, the solver running time on a Standard_HB120-16rs_v3 VM (16 cores) is used as the reference for calculating the relative speed increase of similar VMs that have more cores. The results for the DrivAer model show that parallel performance in ELEMENTS improves as the core count increases from 16 to 64. Above 64 cores, only marginal further improvement occurs. This pattern is common with CFD solvers and other memory-intensive applications because the memory bandwidth available on each processor becomes saturated.
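
As a worked illustration of that calculation, the following Python sketch reproduces the relative speed increase column of the preceding table from the raw solver times. All values are taken from this article; the only logic involved is dividing the 16-core reference time by each measured time.

```python
# Reproduce the single-node DrivAer relative speed increases from the solver times.
# The 16-core run is the reference, so relative speed increase = t_16 / t_n.
solver_times = {16: 3102.28, 32: 1938.16, 64: 1395.36, 96: 1337.25, 120: 1318.55}

reference = solver_times[16]
for cores, seconds in solver_times.items():
    print(f"{cores:>3} cores: {seconds:8.2f} s -> relative speed increase {reference / seconds:.2f}")
```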

The AMD EPYC 7V73-series processor (Milan-X) in the HBv3 VMs tested here is a powerful processor, with 768 MB of total L3 cache. The single-node tests confirm that this memory is sufficient to guarantee parallel scalability of the ELEMENTS solvers when you use half the cores available on each 7V73-series chip.

ELEMENTS 3.5.0 performance results on multi-node clusters

The single-node tests confirm that the solver achieves good parallel performance up to 64 cores on HBv3 VMs. Based on those results, only 64-core Standard_HB120-64rs_v3 configurations were used to evaluate the performance of ELEMENTS on multi-node clusters. The following sections provide the test results.

Model 1: Automotive_DESdrivAer

Number of nodes Number of cores Cells per core Solver running time (seconds) Relative speed increase
1 64 265,625 1,370.81 1.00
2 128 132,813 630.86 2.17
4 256 66,406 351.83 3.90
8 512 33,203 206.36 6.64
16 1,024 16,602 168.07 8.16

The following graph shows the relative speed increases as the number of nodes increases:

Graph that shows the relative speed increases on a multi-node configuration.

Model 2: Automotive_GTU-0001

Number of nodes Number of cores Cells per core Solver running time (seconds) Relative speed increase
1 64 1,812,500 102,740.23 1.00
2 128 906,250 47,761.85 2.15
4 256 453,125 21,484.47 4.78
8 512 226,563 9,595.72 10.71
16 1,024 113,281 5,125.38 20.05

The following graph shows the relative speed increases as the number of nodes increases:

Graph that shows the relative speed increases for the GTU model on a multi-node configuration.

Notes about multi-node tests

The multi-node performance tests for the DrivAer model (mid-size mesh) show that parallel performance improves as the number of nodes increases but remains less than optimal. This suboptimal behavior can be explained by the relatively low number of cells per core in the 8-node and 16-node configurations (roughly 33,000 and 17,000 cells per core, respectively). Solver performance is known to degrade when the number of cells per core is low, because data communication across processor boundaries becomes excessive.

In contrast, the results for the GTU model (large mesh) show that solver scalability is above optimal, that is, super-linear. The number of cells per core in this case never drops below 100,000, even in the 16-node configuration. Super-linear behavior like this is consistent with cache effects: as the mesh is divided across more nodes, a larger share of each core's working set fits in the large L3 cache of the Milan-X processors. These results are encouraging because most real-life CFD external vehicle aerodynamic models have 100 million cells or more.

Azure cost

Only solver running time is considered for these cost calculations. Meshing times, installation time, and software costs aren't considered.

You can use the Azure pricing calculator to estimate the VM costs for your configurations.

The following tables provide the solver running times in hours. Azure VM hourly rates are subject to change. To compute the cost, multiply the solver time by the number of nodes and the Linux VM hourly cost.
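
As a hedged example, the following Python sketch applies that formula to the DrivAer solver times in the first table below. The hourly rate is a placeholder value, not a published Azure price; substitute the current pay-as-you-go Linux rate for Standard_HB120-64rs_v3 in your region from the Azure pricing calculator.

```python
# Estimated cost = solver running time (hours) x number of nodes x VM hourly rate.
HOURLY_RATE_USD = 3.60  # placeholder only; look up the current rate for your region

# Solver running times in hours for the DrivAer model (from the table that follows).
drivaer_runs = {1: 0.458, 2: 0.256, 4: 0.168, 8: 0.148, 16: 0.162}

for nodes, hours in drivaer_runs.items():
    cost = hours * nodes * HOURLY_RATE_USD
    print(f"{nodes:>2} node(s): {hours:.3f} h x {nodes} x ${HOURLY_RATE_USD:.2f}/h = ${cost:.2f}")
```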

Cost for model 1: Automotive_DESdrivAer

Number of nodes Solver running time (hours)
1 0.458
2 0.256
4 0.168
8 0.148
16 0.162

Cost for model 2: Automotive_GTU-0001

Number of nodes Solver running time (hours)
1 28.888
2 13.622
4 6.249
8 2.990
16 1.761

Summary

  • ELEMENTS 3.5.0 was successfully tested on HBv3 standalone VMs and on an Azure CycleCloud multi-node configuration.
  • All external vehicle aerodynamics models that were tested demonstrate good CPU acceleration when running in multi-node configurations.
  • The meshing, setup, and solver applications in ELEMENTS can be run in parallel, which makes it ideal for running in multi-node configurations. (There's no need for mesh decomposition and reconstruction.)
  • The simulation engine that's provided with ELEMENTS is open source, so you can run as many simulations in as many processors as you need, without incurring additional license costs. This capability is particularly useful when you're performing DES-type external aerodynamic calculations.
  • For better parallel performance when you run DES-type calculations by using ELEMENTS, we recommend that you use 64 cores per HBv3 node and a minimum of 50,000 cells per core. (The sketch after this list shows one way to apply this guideline.)
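
As a rough sizing aid, the following Python sketch applies that guideline to estimate the largest Standard_HB120-64rs_v3 node count that keeps a mesh at or above 50,000 cells per core. The 64-cores-per-node value and the 50,000-cell threshold come from this article; the rest is simple arithmetic, not an Engys-provided rule.

```python
# Estimate the largest HBv3 (Standard_HB120-64rs_v3) node count that keeps a
# DES-type ELEMENTS run at or above the recommended 50,000 cells per core.
CORES_PER_NODE = 64
MIN_CELLS_PER_CORE = 50_000

def max_recommended_nodes(mesh_cells):
    """Largest node count that still yields >= MIN_CELLS_PER_CORE cells per core."""
    return max(1, mesh_cells // (CORES_PER_NODE * MIN_CELLS_PER_CORE))

# Examples using the two meshes tested in this article.
for name, cells in [("DrivAer", 17_000_000), ("GTU", 116_000_000)]:
    nodes = max_recommended_nodes(cells)
    per_core = cells // (nodes * CORES_PER_NODE)
    print(f"{name}: {cells:,} cells -> up to {nodes} node(s), about {per_core:,} cells per core")
```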

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Other contributors:


Next steps