This article briefly describes the steps for running Samadii DEM on a virtual machine (VM) that's deployed on Azure. It also presents the performance results of running Samadii DEM on Azure.
Samadii DEM analyzes and interprets large-scale particles at high speed. It uses discrete element method (DEM), which is a Lagrangian method that determines the movement of particles by using the six-degrees-of-freedom equations of motion, taking into consideration all forces of individual particles. It uses explicit methods for time integration to calculate the position and velocity of the particles in the next time step.
DEM requires significant memory and computing power because of its small time step and the large number of particles that it takes into account. Samadii DEM, which is designed to perform analysis by using GPU and parallel processing techniques, supplies reliable results by analyzing a variety of large-scale grain boundary issues at a high speed.
Samadii DEM is used in the mechanical, electronic, chemical, semiconductor, manufacturing, automotive, energy, and construction/facilities industries.
Why deploy Samadii DEM on Azure?
- Modern and diverse compute options to meet your workload's needs
- The flexibility of virtualization without the need to buy and maintain physical hardware
- Rapid provisioning
- Good scale and cost efficiency on NCasT4_v3-series VMs
Architecture
Download a Visio file of this architecture.
Components
- Azure Virtual Machines is used to create a Windows VM. For information about deploying VMs and installing the drivers, see Windows VMs on Azure.
- Azure Virtual Network is
used to create a private network infrastructure in the cloud.
- Network security groups restrict access to the VM.
- A public IP address connects the internet to the VM.
- A physical solid-state drive (SSD) provides storage.
Compute sizing and drivers
Performance tests of Samadii DEM on Azure used NVv3, NCasT4_v3, NCv3, ND_A100_v4, and NC A100 v4 series VMs running Windows 10. The following table provides details about the VMs.
VM size | GPU name | vCPUs | Memory, in GiB | Maximum data disks | GPUs | GPU memory, in GiB | Maximum uncached disk throughput, in IOPS / MBps | Temporary storage (SSD), in GiB | Maximum NICs |
---|---|---|---|---|---|---|---|---|---|
Standard_NV12s_v3 | Tesla M60 | 12 | 112 | 12 | 1 | 8 | 20,000 / 200 | 320 | 4 |
Standard_NC4as_T4_v3 | Tesla T4 | 4 | 28 | 8 | 1 | 16 | - | 180 | 2 |
Standard_NC6s_v3 | V100 | 6 | 112 | 12 | 1 | 16 | 20,000 / 200 | 736 | 4 |
Standard_ND96asr_v4 | A100 | 96 | 900 | 32 | 8 | 40 | 80,000 / 800 | 6,000 | 8 |
Standard_NC24ads_A100_v4 | A100 | 24 | 220 | 32 | 1 | 80 | 30,000 / 1,000 | 1,123 | 2 |
Required drivers
To take advantage of the GPU capabilities of NVv3, NCas_T4_v3, NCv3, ND_A100_v4, and NC A100 v4 series VMs, you need to install NVIDIA GPU drivers.
Samadii DEM installation
Before you install Samadii DEM, you need to deploy and connect a VM and install the required NVIDIA drivers.
For information about deploying the VM and installing the drivers, see Run a Windows VM on Azure.
Important
NVIDIA Fabric Manager installation is required for VMs that use NVLink. ND_A100_v4 VMs use this technology.
Following are some prerequisites for running Samadii applications.
- Windows 10 (x64) OS
- One or more NVIDIA CUDA-enabled GPUs: Tesla, Quadro, or GeForce series
- Visual C++ 2010 SP1 Redistributable Package
- Microsoft MPI v7.1
- .NET Framework 4.5
The product installation process involves installing a license server, installing Samadii DEM, and configuring the license server. For more information about installing Samadii DEM, contact Metariver Technology.
Samadii DEM performance results
The following table shows the operating system versions and processors that were used for the tests.
VM series | ND_A100_v4 | NCv3 | NCasT4_v3 | NVv3 | NC A100 v4 |
---|---|---|---|---|---|
Operating system version | Windows 10 Professional, version 20H2 | Windows 10 Professional, version 20H2 | Windows 10 Professional, version 20H2 | Windows 10 Professional, version 20H2 | Windows 10 Professional, version 21H2 |
OS architecture | x86-64 | x86-64 | x86-64 | x86-64 | x86-64 |
Processor | AMD EPYC 7V12, 64-core processor, 2.44 GHz (2 processors) | Intel Xeon CPU E5-2690 v4 | AMD EPYC 7V12, 64-core processor, 2.44 GHz | Intel Xeon CPU E5-2690 v4 | AMD EPYC 7V13, 64-core processor, 2.44 GHz |
The following two models were used for testing.
Simple box
- Size: 13,560
- Cell type: Shell
- Solver: Samadii-dem-x64-v21 R2
Auger mixer
- Size: 42,446
- Cell type: Shell
- Solver: Samadii-dem-x64-v21 R2
Results for the simple box model
The following table shows the elapsed runtimes and relative speed increases for various VM configurations.
VM series | GPU | 1-GPU runtime, in seconds | Relative speed increase | 2-GPU runtime, in seconds | Relative speed increase |
---|---|---|---|---|---|
ND_A100_v4 | A100 | 1731 | 1.61 | 2521 | 1.34 |
NCv3 | V100 | 182 | 1.53 | 249 | 1.35 |
NCasT4_v3 | Tesla T4 | 176 | 1.58 | 2362 | 1.43 |
NVv3 | Tesla M60 | 278 | 1.00 | 337 | 1.00 |
NC A100 v4 | A100 | 290 | 0.96 | 355 | 0.95 |
1 In these cases, the number of GPUs was artificially limited. This VM has eight GPUs.
2 In these cases, the number of GPUs was artificially limited. This VM is available with one or four GPUs.
Here are the relative speed increases in graphical form:
Results for the auger mixer model
The following table shows the elapsed runtimes and relative speed increases for various VM configurations.
VM series | GPU | 1-GPU runtime, in seconds | Relative speed increase | 2-GPU runtime, in seconds | Relative speed increase |
---|---|---|---|---|---|
ND_A100_v4 | A100 | 1,7661 | 2.15 | 2,0541 | 1.78 |
NCv3 | V100 | 1,885 | 2.01 | 2,140 | 1.71 |
NCasT4_v3 | Tesla T4 | 2,601 | 1.46 | 2,5432 | 1.44 |
NVv3 | Tesla M60 | 3,794 | 1 | 3,659 | 1 |
NC A100 v4 | A100 | 1,701 | 2.23 | 1,939 | 1.89 |
1 In these cases, the number of GPUs was artificially limited. This VM has eight GPUs.
2 In these cases, the number of GPUs was artificially limited. This VM is available with one or four GPUs.
Here are the relative speed increases in graphical form:
Azure cost
The following tables present wall-clock times in hours. To compute the total cost, multiply these times by the Azure VM hourly costs for NVv3, NCasT4_v3, NCv3, ND_A100_v4, and NC A100 v4 series VMs. For the current hourly costs, see Windows Virtual Machines Pricing.
Only simulation runtime is considered for these cost calculations. Application installation time and license costs aren't included. In some cases, the number of GPUs was artificially limited for the sake of testing.
You can use the Azure pricing calculator to estimate the costs for your configuration.
Costs, simple box
VM size | Number of GPUs | GPUs utilized | Wall-clock time, in hours |
---|---|---|---|
Standard_ND96asr_v4 | 8 | 1 | 0.048 |
Standard_ND96asr_v4 | 8 | 2 | 0.07 |
Standard_NC6s_v3 | 1 | 1 | 0.05 |
Standard_NC12s_v3 | 2 | 2 | 0.07 |
Standard_NC4as_T4_v3 | 1 | 1 | 0.049 |
Standard_NC64as_T4_v3 | 4 | 2 | 0.066 |
Standard_NV12s_v3 | 1 | 1 | 0.077 |
Standard_NV24s_v3 | 2 | 2 | 0.094 |
Standard_NC24ads_A100_v4 | 2 | 1 | 0.081 |
Standard_NC24ads_A100_v4 | 2 | 2 | 0.099 |
Costs, auger mixer
VM size | Number of GPUs | GPUs utilized | Wall-clock time, in hours |
---|---|---|---|
Standard_ND96asr_v4 | 8 | 1 | 0.49 |
Standard_ND96asr_v4 | 8 | 2 | 0.57 |
Standard_NC6s_v3 | 1 | 1 | 0.53 |
Standard_NC12s_v3 | 2 | 2 | 0.59 |
Standard_NC4as_T4_v3 | 1 | 1 | 0.72 |
Standard_NC64as_T4_v3 | 4 | 2 | 0.71 |
Standard_NV12s_v3 | 1 | 1 | 1.05 |
Standard_NV24s_v3 | 2 | 2 | 1.02 |
Standard_NC24ads_A100_v4 | 2 | 1 | 0.47 |
Standard_NC24ads_A100_v4 | 2 | 2 | 0.54 |
Summary
- Samadii DEM was successfully tested on ND_A100_v4, NCv3, NCasT4_v3, NVv3, and NC A100 v4 series VMs.
- Performance tests demonstrate that Samadii DEM performs well on NCasT4_v3, NCv3, NC A100 v4, and ND_A100_v4 series VMs. The speed increases are as much as 1.5 times, 2 times, 2.25 times, and 2.15 times, respectively, higher than the times recorded with Tesla M60 GPU cards.
- We recommend NCasT4_v3 VMs with a one-GPU configuration because these VMs provide good scale-up and are also cost efficient. For the simple box model, they provide relative speed increases that are comparable to those recorded on the NC A100 v4 VM.
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal authors:
- Hari Bagudu | Senior Manager
- Gauhar Junnarkar | Principal Program Manager
- Vinod Pamulapati | HPC Performance Engineer
Other contributors:
- Mick Alberts | Technical Writer
- Guy Bursell | Director of Business Strategy
- Sachin Rastogi | Manager
To see non-public LinkedIn profiles, sign in to LinkedIn.
Next steps
- GPU-optimized virtual machine sizes
- Virtual machines on Azure
- Virtual networks and virtual machines on Azure
- Learning path: Run HPC applications on Azure