Deploy Devito on an Azure virtual machine

Caution

This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and plan accordingly. For more information, see the CentOS End Of Life guidance.

This article describes how to run Devito on an Azure virtual machine (VM). It also presents the performance results of running Devito on Azure.

Devito is a functional language that's implemented as a Python package. With Devito, you can use high-level symbolic problem definitions to create optimized stencil computations for applications such as finite differences, image processing, and machine learning. Devito is built on SymPy and uses automated code generation and just-in-time compilation to run optimized computational kernels on several compute platforms, including CPUs, GPUs, and clusters.

Architecture

The following diagram shows a single-node architecture:

Download a Visio file of this architecture.

The following diagram shows a multi-node architecture:

Download a Visio file of this architecture.

Components

Azure CycleCloud is an enterprise-friendly tool that's used to orchestrate and manage HPC environments on Azure.
Azure Virtual Machines is used to create a Linux VM. For information about how to deploy a VM and install drivers, see Linux VMs on Azure.
Azure Virtual Network is used to create a private network infrastructure in the cloud.
Network security groups are used to restrict access to the VM.
A public IP address connects the internet to the VM.
An Azure Blob Storage physical solid-state drive (SSD) is used for storage.
The Azure CycleCloud REST API is used to add automated and programmatic cluster management capabilities, like determining a cluster's status or creating nodes.

Scenario details

Devito provides key offerings like:

Mechanisms to adjust finite difference discretization.
Constructs to express various operators.
A flexible API.
The ability to generate highly optimized parallel code.
Distributed NumPy arrays.
Smooth integration with popular Python packages.

Deploy Devito on Azure to get benefits like:

Modern and diverse compute options to meet your workload's needs.
The flexibility of virtualization without the need to buy and maintain physical hardware.
Rapid provisioning.

Devito provides a functional language to implement sophisticated operators that can be made up of multiple stencil computations, boundary conditions, sparse operations (for example, interpolation), and more. With Devito, you might use explicit finite difference methods to approximate partial differential equations. For example, you can implement a 2D diffusion operator by using the following equation:

An Operator generates low-level code from an ordered collection of Eq objects. This example uses a single equation.
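To illustrate the kind of stencil computation that such an operator performs, the following plain-Python sketch applies explicit (forward Euler) finite-difference steps to the 2D diffusion equation df/dt = alpha * laplace(f). It doesn't use the Devito API and isn't the code that Devito generates. The function name diffusion_step, the diffusion coefficient, the grid size, and the step sizes are assumptions chosen only for illustration.

```python
# Plain-Python sketch of an explicit finite-difference scheme for 2D diffusion.
# This is NOT Devito code; all names and values here are illustrative assumptions.


def diffusion_step(f, alpha, dt, hx, hy):
    """Return a new grid after one forward-Euler step.

    Interior points are updated with a 5-point Laplacian stencil.
    Boundary points are copied unchanged (fixed-value boundary).
    """
    nx, ny = len(f), len(f[0])
    new = [row[:] for row in f]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            d2x = (f[i + 1][j] - 2.0 * f[i][j] + f[i - 1][j]) / hx ** 2
            d2y = (f[i][j + 1] - 2.0 * f[i][j] + f[i][j - 1]) / hy ** 2
            new[i][j] = f[i][j] + dt * alpha * (d2x + d2y)
    return new


# Hypothetical setup: a 10 x 10 grid with unit spacing and one initial hot spot.
nx = ny = 10
grid = [[0.0] * ny for _ in range(nx)]
grid[5][5] = 1.0

# dt is kept below the explicit-scheme stability limit h^2 / (4 * alpha) = 0.5.
alpha, dt, hx, hy = 0.5, 0.1, 1.0, 1.0
for _ in range(10):
    grid = diffusion_step(grid, alpha, dt, hx, hy)

print(f"value at the initial hot spot after 10 steps: {grid[5][5]:.4f}")
```

Each interior point depends only on its four nearest neighbors from the previous time step. That locality is what makes stencil loops like this one good candidates for the automated optimization and parallelization that the article describes.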

There's virtually no limit to the complexity of an operator. The Devito compiler automatically analyzes the input, detects and applies optimizations (including single-node and multi-node parallelism), and generates code with suitable loops and expressions.

Install Devito

Before you install Devito, you need to deploy and connect a Linux VM, and install the required AMD and InfiniBand drivers.

For information about deploying the VM and installing the drivers, see Run a Linux VM on Azure.

After you deploy the Linux VM, see the Devito installation instructions to learn about three methods for installing Devito on your VM:

Docker installation
Pip installation
Conda environment installation

Compute sizing and drivers

The Devito performance tests that are presented in the next sections used HBv3-series VMs running Linux. The following table provides details about these VMs:

VM size	Number of vCPUs (cores)	RAM (GiB)	Memory bandwidth (GBps)	Base CPU frequency (GHz)	All-cores frequency (GHz, peak)	Single-core frequency (GHz, peak)	RDMA performance (Gbps)	Maximum data disks
Standard_HB120rs_v3	120	448	350	1.9	3.0	3.5	200	32
Standard_HB120-96rs_v3	96	448	350	1.9	3.0	3.5	200	32
Standard_HB120-64rs_v3	64	448	350	1.9	3.0	3.5	200	32
Standard_HB120-32rs_v3	32	448	350	1.9	3.0	3.5	200	32
Standard_HB120-16rs_v3	16	448	350	1.9	3.0	3.5	200	32

Devito performance results

Benchmarking Devito on Azure

To test the performance of Devito on Azure, benchmarks were run on HB120rs_v3-series VMs. The tutorials page of the Devito website provides many seismic models, like acoustic, TTI, elastic, and viscoelastic. The tests in this article use a forward operator for the acoustic model to benchmark the performance of Devito.

The following table provides information about the operating system that was used for testing:

Operating system and hardware details (Azure infrastructure)
Operating system version	CentOS-based 8.1 HPC
OS architecture	x86-64
Processor	AMD EPYC 7V73X

Benchmarking a Devito operator

You can use the benchmark.py Python file to test the performance of a Devito operator. The file is located in the /benchmarks/user folder that's in the Devito folder. It implements a minimalist framework that evaluates the performance of a Devito operator while varying:

The problem size, for example, the shape of the computational grid.
The discretization, for example, the space order and time order of the input and output fields.
The simulation time (in milliseconds).
The performance optimization level.
The autotuning level.

Devito performance on the HB120rs_v3 series (single-node)

The Devito forward operator performance for the acoustic model was tested on Standard HBv3 series virtual machines with 16, 32, 64, 96, and 120 vCPU configurations.

The following table shows the results for the CentOS-based 8.1 HPC image:

Number of vCPUs (cores)	Forward operator runtime (in seconds)	GFLOPS	Relative speed increase
16	184.39	211.24	N/A
32	126.20	308.55	1.46
64	117.61	331.22	1.57
96	132.86	293.25	1.39
120	149.99	259.78	1.23

Graph that shows the relative speed increase for a HBv3-series VM.

For the single-node tests, the Devito operator runs on every HBv3-series VM configuration that's listed earlier in this article. The Standard_HB120-16rs_v3 runtime is the baseline. The relative speed increase of a configuration is the baseline runtime divided by the runtime of that configuration.
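The relative speed increase values in the single-node table can be reproduced from the runtimes. The following sketch divides the 16-vCPU baseline runtime by each configuration's runtime. The variable names are illustrative, and the runtimes are copied from the table.

```python
# Forward operator runtimes in seconds from the single-node table, keyed by vCPU count.
runtimes = {16: 184.39, 32: 126.20, 64: 117.61, 96: 132.86, 120: 149.99}

# The 16-vCPU run (Standard_HB120-16rs_v3) is the baseline.
baseline = runtimes[16]

# Relative speed increase = baseline runtime / runtime of the configuration.
speedups = {vcpus: round(baseline / t, 2) for vcpus, t in runtimes.items()}

# The configuration with the shortest runtime has the largest speed increase.
fastest = min(runtimes, key=runtimes.get)

for vcpus, s in speedups.items():
    print(f"{vcpus:>3} vCPUs: {s:.2f}x")
print(f"fastest configuration: {fastest} vCPUs")
```

The baseline row evaluates to 1.00, which the table reports as N/A.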

Devito performance on a cluster (multi-node)

The single-node results show the forward operator's scale-up behavior for the 64-vCPU and 96-vCPU configurations. The following performance tests run the Devito operator on two cluster configurations, one with 64 vCPUs per node and one with 96 vCPUs per node. Both clusters use the CentOS-based 8.1 HPC image.

The following table shows the results for a cluster with 64 vCPUs per node:

Number of nodes	Number of vCPUs (cores)	Forward operator runtime (in seconds)	GFLOPS	Relative speed increase
1	64	121.73	320.04	N/A
2	128	75.68	514.86	1.61
4	256	60.77	641.30	2.00
8	512	51.94	750.40	2.34

Graph that shows the relative speed increase for a 64-vCPU node.

The following table shows the results for a cluster with 96 vCPUs per node:

VM configuration	Number of nodes	Number of vCPUs (cores)	Forward operator runtime (in seconds)	GFLOPS	Relative speed increase
Standard_HB120-96rs_v3	1	96	137.19	284	N/A
Standard_HB120-96rs_v3	2	192	88.72	439.27	1.55
Standard_HB120-96rs_v3	4	384	75.11	518.93	1.83
Standard_HB120-96rs_v3	8	768	69.38	561	1.98

Graph that shows the relative speed increase for a 96-vCPU node.
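One way to read the two multi-node tables together is to compare the speed increase with the number of nodes. The following sketch recomputes the relative speed increase from the runtimes and also derives a parallel efficiency, defined here as the speed increase divided by the node count. The efficiency metric isn't part of the original tests. It's a derived value that's added only as an illustration, and the function name scaling is an assumption.

```python
# Forward operator runtimes in seconds from the multi-node tables, keyed by node count.
runtimes_64 = {1: 121.73, 2: 75.68, 4: 60.77, 8: 51.94}
runtimes_96 = {1: 137.19, 2: 88.72, 4: 75.11, 8: 69.38}


def scaling(runtimes):
    """Return {nodes: (speedup, efficiency)} relative to the one-node run.

    speedup    = one-node runtime / runtime on n nodes
    efficiency = speedup / n  (derived metric, not reported in the tables)
    """
    base = runtimes[1]
    return {n: (base / t, base / t / n) for n, t in runtimes.items()}


for label, data in (("64 vCPUs per node", runtimes_64),
                    ("96 vCPUs per node", runtimes_96)):
    for nodes, (speedup, efficiency) in scaling(data).items():
        print(f"{label}: {nodes} node(s), "
              f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Under this definition, an efficiency of 100% means that doubling the node count halves the runtime. Values below that indicate that adding nodes still shortens the runtime, but by less than the added capacity.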

Azure cost

The following table presents the wall-clock times for running the simulations. You can multiply these times by the Azure VM hourly costs for HB120rs_v3-series VMs to calculate costs. For the current hourly costs, see Linux virtual machines pricing.

The following runtimes represent only the simulation time. Application installation time isn't considered.

You can use the Azure pricing calculator to estimate the costs for your configuration.

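The following table shows total runtimes for the HB120rs_v3 series. Each value is the sum of the forward operator runtimes from the corresponding results table in this article, converted to hours:

Configuration	Total forward operator runtime (in hours)
Single node (16 to 120 vCPUs)	0.197
Multi-node, 64 vCPUs per node	0.086
Multi-node, 96 vCPUs per node	0.102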
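The cost calculation described in this section can be sketched as follows. The sketch sums the runtimes from the results tables, converts them to hours, and multiplies the VM-hours by an hourly rate. The rate of 1.00 is a placeholder and not an actual Azure price. Multiplying each multi-node runtime by its node count is an assumption that every node in a cluster is billed for the full run. The function names are illustrative.

```python
SECONDS_PER_HOUR = 3600.0

# Forward operator runtimes in seconds, copied from the results tables.
single_node = [184.39, 126.20, 117.61, 132.86, 149.99]
cluster_64 = {1: 121.73, 2: 75.68, 4: 60.77, 8: 51.94}   # node count -> runtime
cluster_96 = {1: 137.19, 2: 88.72, 4: 75.11, 8: 69.38}   # node count -> runtime

# Placeholder only: replace with the current price from the Azure pricing page.
HYPOTHETICAL_RATE_PER_VM_HOUR = 1.00


def total_hours(runtimes_seconds):
    """Total wall-clock time of a set of runs, in hours."""
    return sum(runtimes_seconds) / SECONDS_PER_HOUR


def vm_hours(runs_by_node_count):
    """Billable VM-hours, assuming every node is billed for the whole run."""
    return sum(n * t for n, t in runs_by_node_count.items()) / SECONDS_PER_HOUR


def estimated_cost(hours_of_vm_time, rate_per_vm_hour):
    """Cost estimate: VM-hours multiplied by the hourly rate of one VM."""
    return hours_of_vm_time * rate_per_vm_hour


print(f"single node, wall clock: {total_hours(single_node):.3f} h")
print(f"64-vCPU cluster, wall clock: {total_hours(cluster_64.values()):.3f} h")
print(f"96-vCPU cluster, wall clock: {total_hours(cluster_96.values()):.3f} h")

cost_64 = estimated_cost(vm_hours(cluster_64), HYPOTHETICAL_RATE_PER_VM_HOUR)
print(f"64-vCPU cluster, placeholder cost: {cost_64:.3f}")
```

For multi-node runs, the billable VM-hours are larger than the wall-clock hours because several VMs run at the same time.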

Summary

Devito was successfully deployed and tested on HB120rs_v3-series VMs on Azure.
For the single-node configuration, Devito scales up to 64 cores, where the relative speed increase peaks at 1.57 times the 16-core baseline. The speed increase drops to 1.39 with 96 cores and to 1.23 with 120 cores.
For the multi-node configuration, there's a gradual scale-up from one node to eight nodes in both clusters. At eight nodes, the relative speed increase is 2.34 with Standard_HB120-64rs_v3 VMs and 1.98 with Standard_HB120-96rs_v3 VMs.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Hari Bagudu | Senior Manager
Gauhar Junnarkar | Principal Program Manager
Vinod Pamulapati | HPC Performance Engineer

Other contributors:

Guy Bursell | Director Business Strategy
Sachin Rastogi | Manager
