This architecture demonstrates running computational fluid dynamics simulations using Azure. Learn to create, manage, and optimize clusters using Azure CycleCloud.
Download a Visio file of this architecture.
This diagram shows a high-level overview of a typical hybrid design providing job monitoring of the on-demand nodes in Azure:
- Connect to the Azure CycleCloud server to configure the cluster.
- Configure and create the cluster head node, using RDMA enabled machines for MPI.
- Add and configure the on-premises head node.
- If there are insufficient resources, Azure CycleCloud scales the Azure compute resources up (or down). You can define a predetermined limit to prevent overallocation.
- Tasks are allocated to the execute nodes.
- Data is cached in Azure from the on-premises NFS server.
- Data is read in from the Avere vFXT for Azure cache.
- Job and task information is relayed to the Azure CycleCloud server.
- Azure CycleCloud is a tool for creating, managing, operating, and optimizing HPC and Big Compute clusters in Azure.
- Avere vFXT on Azure is used to provide an enterprise-scale clustered file system built for the cloud.
- Azure Virtual Machines (VMs) are used to create a static set of compute instances.
- Virtual machine scale sets provide a group of identical VMs capable of being scaled up or down by Azure CycleCloud.
- Azure Storage accounts are used for synchronization and data retention.
- Azure Virtual Networks enable many types of Azure resources, such as VMs, to securely communicate with each other, the internet, and on-premises networks.
Customers can also use Azure CycleCloud to create a grid entirely in Azure. In this setup, the Azure CycleCloud server is run within your Azure subscription.
For a modern application approach where management of a workload scheduler isn't needed, Azure Batch can help. Azure Batch can run large-scale parallel and high-performance computing (HPC) applications efficiently in the cloud. Azure Batch allows you to define the Azure compute resources to execute your applications in parallel or at scale without manually configuring or managing infrastructure. Azure Batch schedules compute-intensive tasks and dynamically adds and removes compute resources based on your requirements.
Computational fluid dynamics (CFD) simulations require significant compute time along with specialized hardware. As cluster usage increases, simulation times and overall grid use grow, leading to issues with spare capacity and long queue times. Adding physical hardware can be expensive, and might not align to the usage peaks and valleys that a business goes through. By taking advantage of Azure, you can overcome many of these challenges with no capital expenditure.
Azure provides the hardware you need to run your CFD jobs on both GPU and CPU virtual machines. RDMA (Remote Direct Memory Access) enabled VM sizes have FDR InfiniBand-based networking, which allows for low latency MPI (Message Passing Interface) communication. When you combine these VMs with the Avere vFXT, which provides an enterprise-scale clustered file system, you can ensure maximum throughput for read operations in Azure.
To simplify the creation, management, and optimization of HPC clusters, you can use Azure CycleCloud to provision clusters and orchestrate data in both hybrid and cloud scenarios. CycleCloud monitors the pending jobs and automatically launches on-demand compute, connected to the workload scheduler of your choice, so you pay only for what you use.
Potential use cases
Other relevant industries for CFD applications include:
- Aeronautics and aerospace/aircraft
- Building HVAC (facilities)
- Oil and gas (energy)
- Life sciences and healthcare
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.
Scalability and security
Scaling the execute nodes on Azure CycleCloud can be accomplished either manually or using autoscaling. For more information, see CycleCloud Autoscaling.
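The autoscaling idea, including the predetermined limit that prevents overallocation, can be sketched in a few lines. This is a minimal illustration only, not CycleCloud's actual algorithm or API: the function name `target_node_count` and all of its parameters are hypothetical, and it assumes every pending job requests the same number of cores.

```python
import math


def target_node_count(pending_jobs: int, cores_per_job: int,
                      cores_per_node: int, max_nodes: int) -> int:
    """Return how many execute nodes the queue needs, capped at max_nodes.

    Hypothetical helper for illustration; CycleCloud's own autoscaler
    is configured through the product, not through this function.
    """
    if pending_jobs < 0 or cores_per_job < 0 or max_nodes < 0:
        raise ValueError("counts must be non-negative")
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")

    # Total cores requested by everything waiting in the queue.
    needed_cores = pending_jobs * cores_per_job
    # Round up: a partially used node still has to be allocated.
    needed_nodes = math.ceil(needed_cores / cores_per_node)
    # The predetermined limit prevents overallocation.
    return min(needed_nodes, max_nodes)
```

With an empty queue the function returns 0, which corresponds to scaling the execute nodes down; with a queue larger than the limit allows, it returns `max_nodes`.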
For general guidance on designing secure solutions, see the Azure security documentation.
Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.
The cost of running an HPC implementation using a CycleCloud server varies depending on several factors. For example, CycleCloud is charged by the amount of compute time that is used, with the Primary and CycleCloud servers typically being allocated and running constantly. The cost of running the Execute nodes depends on how long they are up and running and on the VM size that is used. The normal Azure charges for storage and networking also apply.
This scenario shows how CFD applications can be run in Azure, so the machines will require RDMA functionality, which is only available on specific VM sizes. The following are examples of costs that could be incurred for a scale set that is allocated continuously for eight hours per day for one month, with data egress of 1 TB. It also includes pricing for the Azure CycleCloud server and the Avere vFXT for Azure install:
- Region: North Europe
- Azure CycleCloud Server: 1 x Standard D3 (4 x CPUs, 14 GB Memory, Standard HDD 32 GB)
- Azure CycleCloud Primary Server: 1 x Standard D12 v (4 x CPUs, 28 GB Memory, Standard HDD 32 GB)
- Azure CycleCloud Node Array: 10 x Standard H16r (16 x CPUs, 112 GB Memory)
- Avere vFXT on Azure Cluster: 3 x D16s v3 (200 GB OS, Premium SSD 1-TB data disk)
- Data Egress: 1 TB
Review this price estimate for the hardware listed above.
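The compute portion of an estimate like this comes down to node-hours multiplied by an hourly rate. The sketch below shows that arithmetic for the node array. The function names are hypothetical, a month is assumed to be 30 days, and the hourly rate is a caller-supplied placeholder rather than a real Azure price; use the Azure pricing calculator for actual figures.

```python
def node_hours(nodes: int, hours_per_day: float, days: int) -> float:
    """Total node-hours for a scale set that runs a fixed schedule."""
    return nodes * hours_per_day * days


def monthly_compute_cost(nodes: int, hours_per_day: float, days: int,
                         hourly_rate: float) -> float:
    """Compute cost only; storage, networking, and egress are billed separately.

    hourly_rate is a placeholder supplied by the caller, not a real price.
    """
    return node_hours(nodes, hours_per_day, days) * hourly_rate


# The node array above: 10 nodes, eight hours per day, assumed 30-day month.
hours = node_hours(nodes=10, hours_per_day=8, days=30)
print(f"Node-hours per month: {hours}")
```

The always-on servers (the CycleCloud server, the primary server, and the Avere vFXT cluster) would use 24 hours per day in the same formula.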
Deploy this scenario
Follow these steps before deploying the Resource Manager template:
Create a service principal and retrieve its appId, displayName, name, password, and tenant values.
Generate an SSH key pair to sign in securely to the CycleCloud server.
Click the link below to deploy the solution.
Log in to the CycleCloud server to configure and create a new cluster.
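Before starting the deployment, it can help to confirm that every service principal value listed in the first step was actually captured. The following is a minimal sketch that assumes the values were saved as a JSON object; the helper name `missing_fields` and the sample values are hypothetical placeholders, not real credentials or output from any tool.

```python
import json
from typing import List

# The values the first step says to retrieve.
REQUIRED_FIELDS = ("appId", "displayName", "name", "password", "tenant")


def missing_fields(raw_json: str) -> List[str]:
    """Return the required service principal fields that are absent or empty."""
    data = json.loads(raw_json)
    return [field for field in REQUIRED_FIELDS if not data.get(field)]


# Placeholder values for illustration only.
sample = json.dumps({
    "appId": "00000000-0000-0000-0000-000000000000",
    "displayName": "example-sp",
    "name": "http://example-sp",
    "password": "placeholder",
    "tenant": "11111111-1111-1111-1111-111111111111",
})
print(missing_fields(sample))
```

An empty list means all five values are present and the deployment can proceed.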
The Avere Cache is an optional solution that can drastically increase read throughput for the application job data. Avere vFXT for Azure solves the problem of running these enterprise HPC applications in the cloud while leveraging data stored on-premises or in Azure Blob storage.
For organizations that are planning for a hybrid infrastructure with both on-premises storage and cloud computing, HPC applications can "burst" into Azure using data stored in NAS devices and spin up virtual CPUs as needed. The data set is never moved completely into the cloud. The requested bytes are temporarily cached using an Avere cluster during processing.
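The caching behavior described above, where only the requested data is fetched from on-premises storage and held temporarily, follows the general read-through cache pattern. The sketch below illustrates that pattern with least-recently-used eviction. It is not how Avere vFXT is implemented: the class `ReadThroughCache`, the `fetch` callback that stands in for the on-premises NAS, and the eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Serve reads from a bounded cache, fetching from the origin on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch          # stands in for the on-premises NAS read
        self._capacity = capacity    # maximum number of cached blocks
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(key)
            return self._blocks[key]
        # Cache miss: go to the origin, then keep the block temporarily.
        self.misses += 1
        data = self._fetch(key)
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block; the origin still has it.
            self._blocks.popitem(last=False)
        return data
```

Because evicted blocks remain on the origin, the full data set never has to live in the cache, which mirrors the point that the data set is never moved completely into the cloud.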
To set up and configure an Avere vFXT installation, follow the Avere Setup and Configuration guide.
This article is maintained by Microsoft. It was originally written by the following contributors.
- Mike Warrington | FastTrack for Azure Engineer
- What is Azure CycleCloud?
- Azure Virtual Machines (VMs)
- Introduction to Azure Storage
- What is Azure Virtual Network?
See the following virtual machine articles: