Autoscale Azure Batch pools when running MPI applications

By Paul Edwards, Senior Program Manager, AzureCAT

Our team works with a growing number of industry partners to enable Big Compute scenarios on Azure (think hyperscale meets high-performance computing). Azure provides true HPC in the cloud for customers who run simulations of large, complex engineering models. But sometimes we have to come up with creative workarounds when all the pieces won’t fit together.

That was the case recently with two companies that use Azure Batch to scale their Azure compute resources automatically. With Azure Batch, you can run applications in parallel and at scale. There's no need to manually create, configure, and manage an HPC cluster, individual virtual machines, virtual networks, or a complex job and task scheduling infrastructure. Batch automates this work so you can build cloud-native applications. Plus, the service itself is free: you pay only for the resources your Batch workflow uses.

However, both customers were running MPI jobs, and they hit a limitation in Azure Batch when they used the autoscale feature to add and remove nodes dynamically as task demand changed. Azure Batch exposes a number of service-defined variables that your autoscale formulas can use, but none of them gives the node count requested by multi-instance tasks. As a result, a formula cannot determine how many nodes the queued tasks of your MPI jobs actually need. One workaround is to create a dedicated pool for each job size, but that approach was not practical for our Big Compute customers given the scale of their jobs.
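To make the gap concrete, here is a small worked example. It assumes a formula that can size the pool only from a task-count variable such as `$PendingTasks`, with one task per node. The `QueuedTask` type and both helper functions are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueuedTask:
    # Nodes this task needs: 1 for an ordinary task, N for a
    # multi-instance (MPI) task that requests N instances.
    instances: int


def nodes_from_task_count(tasks):
    # What a formula sees when it can only count tasks
    # (assumes one task per node).
    return len(tasks)


def nodes_from_instances(tasks):
    # What the MPI tasks actually need: the sum of their instance counts.
    return sum(t.instances for t in tasks)


# Hypothetical queue: two 4-node MPI tasks and one 16-node MPI task.
queue = [QueuedTask(4), QueuedTask(4), QueuedTask(16)]
print(nodes_from_task_count(queue))  # 3
print(nodes_from_instances(queue))   # 24
```

A pool sized from the task count alone would get 3 nodes, so none of these MPI tasks could start: even the smallest one needs 4 nodes at the same time.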

To work around the issue, I provided a straightforward Python script that uses the Azure Batch API to implement autoscaling for pools running MPI tasks. This solution is hosted with Docker on Azure Container Instances, but other setups, such as Azure Functions, can also be used. For each pool, the script queries the active jobs, lists the tasks in each job, and checks their dependencies. It then calculates the number of nodes required by adding up the instance counts of every task that is ready to run, and finally it resizes the pool.
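The sizing step can be sketched as a pure function, which keeps it testable without a Batch account. This is not the published script: the `Task` and `Job` classes are hypothetical stand-ins for the objects the Batch API returns, and counting tasks that are already running (so a resize does not shrink the pool underneath them) is an assumption beyond the description above. With the `azure-batch` Python SDK you would likely fill them in from each task's `state`, `multi_instance_settings.number_of_instances`, and `depends_on.task_ids`, then pass the result to `pool.resize`.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    state: str                  # "active", "preparing", "running", or "completed"
    instances: int = 1          # instance count for multi-instance (MPI) tasks
    depends_on: list = field(default_factory=list)  # IDs of prerequisite tasks


@dataclass
class Job:
    id: str
    tasks: list


# Assumption: tasks already placed on nodes keep their nodes, so they count too.
BUSY_STATES = {"preparing", "running"}


def is_ready(task, completed_ids):
    """Queued task whose prerequisites have all completed.

    Simplification: any completed task satisfies a dependency, and
    dependency ID ranges are ignored.
    """
    return task.state == "active" and all(d in completed_ids for d in task.depends_on)


def required_nodes(jobs, max_nodes=None):
    """Sum the instance counts of ready and busy tasks, capped at max_nodes."""
    total = 0
    for job in jobs:
        # Batch task dependencies are scoped to a single job.
        completed = {t.id for t in job.tasks if t.state == "completed"}
        for task in job.tasks:
            if task.state in BUSY_STATES or is_ready(task, completed):
                total += task.instances
    if max_nodes is not None:
        total = min(total, max_nodes)
    return total


# Hypothetical workload for one pool.
jobs = [
    Job("cfd-run", [
        Task("prep", "completed"),
        Task("solve", "active", instances=16, depends_on=["prep"]),
        Task("post", "active", depends_on=["solve"]),  # blocked: solve not done
    ]),
    Job("crash-sim", [Task("mpi", "running", instances=8)]),
]
print(required_nodes(jobs))                # 24
print(required_nodes(jobs, max_nodes=20))  # 20
```

Counting a waiting multi-instance task's full instance count is what lets the pool grow far enough for the MPI job to start, which is exactly the number a task-count formula cannot see. Tasks blocked on dependencies are skipped so the pool does not grow for work that cannot run yet.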

The script's usage message lists its command-line options:

usage: [-h] [-p POOLS] [-m MAX_NODES] [-l LOOP]
                      [-n ACCOUNT_NAME] [-u ACCOUNT_URL] [-k ACCOUNT_KEY]

optional arguments:
  -h, --help            show this help message and exit
  -p POOLS, --pools POOLS
                        comma separated list of pools (all pools if empty)
  -m MAX_NODES, --max-nodes MAX_NODES
                        maximum number of nodes for a pool
  -l LOOP, --loop LOOP  if non-zero continuously repeating the auto scale
                        sleeping for this number of seconds
  -n ACCOUNT_NAME, --account-name ACCOUNT_NAME
                        the Batch account name
  -u ACCOUNT_URL, --account-url ACCOUNT_URL
                        the Batch account URL
  -k ACCOUNT_KEY, --account-key ACCOUNT_KEY
                        the Batch account key
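The options above map naturally onto Python's `argparse`. The following is a hypothetical reconstruction of the command-line handling, not the repository's code. The defaults (no node cap, a single pass when `--loop` is 0) and the `autoscale_once` callback, which stands in for the per-pool resizing work, are assumptions.

```python
import argparse
import time


def build_parser():
    # Mirrors the documented options; defaults are assumptions.
    parser = argparse.ArgumentParser(description="Autoscale Azure Batch pools running MPI tasks")
    parser.add_argument("-p", "--pools", default="",
                        help="comma separated list of pools (all pools if empty)")
    parser.add_argument("-m", "--max-nodes", type=int, default=None,
                        help="maximum number of nodes for a pool")
    parser.add_argument("-l", "--loop", type=int, default=0,
                        help="if non-zero, repeat the autoscale, sleeping this many seconds")
    parser.add_argument("-n", "--account-name", help="the Batch account name")
    parser.add_argument("-u", "--account-url", help="the Batch account URL")
    parser.add_argument("-k", "--account-key", help="the Batch account key")
    return parser


def parse_pools(value):
    # An empty string means "all pools", represented here as an empty list.
    return [p.strip() for p in value.split(",") if p.strip()]


def main(autoscale_once, argv=None):
    # autoscale_once(args, pools) is a hypothetical callback doing one pass.
    args = build_parser().parse_args(argv)
    pools = parse_pools(args.pools)
    while True:
        autoscale_once(args, pools)
        if args.loop <= 0:
            break  # single pass
        time.sleep(args.loop)
    return args, pools
```

For example, `main(callback, ["-p", "pool-a,pool-b", "-m", "32", "-l", "60"])` would run one pass every 60 seconds over the two named pools, capping each at 32 nodes.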

The Azure Batch product team is evaluating a more permanent solution, but in the meantime, feel free to use our workaround for your MPI applications. The GitHub repository includes the Python script and instructions for deploying it with Docker and running it on Azure Container Instances.