
Burst to Azure Batch with Microsoft HPC Pack

This topic contains information about extending your HPC Pack cluster to include Azure Batch pools as compute resources. Using these Azure Batch pools, you can increase ("burst") the capacity of your HPC cluster on-demand. See the Azure.com documentation for more about the Azure Batch service.

Prerequisites

  • HPC Pack cluster - You must create and configure at least the head node of a cluster. The burst to Batch feature is available starting in HPC Pack 2012 R2 Update 3.

  • Azure subscription - If you don't already have a subscription, sign up for a free trial, use MSDN subscriber benefits, or explore other purchase options.

Step 1: Create Azure Batch account

  • Azure Batch account - See Create and manage an Azure Batch account to create a Batch account in the Azure portal. You'll need the following account information (available in the portal) to burst to Batch from HPC Pack 2012 R2 Update 3:

    • Batch account name

    • Batch account URL

    • Batch account key

Starting with HPC Pack 2016 Update 1, the account information you need to burst to Batch depends on the Azure Batch pool allocation mode (Batch Service or User Subscription) and the client authentication method (Batch Access Key or Azure AD):

  1. Batch Service with Access Key
    • Batch account name
    • Batch account URL
    • Batch account key
  2. Batch Service or User Subscription with Azure AD
    • Batch AAD Instance
    • Batch AAD Tenant Id
    • Batch AAD ClientApp Id
    • Batch AAD ClientApp Key
    • Batch account name
    • Batch account URL

Check the following table to decide which Batch account type and authentication method to choose. You may also check this blog and this doc to understand more about User Subscription pool allocation mode and how to use Azure AD authentication for Azure Batch service.

| Account Type/Pool Allocation Mode | Authentication Methods | VM image types | Low Priority VM | VNet |
|---|---|---|---|---|
| Batch Service | Access Key / Azure AD | PaaS / IaaS (MarketPlaceImage) / IaaS (CustomImage via Azure AD) | Yes | Yes (via Azure AD) |
| User Subscription | Azure AD | IaaS (MarketPlaceImage) / IaaS (CustomImage) | No | Yes |
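
The account information lists and the table above can be condensed into a small lookup. The following is a minimal shell sketch; the function name `required_batch_fields` and the spellings of its arguments are illustrative assumptions, not part of HPC Pack or the Azure CLI.

```shell
#!/bin/sh
# Hypothetical helper: print the account information HPC Pack needs
# for a given pool allocation mode and authentication method.
# Usage: required_batch_fields <BatchService|UserSubscription> <AccessKey|AzureAD>
required_batch_fields() {
    case "$1:$2" in
        BatchService:AccessKey)
            echo "Batch account name, Batch account URL, Batch account key"
            ;;
        BatchService:AzureAD|UserSubscription:AzureAD)
            echo "Batch AAD Instance, Batch AAD Tenant Id, Batch AAD ClientApp Id, Batch AAD ClientApp Key, Batch account name, Batch account URL"
            ;;
        *)
            # Per the table, User Subscription accounts support Azure AD only.
            echo "unsupported combination: $1 with $2" >&2
            return 1
            ;;
    esac
}
```

For example, `required_batch_fields UserSubscription AccessKey` fails, which mirrors the table: a User Subscription account authenticates through Azure AD only.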

Use the Azure CLI to create a Batch account with Batch Service pool allocation mode and Batch Access Key authentication:

# Authenticate CLI session.
az login

# Select the subscription
az account set -s mysubscription

# Create a resource group.
az group create --name myresourcegroup --location mylocation

# Let's add a storage account reference to the Batch account for use as 'auto-storage'
# for applications. We'll start by creating the storage account.
az storage account create -g myresourcegroup -n mystorageaccount -l mylocation --sku Standard_LRS

# Create a Batch account.
az batch account create -g myresourcegroup -n mybatchaccount -l mylocation --storage-account mystorageaccount

# Now we can display the details of our created account.
az batch account show -g myresourcegroup -n mybatchaccount

# We can view the access keys to the Batch Account for future client authentication.
az batch account keys list -g myresourcegroup -n mybatchaccount
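
The three values HPC Pack asks for (account name, URL, and key) can also be read directly from the CLI output. The sketch below assumes that `az batch account show` reports the host name in its `accountEndpoint` property and that `az batch account keys list` reports the key in `primary`; the function names are hypothetical.

```shell
#!/bin/sh
# Build the Batch account URL from the endpoint host name that
# `az batch account show` reports in its accountEndpoint property.
batch_account_url() {
    printf 'https://%s\n' "$1"
}

# Hypothetical wrapper: print the three values HPC Pack asks for.
# Usage: show_batch_account_info <resource-group> <account-name>
show_batch_account_info() {
    endpoint=$(az batch account show -g "$1" -n "$2" --query accountEndpoint -o tsv) || return 1
    key=$(az batch account keys list -g "$1" -n "$2" --query primary -o tsv) || return 1
    echo "Batch account name: $2"
    echo "Batch account URL:  $(batch_account_url "$endpoint")"
    echo "Batch account key:  $key"
}
```

With the names used above, the call would be `show_batch_account_info myresourcegroup mybatchaccount`.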

Use the Azure CLI to create a Batch account with User Subscription pool allocation mode:

# Authenticate CLI session.
az login

# Select the subscription
az account set -s mysubscription

# Allow Azure Batch to access the subscription (one-time operation).
az role assignment create --assignee MicrosoftAzureBatch --role contributor

# Create a resource group.
az group create --name myresourcegroup --location mylocation

# A Batch account that will allocate pools in the user's subscription must be configured
# with a Key Vault located in the same region. Let's create this first.
az keyvault create --resource-group myresourcegroup --name mykeyvault --location mylocation --enabled-for-deployment true --enabled-for-disk-encryption true --enabled-for-template-deployment true

# We will add an access-policy to the Key Vault to allow access by the Batch Service.
az keyvault set-policy --resource-group myresourcegroup --name mykeyvault --spn ddbf3205-c6bd-46ae-8127-60eb93363864 --key-permissions all --secret-permissions all

# Now we can create the Batch account, referencing the Key Vault either by name (if they exist in the same resource group) or by its full resource ID.
az batch account create --resource-group myresourcegroup --name mybatchaccount --location mylocation --keyvault mykeyvault

To configure Azure AD for Batch authentication and obtain the Batch AAD info:

  1. Obtain the Batch AAD Instance. If you use the global Azure cloud, the AAD instance is https://login.microsoftonline.com/.
  2. Obtain the Batch AAD Tenant Id. In the Azure portal, click More Services, search for and choose Azure Active Directory, select your Active Directory by selecting your account in the top right corner of the page, and click Properties. Copy the GUID value provided for the Directory ID. This value is also called the tenant ID.
  3. Register the Batch client application and obtain the Batch AAD ClientApp Id.
    • In the Azure portal, choose your Azure AD tenant by selecting your account in the top right corner of the page.
    • Choose More services, then search for and choose App Registrations.
    • Click +New application registration.
    • Fill in the Name, choose Web app / API as the Application type, and fill in a value specific to your application (for example, https://myAppName) for the Sign-on URI. Click Create.
    • After the application is created, select it in the list of App Registrations, and click Properties. Copy the GUID value provided for the Application ID; it is used as the Batch AAD ClientApp Id.
  4. Configure a service principal for authentication and obtain the Batch AAD ClientApp Key.
    • Request a secret key for the application. Select the application in the list of App Registrations, click Keys, type a Key description, choose a Duration, and then click Save. Copy the displayed value; it is used as the Batch AAD ClientApp Key.
    • Assign an RBAC role to the application so that it can authenticate with a service principal. Choose More services, search for and choose Batch accounts, click the Batch account you created, and select Access Control (IAM). Click +ADD, select the Contributor role and the registered application, and then click Save.
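
The portal steps above can be approximated from the Azure CLI. This is a sketch of an alternative route, not the procedure the steps describe: it assumes that `az ad sp create-for-rbac` (which creates the application registration, service principal, secret, and role assignment in one call) is acceptable in place of the manual registration. The function names are hypothetical.

```shell
#!/bin/sh
# Return success if the argument looks like a GUID, the format used for
# the Batch AAD Tenant Id and the Batch AAD ClientApp Id.
is_guid() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Hypothetical wrapper around real Azure CLI commands.
# Usage: create_batch_client_app <app-name> <resource-group> <batch-account>
create_batch_client_app() {
    # Tenant Id of the signed-in subscription (Batch AAD Tenant Id).
    tenant_id=$(az account show --query tenantId -o tsv) || return 1
    is_guid "$tenant_id" || { echo "unexpected tenant id: $tenant_id" >&2; return 1; }

    # Resource ID of the Batch account, used as the role assignment scope.
    scope=$(az batch account show -g "$2" -n "$3" --query id -o tsv) || return 1

    echo "Batch AAD Tenant Id: $tenant_id"
    # Prints JSON containing appId (ClientApp Id) and password (ClientApp Key).
    az ad sp create-for-rbac --name "$1" --role Contributor --scopes "$scope"
}
```

The `password` value in the output is shown only once, so it should be recorded when the command runs.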

Note

Azure AD authentication is supported by both the Batch Service and User Subscription modes of Batch accounts.

Important: After you configure Azure AD for Batch authentication and obtain the following Batch AAD info, open HPC Cluster Manager. Under Configuration --> Deployment To-do List, click Set Azure Batch Configuration, fill in the info in the form, and then click OK. This step is required when using Azure AD for Batch authentication in HPC Pack.

  • Batch AAD Instance
  • Batch AAD Tenant Id
  • Batch AAD ClientApp Id
  • Batch AAD ClientApp Key

Alternatively, you can use an HPC PowerShell cmdlet to set the Batch AAD info:

# HPC PowerShell
Set-HpcClusterRegistry -BatchAADInstance '<AAD Instance e.g. https://login.microsoftonline.com/>' -BatchAADTenantId <TenantGUID> -BatchAADClientAppId <AppGUID> -BatchAADClientAppKey '<AppKey>'

Step 2: Create an Azure Batch pool template

To create an Azure Batch pool template, use the Create Node Template Wizard in HPC Cluster Manager.

To create a Batch pool template

  1. Start HPC Cluster Manager.

  2. In Configuration, in the Navigation Pane, click Node Templates.

  3. In the Actions pane, click New.
    The Create Node Template Wizard appears.

  4. On the Choose Node Template Type page, click Azure Batch pool template, and then click Next.

  5. On the Specify Template Name page, type a name for the node template, and optionally type a description for it. Click Next.

  6. On the Provide the Azure Batch account information page, fill in the Batch account name. If the Batch account type is Batch Service, choose BatchService as the Batch account type; if you use a Batch Access Key for authentication, enter the key as the Account Key, and if you use Azure AD authentication, leave the Account Key blank. If the Batch account type is User Subscription, choose UserSubscription as the Batch account type. Fill in the Batch account URL and the Azure Storage Connection String obtained previously, and then click Next.

Note

When AAD authentication is required, make sure the Batch AAD Instance, Batch AAD Tenant Id, Batch AAD ClientApp Id, and Batch AAD ClientApp Key are already set in the Deployment To-do List. Otherwise, account validation fails with the error "Invalid Azure Batch account. Please check Azure Batch account settings." when you click Next.

  7. On the Azure Batch Autoscale configuration page, leave Enable Auto Scale unchecked, and click Next.

  8. On the Configure Remote Desktop Credentials and SSH page, optionally provide the credentials of a user that will be created on the Azure Batch pool compute nodes during deployment. You can use the credentials later to connect to the pool compute nodes. For Linux nodes, you may also specify an SSH Public Key and an SSH Private Key File (.ppk) to connect to the node over SSH with putty.exe. Refer here for how to generate a public key and a private key file for PuTTY. Note that if you specify both a password and SSH keys, the SSH keys are used for the connection. You also need to copy the generated private key file (.ppk) to the %CCP_HOME%Bin folder to open SSH connections to the nodes from HPC Cluster Manager. Click Next.

  9. On the Specify Startup Script page, with HPC Pack 2012 R2 Update 3, optionally specify an Azure Batch start task:

    • Command Line - the command executed when Azure Batch compute nodes start
    • Blob Source URL - the Azure storage location of files that you previously uploaded and that are automatically downloaded to the Azure Batch compute nodes
    • Local File Path - the location on the Azure Batch compute nodes to which the files are downloaded

    In HPC Pack 2016 Update 1 or later, optionally specify a command line or the name of a startup script to run on all Azure Batch compute nodes in the Batch pool. Currently the startup script is supported only for Linux nodes. For example, to run a script named startup.sh on all Linux nodes in a Batch pool when they start, use the command line tool HpcPack.exe to zip and upload the script to the Azure storage account as follows, and then specify startup.sh in the Command Line:

    HpcPack.exe create startup.sh.zip startup.sh
    HpcPack.exe upload startup.sh.zip /account:<StorageAccountName> /key:<StorageAccountKey>

  10. Click Next and review all the template settings specified. Click Create to generate the Azure Batch pool node template.
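
For the startup.sh script mentioned in the Specify Startup Script step, the following is a purely illustrative placeholder. The script body, the function name, and the `STARTUP_LOG_DIR` variable are assumptions and are not defined by Azure Batch or HPC Pack; it only records when and where it ran so that the start task can be verified.

```shell
#!/bin/sh
# startup.sh: hypothetical start task for Linux nodes in a Batch pool.
# It only records when and where it ran; replace the body with real setup.

# Append a one-line record to startup.log in the given directory.
write_startup_record() {
    mkdir -p "$1" || return 1
    printf '%s started on %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -n)" >> "$1/startup.log"
}

# STARTUP_LOG_DIR is an assumed variable, not one defined by Batch or HPC Pack.
write_startup_record "${STARTUP_LOG_DIR:-${TMPDIR:-/tmp}/hpc-startup}"
```

A real script would replace the logging call with the node setup it needs, such as installing packages or mounting shares.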

Step 3: Add an Azure Batch pool

Use the Add Node Wizard in HPC Cluster Manager to add the Batch pool compute nodes.

To add an Azure Batch pool

  1. In HPC Cluster Manager, in Resource Management, in the Actions pane, click Add Node. The Add Node Wizard appears.

  2. On the Select Deployment Method page, click Add Azure Batch pool, and then click Next.

  3. On the Specify Azure Batch Pool Information page, select an Azure Batch pool template. Depending on the Batch account type in the selected template, specify the pool information as follows:
    If the Batch account type is Batch Service:

    • Number of Compute Nodes - the number of compute nodes (virtual machine instances) in the new Azure Batch pool
    • Whether the virtual machines are Dedicated VM or Low Priority VM
    • Choose the image type, either PaaS or IaaSMarketPlace. For PaaS, choose the OS Family (for example, Windows Server 2016). For IaaSMarketPlace, choose the Publisher, Offer, and Sku (for example, Canonical, UbuntuServer, 16.04-LTS).
    • Size of Compute Nodes - the role size of each compute node.
    • Max Tasks Per Compute Node - the maximum number of concurrent tasks to run on each compute node. The default number is equal to the number of cores in the selected role size. The maximum number is three times larger than the actual number of cores. Note that Max Tasks Per Compute Node multiplied by Number of Compute Nodes equals the total number of cores of the Batch pool node.
    • App Packages – optionally specify the application packages that were already added to the Batch account, in the format <Id>:<Version>,<Id>:<Version>, …

    If the Batch account type is User Subscription:

    • Number of Compute Nodes – (like above)
    • Choose the image type, either IaaSMarketPlace or IaaSCustomImage. For IaaSMarketPlace, choose the Publisher, Offer, and Sku (for example, Canonical, UbuntuServer, 16.04-LTS). For IaaSCustomImage, besides the Publisher, Offer, and Sku, the Custom Image Resource Id is also required, in the format /subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.Compute/images/{imageName}
    • Size of Compute Nodes – (like above)
    • Max Tasks Per Compute Node – (like above)
    • App Packages – (like above)
    • VNet – optionally specify the Subnet Id for the Batch pool. Refer to this doc for how to create a custom VNet and Subnet and obtain the Subnet Id. Also check the Batch requirements for a custom VNet specified in this doc.
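
Several of the pool settings above follow fixed formats or simple arithmetic. The helpers below are a minimal sketch of those rules as stated in this list; the function names and the sample values in the usage note are hypothetical.

```shell
#!/bin/sh
# Total cores of the Batch pool node:
# Number of Compute Nodes x Max Tasks Per Compute Node.
pool_total_cores() {
    echo $(( $1 * $2 ))
}

# Custom Image Resource Id in the format given above.
# Usage: custom_image_id <subscriptionId> <resourceGroup> <imageName>
custom_image_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/images/%s\n' "$1" "$2" "$3"
}

# Join <Id>:<Version> pairs with commas for the App Packages field.
# Usage: app_packages <Id>:<Version> [<Id>:<Version> ...]
app_packages() {
    result=""
    for pkg in "$@"; do
        result="${result:+$result,}$pkg"
    done
    printf '%s\n' "$result"
}
```

For example, a pool of 4 compute nodes with Max Tasks Per Compute Node set to 8 gives `pool_total_cores 4 8`, which prints 32.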

    Note

    Refer to this doc for how to capture an image from a Linux VM and obtain the Image Resource Id. For a custom VM image or a custom VNet, you must explicitly assign the Contributor role on the resource to the Batch client application through the resource's Access control (IAM). Otherwise, a 'BadRequest' failure can occur when starting the Batch pool.
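
The role assignment described in this note can also be made from the Azure CLI instead of the portal. The sketch below assumes the Batch AAD ClientApp Id is used as the assignee; the function names are hypothetical, while `az role assignment create` and the VNet resource ID format are standard.

```shell
#!/bin/sh
# Resource ID of a virtual network, usable as a role assignment scope.
# Usage: vnet_id <subscriptionId> <resourceGroup> <vnetName>
vnet_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s\n' "$1" "$2" "$3"
}

# Hypothetical wrapper: grant the Batch client application the Contributor
# role on one resource (a custom image or a VNet).
# Usage: grant_contributor <ClientApp Id> <resource ID>
grant_contributor() {
    az role assignment create --assignee "$1" --role Contributor --scope "$2"
}
```

A call such as `grant_contributor <AppGUID> "$(vnet_id <subscriptionId> myresourcegroup myvnet)"` would cover the VNet case; for a custom image, pass the Custom Image Resource Id as the scope instead.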

Step 4: Start the pool

You have to start the pool before running jobs on it.

To start an Azure Batch pool

  1. In Resource Management, in the Navigation Pane, click Nodes or Azure Batch Pools.

  2. In the List or Heat Map view, select one or more Azure Batch pools.

  3. In the Actions pane, click Start.

    The Start Azure Batch Pools dialog box appears. Click Start.

  4. The state of the nodes changes from Not-Deployed to Provisioning.

    If you want to track the provisioning progress, select the pool, and then in the Details Pane, click the Provisioning Log tab. The Azure Batch pool should be created in less than 1 minute and the state changes to Offline.

Additional considerations

  • Monitor the status of Azure Batch compute nodes - After the Azure Batch pool is ready, the Azure Batch compute nodes are still being created and started. To monitor the node status, select the pool, and then in the Details Pane, click Azure Batch Compute Nodes.

  • Remote Desktop or SSH to compute nodes - After the compute nodes in Azure Batch pool are started (node state is Idle), you can connect by Remote Desktop or SSH to each compute node if you configured template settings to do so, for example, to perform some manual configuration or troubleshooting. To do this, select one or more Azure Batch pools, and then in the Actions pane, click Remote Desktop/SSH. When connecting to Linux nodes via SSH, it is required to copy the generated private key file (.ppk) to the %CCP_HOME%Bin folder on the client machine.

  • View startup tasks - If you specified a startup task in the Azure Batch pool template, after the Azure Batch pool is started, you can view the detailed output of the startup task by running the following HPC PowerShell cmdlet:

    Get-HpcBatchPoolStartTask -Name <PoolName>   
    
  • Heat map view - While the Azure Batch pool is running, you can view the heat map of the pool. In Resource Management, in the Navigation Pane, click Nodes, and then choose the Heat Map view. You can also check the per-VM heat map for Linux nodes in the pool: in Resource Management, in the Navigation Pane, click Azure Batch Pools, and then choose the Heat Map view. Currently the following performance counters are collected:

    • CPU Usage
    • Disk Throughput
    • Free Disk Space
    • Network Usage
    • Available Physical Memory

Step 5: Run a job on the pool

Currently, HPC Pack supports running regular batch, parametric, and MPI jobs as well as clusrun commands on Azure Batch pools. Note the following recommendations for running jobs and clusrun commands on Batch:

  • Run clusrun jobs on an Azure Batch pool when no other jobs are running on the pool. If other jobs or tasks are running, the clusrun job may need to wait for the running tasks to finish.

  • By default, jobs running on an Azure Batch pool don't return the task output to HPC Pack because of the potential performance impact. If you want the output, you can define node release tasks in the job to retrieve it. To retrieve the task output, change the following cluster property through HPC PowerShell:

    Set-HpcClusterProperty -GetAzureBatchTaskOutput $true  
    

Step 6: Stop the pool

When you are not using the Batch pool, stop the Azure resources. This deprovisions the pool compute nodes, reducing the costs of using a Batch pool.

To stop the pool

  1. In Resource Management, in the Navigation Pane, click Nodes or Azure Batch Pools.

  2. In the List or Heat Map view, select one or more Azure Batch pools that you want to stop.

  3. In the Actions pane, click Stop.

    The Stop Azure Batch Pools dialog box appears. Click Stop.

  4. If you want to track the stopping progress, select a node, and then in the Details Pane, click the Provisioning Log tab.

See Also

Microsoft HPC Pack: Node Deployment