Graphics processing unit (GPU) virtual machine (VM) on Azure Stack Hub
This article describes which graphics processing unit (GPU) models are supported on an Azure Stack Hub integrated system. The article also contains instructions on installing the drivers used with the GPUs. GPU support in Azure Stack Hub enables solutions such as artificial intelligence, training, inference, and data visualization. The AMD Radeon Instinct MI25 can be used to support graphic-intensive applications such as Autodesk AutoCAD.
You can choose from three GPU models. They are available in NVIDIA V100, NVIDIA T4, and AMD MI25 GPUs. These physical GPUs align with the following Azure N-Series virtual machine (VM) types as follows:
Warning
GPU VMs are not supported in this release. You must upgrade to Azure Stack Hub 2005 or later. In addition, your Azure Stack Hub hardware must have physical GPUs.
NCv3
NCv3-series VMs are powered by NVIDIA Tesla V100 GPUs. Customers can take advantage of these updated GPUs for traditional HPC workloads such as reservoir modeling, DNA sequencing, protein analysis, Monte Carlo simulations, and others.
Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | GPU | GPU memory: GiB | Max data disks | Max NICs |
---|---|---|---|---|---|---|---|
Standard_NC6s_v3 | 6 | 112 | 736 | 1 | 16 | 12 | 4 |
Standard_NC12s_v3 | 12 | 224 | 1474 | 2 | 32 | 24 | 8 |
Standard_NC24s_v3 | 24 | 448 | 2948 | 4 | 64 | 32 | 8 |
NVv4
The NVv4-series virtual machines are powered by AMD Radeon Instinct MI25 GPUs. With the NVv4-series, Azure Stack Hub introduces virtual machines with partial GPUs. This size can be used for GPU accelerated graphics applications and virtual desktops. NVv4 virtual machines currently support only the Windows guest operating system.
Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | GPU | GPU memory: GiB | Max data disks | Max NICs |
---|---|---|---|---|---|---|---|
Standard_NV4as_v4 | 4 | 14 | 88 | 1/8 | 2 | 4 | 2 |
Standard_NV8as_v4 | 8 | 28 | 176 | 1/4 | 4 | 8 | 4 |
Standard_NV16as_v4 | 16 | 56 | 352 | 1/2 | 8 | 16 | 8 |
Standard_NV32as_v4 | 32 | 112 | 704 | 1 | 16 | 32 | 8 |
NCasT4_v3
Size | vCPU | Memory: GiB | GPU | GPU memory: GiB | Max data disks | Max NICs |
---|---|---|---|---|---|---|
Standard_NC4as_T4_v3 | 4 | 28 | 1 | 16 | 8 | 4 |
Standard_NC8as_T4_v3 | 8 | 56 | 1 | 16 | 16 | 8 |
Standard_NC16as_T4_v3 | 16 | 110 | 1 | 16 | 32 | 8 |
Standard_NC64as_T4_v3 | 64 | 440 | 4 | 64 | 32 | 8 |
NC_A100 v4
The NC_A100 series VMs are powered by NVIDIA Ampere A100 GPUs, the successor of the Tesla V100 GPUs. You can take advantage of these updated GPUs for traditional HPC workloads such as reservoir modeling, DNA sequencing, protein analysis, Monte Carlo simulations, and others.
Size | vCPU | Memory: GiB | Temp storage (GiB) | Max data disks | GPU | GPU memory GiB | Max NICs |
---|---|---|---|---|---|---|---|
Standard_NC24ads_A100_v4 | 24 | 220 | 1123 | 12 | 1 | 80 | 2 |
Standard_NC48ads_A100_v4 | 48 | 440 | 2246 | 24 | 2 | 160 | 4 |
NC_L40S v4
Size | vCPU | Memory: GiB | Temp storage (GiB) | Max data disks | GPU | GPU memory GiB | Max NICs |
---|---|---|---|---|---|---|---|
Standard_NC24ads_L40S_v4 | 24 | 220 | 1123 | 8 | 1 | 80 | 2 |
Standard_NC48ads_L40S_v4 | 48 | 440 | 2246 | 16 | 2 | 160 | 4 |
GPU system considerations
- GPU must be one of these SKUs: AMD MI-25, Nvidia V100 (and variants), Nvidia T4.
- Number of GPUs per server supported (1, 2, 3, 4). Preferred are: 1, 2, and 4.
- All GPUs must be of the exact same SKU throughout the scale unit.
- All GPU quantities per server must be the same throughout the scale unit.
- GPU partition size (for AMD Mi25) needs to be the same throughout all GPU VMs on the scale unit.
Capacity planning
The Azure Stack Hub capacity planner was updated to support GPU configurations. It's accessible here.
Adding GPUs on an existing Azure Stack Hub
Azure Stack Hub now supports adding GPUs to any existing system. To add a GPU, run stop-azurestack
, run through the procedure of stop-azurestack
, add GPUs, and then run start-azurestack
until completion. If the system already had GPUs, then any previously created GPU VMs must be stop-deallocated
and then restarted.
Patch and update, FRU behavior of VMs
GPU VMs undergo downtime during operations such as patch and update (PnU) and hardware replacement (FRU) of Azure Stack Hub. The following table covers the state of the VM as observed during these activities and the manual action you can do to make these VMs available after the operation.
Operation | PnU - Full Update, OEM update | FRU |
---|---|---|
VM state | Unavailable during update. Can be made available with manual operation. VM is automatically online post update. | Unavailable during FRU. Can be made available with manual operation. VM needs to be brought back up after FRU |
Manual operation | If the VM needs to be made available during the update, if there are available GPU partitions, the VM can be restarted from the portal by clicking the Restart button. VM automatically comes back up post update. | VM is not available during FRU. If there are available GPUs, VM may be stop-deallocated and restarted during FRU. Post FRU completion, the VM must be stop-deallocated using the Stop button, then restarted using the Start button. |
Guest driver installation
The following PowerShell cmdlets can be used for driver installation:
$VmName = <VM Name In Portal>
$ResourceGroupName = <Resource Group of VM>
$Location = "redmond"
$driverName = <Give a name to the driver>
$driverPublisher = "Microsoft.HpcCompute"
$driverType = <Specify Driver Type> #GPU Driver Types: "NvidiaGpuDriverWindows"; "NvidiaGpuDriverLinux"; "AmdGpuDriverWindows"
$driverVersion = <Specify Driver Version> #Nvidia Driver Version:"1.3"; AMD Driver Version:"1.0"
Set-AzureRmVMExtension -Location $Location `
-Publisher $driverPublisher `
-ExtensionType $driverType `
-TypeHandlerVersion $driverVersion `
-VMName $VmName `
-ResourceGroupName $ResourceGroupName `
-Name $driverName `
-Settings $Settings ` # If no settings are set, omit this parameter
-Verbose
Depending on the OS, type and connectivity of your Azure Stack Hub GPU VM, you must replace these values with the settings below.
AMD MI25
The guest driver version must match the Azure Stack Hub version, regardless of the connectivity state. Using newer versions not aligned with the Azure Stack Hub version can cause usability issues.
Azure Stack Hub Version | AMD Guest driver |
---|---|
2206 and later | 21.Q2-1, 20.Q4-1 |
2108 | 21.Q2-1, 20.Q4-1 |
2102 | 21.Q2-1, 20.Q4-1 |
Connected
Use the PowerShell script in the previous section with the appropriate driver type for AMD. The article Install AMD GPU drivers on N-series VMs running Windows provides instructions on installing the driver for the AMD Radeon Instinct MI25 inside the NVv4 GPU-P enabled VM, along with steps on how to verify driver installation.
Disconnected
Since the extension pulls the driver from a location on the internet, a VM that is disconnected from the external network can't access it. You can download the driver from the previous table and upload to a storage account in your local network that's accessible to the VM.
Add the AMD driver to a storage account and specify the URL to that account in Settings
. These settings must be used in the Set-AzureRMVMExtension cmdlet. For example:
$Settings = @{
"DriverURL" = <URL to driver in storage account>
}
NVIDIA
NVIDIA drivers must be installed inside the virtual machine for CUDA or GRID workloads using the GPU.
Use case: graphics/visualization GRID
This scenario requires the use of GRID drivers. GRID drivers can be downloaded through the NVIDIA Application Hub provided you have the required licenses. The GRID drivers also require a GRID license server with appropriate GRID licenses before using the GRID drivers on the VM.
$Settings = @{
"DriverURL" = "https://download.microsoft.com/download/e/8/2/e8257939-a439-4da8-a927-b64b63743db1/431.79_grid_win10_server2016_server2019_64bit_international.exe"; "DriverCertificateUrl" = "https://go.microsoft.com/fwlink/?linkid=871664";
"DriverType"="GRID"
}
Use case: compute/CUDA - Connected
CUDA drivers don't need a license server and don't need modified settings.
Use case: compute/CUDA - Disconnected
Links to NVIDIA CUDA drivers can be obtained using the link: https://raw.githubusercontent.com/Azure/azhpc-extensions/master/NvidiaGPU/resources.json
Windows:
$Settings = @{
"DriverURL" = "";
"DriverCertificateUrl" = "https://go.microsoft.com/fwlink/?linkid=871664";
"DriverType"="CUDA"
}
Linux:
You must reference some URLs for your settings:
URL | Notes |
---|---|
PUBKEY_URL | The PUBKEY_URL is the public key for the Nvidia driver repository not for the Linux VM. It's used to install driver for Ubuntu. |
DRIVER_URL | DRIVER_URL is the URL to download the Nvidia driver's repository information and is added to the Linux VM's list of repos. |
Add the URLs to your settings.
$Settings=@{
"isCustomInstall"=$true;
"DRIVER_URL"="https://go.microsoft.com/fwlink/?linkid=874273";
"CUDA_ver"="10.0.130";
"PUBKEY_URL"="http://download.microsoft.com/download/F/F/A/FFAC979D-AD9C-4684-A6CE-C92BB9372A3B/7fa2af80.pub";
"DKMS_URL"="https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm";
"LIS_URL"="https://aka.ms/lis";
"LIS_RHEL_ver"="3.10.0-1062.9.1.el7"
}