GPU virtual machines for Azure Stack Edge Pro GPU devices

APPLIES TO: Azure Stack Edge Pro - GPU, Azure Stack Edge Pro 2, Azure Stack Edge Pro R

GPU-accelerated workloads on an Azure Stack Edge Pro GPU device require a GPU VM (virtual machine). This article provides an overview of GPU VMs, including supported OSs, GPU drivers, and VM sizes. It also discusses deployment considerations for GPU VMs on devices that have a Kubernetes cluster configured.

About GPU VMs

Your Azure Stack Edge device may be equipped with one or two Nvidia GPUs, either Tesla T4 or A2 Tensor Core. To deploy GPU-accelerated VM workloads on these devices, use GPU-optimized VM sizes. The GPU VM size that you choose should match the GPU model on your Azure Stack Edge device. For more information, see Supported N series GPU optimized VMs.

To take advantage of the GPU capabilities of Azure N-series VMs, Nvidia GPU drivers must be installed. The Nvidia GPU driver extension installs appropriate Nvidia CUDA or GRID drivers. You can install the GPU extensions using templates or via the Azure portal.

With Azure Resource Manager templates, you install and manage the extension after VM deployment. In the Azure portal, you can install the GPU extension during or after VM deployment; for instructions, see Deploy GPU VMs on your Azure Stack Edge device.

If your device has a Kubernetes cluster configured, be sure to review deployment considerations for Kubernetes clusters before you deploy GPU VMs.

Supported OS and GPU drivers

The Nvidia GPU driver extensions for Windows and Linux support the following OS versions.

Supported OS for GPU extension for Windows

This extension supports the following operating systems (OSs). Other versions may work but haven't been tested in-house on GPU VMs running on Azure Stack Edge devices.

Distribution          Version
Windows Server 2019   Core
Windows Server 2016   Core

Supported OS for GPU extension for Linux

This extension supports the following OS distributions, depending on driver support for the specific OS version. Other versions may work but haven't been tested in-house on GPU VMs running on Azure Stack Edge devices.

Distribution               Version
Red Hat Enterprise Linux   7.4

Note

The GPU extension for Ubuntu 18.04 LTS has been deprecated and is no longer supported on Ubuntu 18.04 GPU VMs running on Azure Stack Edge devices. If you plan to use the Ubuntu 18.04 LTS distro, see the steps for manual GPU driver installation at CUDA Toolkit 12.1 Update 1 Downloads. You may need to download the CUDA signing key before the installation. For an example of installing the signing key, see Troubleshoot GPU extension issues for GPU VMs on Azure Stack Edge Pro GPU.
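
After a manual driver installation, one way to confirm that the driver is present is to query nvidia-smi from inside the VM. The following is a minimal sketch, not part of the documented procedure: it assumes Python 3.9 or later and that nvidia-smi is on the PATH once the driver is installed. The helper names parse_gpu_query and query_gpus are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text: str) -> list[dict[str, str]]:
    """Parse 'name, driver_version' CSV rows (no header) into dicts."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, driver = (part.strip() for part in line.rsplit(",", 1))
        gpus.append({"name": name, "driver_version": driver})
    return gpus


def query_gpus() -> list[dict[str, str]]:
    """Return the GPUs reported by nvidia-smi, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # the tool is missing, so the driver is likely not installed
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not talk to the driver
    return parse_gpu_query(result.stdout)
```

Calling query_gpus() in the VM returns one entry per visible GPU. An empty list suggests that the driver is missing or not loaded, in which case the troubleshooting article referenced above is the next place to look.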

GPU VM deployment

You can deploy a GPU VM via the Azure portal or using Azure Resource Manager templates. The GPU extension is installed after VM creation.

GPU VMs and Kubernetes

Before you deploy GPU VMs on your device, review the following considerations if Kubernetes is configured on the device.

For a 1-GPU device:

  • Create a GPU VM, then configure Kubernetes on your device: GPU VM creation and Kubernetes configuration both succeed. Kubernetes won't have access to the GPU in this case.

  • Configure Kubernetes on your device, then create a GPU VM: Kubernetes claims the GPU on your device, and VM creation fails because no GPU resources are available.

For a 2-GPU device:

  • Create a GPU VM, then configure Kubernetes on your device: The GPU VM claims one GPU on your device. Kubernetes configuration also succeeds and claims the remaining GPU.

  • Create two GPU VMs, then configure Kubernetes on your device: The two GPU VMs claim both GPUs on the device, and Kubernetes is configured successfully with no GPUs.

  • Configure Kubernetes on your device, then create a GPU VM: Kubernetes claims both GPUs on your device, and VM creation fails because no GPU resources are available.
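
The ordering rules above can be summarized in a small model. This is an illustrative sketch only, not device code: the Device class and its methods are hypothetical. It assumes that each GPU VM claims exactly one GPU and that Kubernetes claims every GPU still free at the moment it is configured.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Toy model of GPU ownership on a device with total_gpus GPUs."""
    total_gpus: int
    vm_gpus: int = 0    # GPUs claimed by GPU VMs
    k8s_gpus: int = 0   # GPUs claimed by Kubernetes
    k8s_configured: bool = False

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - self.vm_gpus - self.k8s_gpus

    def create_gpu_vm(self) -> bool:
        """Assumption: a GPU VM needs one free GPU; creation fails if none is left."""
        if self.free_gpus < 1:
            return False
        self.vm_gpus += 1
        return True

    def configure_kubernetes(self) -> bool:
        """Configuration always succeeds and claims whatever GPUs are still free."""
        self.k8s_gpus += self.free_gpus
        self.k8s_configured = True
        return True


# 2-GPU device, one VM first: the VM and Kubernetes each end up with one GPU.
device = Device(total_gpus=2)
device.create_gpu_vm()
device.configure_kubernetes()
print(device.vm_gpus, device.k8s_gpus)  # 1 1
```

In this model, the order of operations is the only thing that decides who owns a GPU, which is why GPU VMs need to be created before Kubernetes is configured if they are to get a GPU.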

Next steps