Which Quota Do I Need to Request for NVIDIA AI Enterprise 1 GPU - x64 Gen2

Andrew Haselhan 0 Reputation points
2024-09-05T23:13:09.1166667+00:00

I am trying to run a virtual machine so that I can use NVIDIA TAO. I am using the image NVIDIA AI Enterprise 1 GPU - x64 Gen2. However, when I try to connect to the VM I get the following output:

Requesting a Cloud Shell.Succeeded. 
Connecting terminal...

Subscription used to launch your CloudShell a05dc9e8-4da7-4b35-8f79-e13c73b711e5 is not registered to Microsoft.CloudShell Namespace. Please follow these instructions "https://aka.ms/RegisterCloudShell" to register. In future, unregistered subscriptions will have restricted access to CloudShell service.

Your Cloud Shell session will be ephemeral so no files or system changes will persist beyond your current session.
andrew [ ~ ]$ az ssh vm --resource-group Tao --vm-name tao-porter --subscription a05dc9e8-4da7-4b35-8f79-e13c73b711e5
OpenSSH_8.9p1, OpenSSL 1.1.1k  FIPS 25 Mar 2021

Expanded Security Maintenance for Applications is not enabled.

8 updates can be applied immediately.
8 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status



The following Azure CLI version has been pre-installed. Begin using the Azure CLI by first configuring your credentials using az login
{
  "azure-cli": "2.64.0",
  "azure-cli-core": "2.64.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {}
}


Welcome to the NVIDIA AI Enterprise image. This image provides an optimized
environment for running the deep learning and HPC containers from the
NVIDIA GPU Cloud Container Registry. As an NVIDIA AI Enterprise image user,
you are entitled to access the NVIDIA AI Enterprise Catalog with supported AI containers. 


Documentation on using this image and accessing the NVIDIA GPU Cloud
Container Registry can be found at:

Quick Start Guide: https://docs.nvidia.com/ai-enterprise/deployment-guide-cloud/0.1.0/ai-enterprise-vmi.html# 

Release notes and documentation:
https://docs.nvidia.com/ngc/ngc-deploy-public-cloud/ngc-azure/index.html 
Last login: Thu Sep  5 21:29:12 2024 from 52.249.234.31
sed: can't read /home/******@portereng.com/.profile: No such file or directory
Verification successful
The plan selected does not match the number of GPUs on your VM
Please choose a VM with 1 GPU(s)
Shutting down ...
Connection to 20.121.117.71 closed by remote host.
Connection to 20.121.117.71 closed.
Transferred: sent 5432, received 6184 bytes, in 3.0 seconds
Bytes per second: sent 1799.0, received 2048.1


It says the plan selected does not match the number of GPUs on my VM and that I should choose a VM with 1 GPU. But based on the title of the image, it only has one GPU. Is there a quota increase request I can fill out to satisfy this requirement? I looked through the available quotas, but I couldn't find anything when I searched for GPUs.

Any help is much appreciated!

Thanks,

Andrew

Azure Virtual Machines

1 answer

  1. Nikhil Duserla 8,180 Reputation points Microsoft External Staff Moderator
    2024-09-06T04:15:28.33+00:00

    Hi @Andrew Haselhan,

    Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.

    From the output you shared, the image shut down because the VM size does not have the single GPU that the "1 GPU" plan requires. Here is the solution we suggest:

    NC-series VMs are ideal for training complex machine learning models and running AI applications. The NVIDIA GPUs provide significant acceleration for computations typically involved in deep learning and other intensive training tasks.

    Please refer to this link for more information: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nc-family#workloads-and-use-cases

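    In this case, you need to select a VM size with exactly 1 GPU to match the NVIDIA AI Enterprise 1 GPU plan, such as the Standard_NC40ads_H100_v5 size from the NCads_H100_v5 series, which has a single H100 NVL GPU.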
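    To confirm how many GPUs a VM actually exposes, you can count the devices listed by `nvidia-smi -L`, which prints one "GPU <n>: ..." line per GPU. The sketch below is a hypothetical reconstruction of the kind of check the image performs at login, not the image's real startup script; the expected count of 1 is an assumption based on the "1 GPU" plan name.

```python
import shutil
import subprocess

EXPECTED_GPUS = 1  # assumption: the "1 GPU" plan expects exactly one GPU


def count_gpus(nvidia_smi_output: str) -> int:
    """Count 'GPU <n>: ...' lines in `nvidia-smi -L` output (indented MIG lines are ignored)."""
    return sum(1 for line in nvidia_smi_output.splitlines() if line.startswith("GPU "))


def query_gpus() -> int:
    """Run `nvidia-smi -L` on this machine; report 0 if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return count_gpus(result.stdout) if result.returncode == 0 else 0


if __name__ == "__main__":
    found = query_gpus()
    if found != EXPECTED_GPUS:
        print(f"VM exposes {found} GPU(s), but the plan expects {EXPECTED_GPUS}")
    else:
        print("GPU count matches the plan")
```

    A VM size without any NVIDIA GPU (or without a working driver) reports 0 here, which would produce the same kind of mismatch as a 2-GPU size.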

    The NCads H100 v5 series virtual machines (VMs) are a new addition to the Azure GPU family, suited to real-world Azure Applied AI training and batch inference workloads. They are powered by NVIDIA H100 NVL GPUs and 4th-generation AMD EPYC™ Genoa processors, and feature up to 2 NVIDIA H100 NVL GPUs with 94 GB of memory each, up to 96 non-multithreaded AMD EPYC Genoa processor cores, and 640 GiB of system memory.
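    Because this series includes both 1-GPU and 2-GPU sizes, it helps to filter the size list down to the ones that report exactly one GPU. A minimal sketch, assuming you saved the JSON from `az vm list-skus --location <region> --size Standard_NC --output json` to a file, and assuming each entry carries a `capabilities` list with a "GPUs" item (check your own output; the script name is hypothetical):

```python
import json
import sys
from typing import Iterable, List, Optional


def capability(sku: dict, name: str) -> Optional[str]:
    """Return the value of a named capability (for example 'GPUs') of one SKU entry."""
    for cap in sku.get("capabilities") or []:
        if cap.get("name") == name:
            return cap.get("value")
    return None


def single_gpu_sizes(skus: Iterable[dict]) -> List[str]:
    """Return the sorted, de-duplicated names of VM sizes that report exactly one GPU."""
    names = set()
    for sku in skus:
        if sku.get("resourceType") != "virtualMachines":
            continue  # list-skus can also return disks, availability sets, etc.
        gpus = capability(sku, "GPUs")
        try:
            if gpus is not None and float(gpus) == 1.0:
                names.add(sku["name"])
        except ValueError:
            continue  # skip non-numeric capability values
    return sorted(names)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python single_gpu_sizes.py skus.json
    with open(sys.argv[1]) as fh:
        for size in single_gpu_sizes(json.load(fh)):
            print(size)
```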

    For more detailed information please refer to this: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/ncadsh100v5-series?tabs=sizeaccelerators

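    To run a virtual machine with NVIDIA AI Enterprise, you need enough quota in the target region for the VM family you choose. Azure expresses compute quota in vCPUs per VM family (for example, the NCads H100 v5 family) rather than in GPUs, which is why searching the quotas for "GPU" does not find it. The total regional vCPU quota must also have room for the new VM.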
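    Before filing a quota increase request, you can check which quotas would block the VM. A minimal sketch, assuming you saved the JSON from `az vm list-usage --location <region> --output json` to a file; the quota display names and the 40-vCPU requirement for Standard_NC40ads_H100_v5 below are assumptions to verify against your own output:

```python
import json
import sys
from typing import Dict, Iterable, List


def headroom(usages: Iterable[dict]) -> Dict[str, int]:
    """Map each quota's display name to limit - currentValue (values may be strings)."""
    result = {}
    for usage in usages:
        name = (usage.get("name") or {}).get("localizedValue") or usage.get("localName")
        if name:
            result[name] = int(usage["limit"]) - int(usage["currentValue"])
    return result


def blocking_quotas(usages: Iterable[dict], needed_vcpus: int, quota_names: List[str]) -> List[str]:
    """Return the quotas that lack room for `needed_vcpus` more vCPUs (missing ones count as 0)."""
    room = headroom(usages)
    return [q for q in quota_names if room.get(q, 0) < needed_vcpus]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        usages = json.load(fh)
    # Assumed values: 40 vCPUs and these two quota names; adjust to your size and output.
    missing = blocking_quotas(usages, 40, ["Total Regional vCPUs", "Standard NCADSH100v5 Family vCPUs"])
    print("Request an increase for: " + ", ".join(missing) if missing else "Quota looks sufficient")
```

    Any quota the script reports is the one to raise through a quota increase request for that region.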

    If you have any further queries, do let us know.

    If the answer is helpful, please click "Accept Answer" and "Upvote it."

