Help with deploying Llama 3 on Azure AI Studio

Question

I get the message "You have no dedicated quota. A temporary 168-hour endpoint will be created for you. Alternatively, you can request for quota for persistent endpoints. Learn more about shared quota." when trying to deploy any model in Azure AI Studio. I deployed models before without any errors, but deleted them because I no longer needed them. Now I want to deploy new ones and get the message above. Do I need quota in a specific VM family, or would any family suffice?

User's image

These are the VMs I have quota in. Do I need anything else? I am sure I am trying to deploy in the right region.

Answer

Murat, greetings and welcome to the Microsoft Q&A forum!

I understand that you deployed models earlier and then deleted them. It is possible that the quota they used has not yet been released for reuse. There is a per-VM-family vCPU quota as well as a regional vCPU quota, so you may have quota under one of them and still be blocked by the other when deploying in a specific region.

To give more context: for deployment and inferencing of Meta Llama 3.1 models with managed compute, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region.

Azure AI Studio compute has a default quota limit on both the number of cores and the number of unique compute resources that are allowed per region in a subscription.

The quota on the number of cores is split per VM family and is also tracked as a cumulative total of cores.
The quota on the number of unique compute resources per region is separate from the VM core quota, as it applies only to the managed compute resources.
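The two core limits above can be pictured with a minimal Python sketch. It does not call any Azure API; the `CoreQuota` class, the `can_deploy` helper, and all numbers are hypothetical and only illustrate that a deployment has to fit within both the VM family quota and the cumulative regional quota.

```python
from dataclasses import dataclass


@dataclass
class CoreQuota:
    """Hypothetical container for one quota bucket (not an Azure type)."""
    limit: int  # cores allowed by the quota
    used: int   # cores currently allocated

    @property
    def available(self) -> int:
        # Never report a negative number of free cores.
        return max(self.limit - self.used, 0)


def can_deploy(required_cores: int, family: CoreQuota, regional: CoreQuota) -> bool:
    """Return True only if the request fits in BOTH quota buckets."""
    return (required_cores <= family.available
            and required_cores <= regional.available)


# Illustrative numbers only: the family quota has room for 48 cores,
# but the cumulative regional quota allows just 24.
family_quota = CoreQuota(limit=48, used=0)
regional_quota = CoreQuota(limit=24, used=0)

print(can_deploy(24, family_quota, regional_quota))  # fits both limits
print(can_deploy(48, family_quota, regional_quota))  # blocked by the regional total
```

In this sketch, having quota in a VM family is not enough on its own: the second call fails because the regional total is the tighter limit.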

To raise the limits for compute, you can request a quota increase in Azure AI Studio.

Available resources include:

Dedicated cores per region have a default limit of 24 to 300, depending on your subscription offer type. You can increase the number of dedicated cores per subscription for each VM family. Specialized VM families like NCv2, NCv3, or ND series start with a default of zero cores. GPUs also default to zero cores.
Total compute limit per region defaults to 500 within a given subscription and can be increased up to a maximum of 2500 per region. This limit is shared between compute instances and managed online endpoint deployments. A compute instance is considered a single-node cluster for quota purposes. To increase the total compute limit, open an online customer support request.
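As a rough illustration of the shared total compute limit, here is a small Python sketch. The constants come from the figures quoted above; the function name `compute_slots_left` is hypothetical, and counting each compute instance and each endpoint deployment as one unit is an assumption made for the example, not a statement of how Azure accounts for them.

```python
DEFAULT_COMPUTE_LIMIT = 500   # default per region, as quoted above
MAX_COMPUTE_LIMIT = 2500      # maximum after a support request, as quoted above


def compute_slots_left(compute_instances: int,
                       endpoint_deployments: int,
                       limit: int = DEFAULT_COMPUTE_LIMIT) -> int:
    """Return how many more compute resources fit under the regional limit.

    Assumption: every compute instance and every managed online endpoint
    deployment consumes exactly one unit of the shared limit.
    """
    if not 0 < limit <= MAX_COMPUTE_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_COMPUTE_LIMIT}")
    used = compute_instances + endpoint_deployments
    return max(limit - used, 0)


# Illustrative numbers only.
print(compute_slots_left(compute_instances=3, endpoint_deployments=2))
```

If the result reaches zero, the remedy described above applies: open a support request to raise the total compute limit.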

Please see "How to deploy Meta Llama 3.1 models with Azure AI Studio" and "Manage and increase quotas for resources with Azure AI Studio" for more information.
