Clarification about Azure OpenAI quotas and limits

Question

Christian 185

Hi,

After reading https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits and https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest, it is still not clear to me how the tokens-per-minute (TPM) quota applies across models, regions and subscriptions.

For example:

model: gpt-4o

deployment type: global-standard

Azure subscription: name: mySubscription1

Azure OpenAI resource in "mySubscription1": name: myOpenAI1 | region: Sweden Central

In "myOpenAI1", is the maximum combined TPM of all my gpt-4o deployments 450k? That is, I can create ONE deployment with 450k TPM or, for example, two deployments, one with 200k and the other with 250k. Is that correct?

What if I create a SECOND Azure OpenAI resource in Sweden Central:

Azure OpenAI resource in "mySubscription1": name: myOpenAI2 | region: Sweden Central

Is the quota for gpt-4o shared between myOpenAI1 and myOpenAI2?

What about if I create a SECOND Azure subscription:

Azure subscription: name: mySubscription2

and in that subscription I create an Azure OpenAI resource:

Azure OpenAI resource in "mySubscription2": name: myOpenAI3 | region: Sweden Central

In this case, is the gpt-4o quota for myOpenAI3 SEPARATE (not shared) from the quota for "myOpenAI1" and "myOpenAI2"?

That is:

450k TPM for "myOpenAI3" (Sweden Central) in subscription mySubscription2
450k TPM shared between "myOpenAI1" (Sweden Central) and "myOpenAI2" (Sweden Central) in mySubscription1?

Is that correct?

Answer 1

romungi-MSFT 48,911 Microsoft Employee Moderator

@Christian To add to the above answer, deployment quota is not shared between subscriptions. So, you can have 450k TPM for the "myOpenAI3" (Sweden Central) deployment in subscription mySubscription2, and 450k TPM shared between the "myOpenAI1" (Sweden Central) and "myOpenAI2" (Sweden Central) deployments in mySubscription1.
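
As a rough illustration of this scoping, the Python sketch below groups deployments into quota pools keyed by subscription, region and model. It is not an Azure API call: the resource names come from the question, while the deployment sizes, the `Deployment` type and the `pool_usage` helper are made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Deployment(NamedTuple):
    subscription: str
    resource: str
    region: str
    model: str
    tpm: int


def pool_usage(deployments):
    """Sum allocated TPM per (subscription, region, model) pool."""
    pools = defaultdict(int)
    for d in deployments:
        # The resource name is deliberately not part of the key:
        # resources in the same subscription and region share one pool.
        pools[(d.subscription, d.region, d.model)] += d.tpm
    return dict(pools)


# Hypothetical allocations, using the names from the question.
deployments = [
    Deployment("mySubscription1", "myOpenAI1", "Sweden Central", "gpt-4o", 200_000),
    Deployment("mySubscription1", "myOpenAI2", "Sweden Central", "gpt-4o", 250_000),
    Deployment("mySubscription2", "myOpenAI3", "Sweden Central", "gpt-4o", 450_000),
]

for pool, tpm in pool_usage(deployments).items():
    print(pool, tpm)
```

Here myOpenAI1 and myOpenAI2 land in the same pool and together use its 450k TPM, while myOpenAI3 sits in a separate pool of its own.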

However, there is also the concept of a usage tier, which counts against your tenant and all the subscriptions under it. A customer's usage is defined per model and is the total number of tokens consumed across all deployments in all subscriptions in all regions for a given tenant. The limits for these are mentioned here. For example, for gpt-4o the maximum usage tier limit is 12 billion tokens for all deployments under all subscriptions under a single tenant.
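
To make the tenant-wide counting concrete, here is a minimal Python sketch. It only shows the arithmetic and assumes the limit is counted per month; the token counts, the "East US" entry and the `tenant_usage` helper are hypothetical, and only the 12 billion figure comes from the paragraph above.

```python
USAGE_TIER_LIMIT = 12_000_000_000  # gpt-4o figure quoted above


def tenant_usage(monthly_tokens):
    """Total tokens for one model across every subscription and region in a tenant."""
    return sum(monthly_tokens.values())


# Hypothetical monthly gpt-4o token counts keyed by (subscription, region).
monthly_tokens = {
    ("mySubscription1", "Sweden Central"): 7_000_000_000,
    ("mySubscription2", "Sweden Central"): 4_000_000_000,
    ("mySubscription2", "East US"): 2_000_000_000,
}

total = tenant_usage(monthly_tokens)
print(total, total > USAGE_TIER_LIMIT)
```

No single subscription in this made-up data passes 12 billion tokens, but the tenant total does, because the usage tier adds up all subscriptions and regions.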

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

Christian 185 Reputation points

2024-12-16T15:19:59.3233333+00:00

@romungi-MSFT thanks for the reply. Is the 12 billion token limit for all deployments under all subscriptions under a single tenant per month? 12 billion tokens per month?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2024-12-17T04:09:44.45+00:00

@Christian Yes, these are per month.

Also, it would be great if you could retake the survey and accept the answer if it helped. Thanks!!

Christian 185 Reputation points

2024-12-17T08:07:40.6733333+00:00

@romungi-MSFT done, I was waiting for this last clarification. As a suggestion, the documentation should state "per month" explicitly. It may be obvious, but IMO documentation should be very clear.

Answer 2

Adharsh Santhanam 6,020 Volunteer Moderator

Hello Christian, let me try to clarify how this works. Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). Your subscription is onboarded with a default quota for most models. You can allocate TPM among deployments until reaching quota. If you exceed a model's TPM limit in a region, you can reassign quota among deployments or request a quota increase. Alternatively, if viable, consider creating a deployment in a new Azure region in the same geography as the existing one.

For example, with a 240,000 TPM quota for GPT-35-Turbo in East US, you could create one deployment of 240K TPM, two of 120K TPM each, or any number of deployments whose TPM adds up to no more than 240K in that region.
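
The allocation rule in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an Azure API: the function name `remaining_quota` and the deployment names are hypothetical, and the 240,000 TPM figure is the one from the example.

```python
def remaining_quota(quota_tpm: int, allocations: dict[str, int]) -> int:
    """Return the TPM still unallocated in one subscription/region/model pool.

    Raises ValueError if the deployments together exceed the quota.
    """
    used = sum(allocations.values())
    if used > quota_tpm:
        raise ValueError(f"allocated {used} TPM exceeds quota of {quota_tpm} TPM")
    return quota_tpm - used


QUOTA = 240_000  # GPT-35-Turbo in East US, from the example

# One deployment that takes the whole quota.
print(remaining_quota(QUOTA, {"deploy-a": 240_000}))
# Two deployments of 120K TPM each.
print(remaining_quota(QUOTA, {"deploy-a": 120_000, "deploy-b": 120_000}))
# Smaller deployments that leave some quota free for later.
print(remaining_quota(QUOTA, {"deploy-a": 100_000, "deploy-b": 50_000}))
```

If the function raises, the choices are the ones described above: reassign quota among the deployments, request a quota increase, or deploy in another region.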

There is also a limit of 30 Azure OpenAI resource instances per region.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

Jari Lehtonen 0 Reputation points

2024-12-22T07:03:20.9466667+00:00

When I try to deploy an example Azure project from GitHub, I get an error message saying that I have insufficient quota, and the creation fails. So I do not have a model to start with, and none is created either, because the deployment fails. Very unclear indeed. This is the project I'm trying to deploy: https://github.com/Azure-Samples/azure-search-openai-demo

This happens with other similar projects as well.

I'm really puzzled by this difficulty, because Azure AI chatbots are such a hot topic. Is there an Azure AI chatbot project, e.g. on GitHub, that Microsoft recommends?
