Clarification about Azure OpenAI quotas and limits

Christian 185 Reputation points
2024-12-16T12:43:35.8033333+00:00

Hi,

After reading https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits and https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest, it's still not clear to me how the tokens-per-minute (TPM) quota applies across models, regions, and subscriptions.

For example:

model: gpt-4o

deployment type: global-standard

Azure subscription: name: mySubscription1

Azure OpenAI resource in "mySubscription1": name: myOpenAI1 | region: sweden central

In "myOpenAI1", is the maximum combined TPM of all my gpt-4o deployments 450k? That is, can I create ONE deployment with 450k TPM or, for example, two deployments, one with 200k and the other with 250k? Is that correct?

What if I create a SECOND Azure OpenAI resource in sweden central:

Azure OpenAI resource in "mySubscription1": name: myOpenAI2 | region: sweden central

Is the gpt-4o quota shared between myOpenAI1 and myOpenAI2?

What if I create a SECOND Azure subscription:

Azure subscription: name: mySubscription2

and in that subscription I create an Azure OpenAI resource:

Azure OpenAI resource in "mySubscription2": name: myOpenAI3 | region: sweden central

In this case, is the gpt-4o quota of myOpenAI3 SEPARATE from (not shared with) the quota of "myOpenAI1" and "myOpenAI2"?

That is:

  • 450k TPM for "myOpenAI3" (sweden central) in subscription mySubscription2
  • 450k TPM shared between "myOpenAI1" (sweden central) and "myOpenAI2" (sweden central) in mySubscription1?

Is that correct?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator
    2024-12-16T13:55:11.7066667+00:00

    @Christian To add to the other answer: deployment limits are not shared between subscriptions. You can therefore have 450k TPM for the "myOpenAI3" (sweden central) deployments in subscription mySubscription2, and a separate 450k TPM shared between the "myOpenAI1" (sweden central) and "myOpenAI2" (sweden central) deployments in mySubscription1.

    However, there is also the concept of a usage tier, which counts against your tenant and all the subscriptions under it. A customer's usage is defined per model and is the total number of tokens consumed across all deployments, in all subscriptions and all regions, for a given tenant. The limits for these are mentioned here. For example, for gpt-4o the maximum usage tier limit is 12 billion tokens for all deployments under all subscriptions under a single tenant.
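
As a rough illustration of the per-subscription scoping described in this answer, the sketch below groups the three resources from the question by (subscription, region, model). The 450k figure and the resource names come from the question; the `RESOURCES` table and the `quota_pools` helper are hypothetical and only model the rule, they do not call any Azure API. The sketch also ignores deployment type and the tenant-wide usage tier.

```python
from collections import defaultdict

# Illustrative quota taken from the question; actual quotas depend on the
# subscription, region, model, and deployment type.
QUOTA_TPM = 450_000

# Hypothetical inventory: resource name -> (subscription, region)
RESOURCES = {
    "myOpenAI1": ("mySubscription1", "sweden central"),
    "myOpenAI2": ("mySubscription1", "sweden central"),
    "myOpenAI3": ("mySubscription2", "sweden central"),
}


def quota_pools(resources, model):
    """Group resources by the scope that shares a single quota pool."""
    pools = defaultdict(list)
    for name, (subscription, region) in resources.items():
        pools[(subscription, region, model)].append(name)
    return dict(pools)


pools = quota_pools(RESOURCES, "gpt-4o")
for scope, names in pools.items():
    print(scope, "->", names, f"share {QUOTA_TPM:,} TPM")
```

Under these assumptions, myOpenAI1 and myOpenAI2 land in one pool and myOpenAI3 in another, which matches the two bullet points in the question.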


    1 person found this answer helpful.

1 additional answer

  1. Adharsh Santhanam 6,015 Reputation points Volunteer Moderator
    2024-12-16T13:24:28.96+00:00

    Hello Christian, let me try to clarify how this works. Quota is assigned to your subscription on a per-region, per-model basis in units of tokens per minute (TPM). Your subscription is onboarded with a default quota for most models. You can allocate TPM among deployments until you reach the quota. Once you hit a model's TPM limit in a region, you can reassign quota among your deployments or request a quota increase. Alternatively, if viable, consider creating a deployment in a new Azure region in the same geography as the existing one.

    For example, with a 240,000 TPM quota for GPT-35-Turbo in East US, you could create one deployment of 240K TPM, two of 120K TPM each, or any number of deployments whose TPM adds up to 240K or less in that region.
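
The arithmetic in this example can be sketched as a simple budget check. The helper names `remaining_tpm` and `can_allocate` are hypothetical, and the code only models the rule described above (the sum of deployment TPM must not exceed the quota); it does not query Azure.

```python
def remaining_tpm(quota_tpm, existing_tpm):
    """TPM still unallocated for one model in one region of a subscription."""
    return quota_tpm - sum(existing_tpm)


def can_allocate(quota_tpm, existing_tpm, requested_tpm):
    """Return True if a new deployment of requested_tpm fits in the quota."""
    return requested_tpm <= remaining_tpm(quota_tpm, existing_tpm)


quota = 240_000  # GPT-35-Turbo in East US, as in the example above

print(can_allocate(quota, [], 240_000))                # one deployment of 240K
print(can_allocate(quota, [120_000], 120_000))         # second of two 120K deployments
print(can_allocate(quota, [120_000, 120_000], 1_000))  # quota already fully allocated
```

When the last check fails, the options mentioned above apply: reassign TPM among existing deployments or request a quota increase.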

    There is also a limit of 30 Azure OpenAI resources per region, per subscription.


    2 people found this answer helpful.
