Why did the token limit on production unexpectedly decrease from 1M to 150K?

Marharyta Lapshykova 30 Reputation points
2025-04-25T09:41:49.9166667+00:00

We've noticed that the token limit for our production model (Sweden Central Region, gpt-4o-default, Standard quota type) has been decreased from 1M tokens to 150k tokens. Could anyone please clarify why this change occurred and explain how adjustments to token limits are managed in such cases?

Recent changes:

  • We deployed two models under the Global Standard quota type (Sweden Central Region, gpt-4o).
  • Our production model, gpt-4o-default, has auto-update enabled and was recently updated to a new default version.
  • We have since relocated one of the models to another EU region to mitigate the impact of this issue.

The question is: how is the quota calculated when multiple models of the same type are deployed under both Global Standard and Standard quota types within the same EU region?

Many thanks in advance

Azure OpenAI Service

Accepted answer
  1. Prashanth Veeragoni 4,440 Reputation points Microsoft External Staff Moderator
    2025-04-25T14:51:37.47+00:00

    Hi Marharyta Lapshykova,

    I understand that you've observed a drop in token limit from 1M to 150K for your gpt-4o-default model deployed in Sweden Central under Standard quota type, even though you have:

    • Multiple models deployed in the same region.
    • Some models under the Global Standard quota type.
    • One model relocated to another EU region to ease the situation.
    • Auto-update enabled for the production model, which was recently updated to a new version.

    Quota Splitting & Auto-Update Behaviour

    Here’s a breakdown of what’s likely going on:

    1. Azure OpenAI Quota Is Per Region and Quota Type

    Each quota type (Standard vs. Global Standard) is managed independently per region.

    If you have two models in Sweden Central, one under Standard and another under Global Standard, then you are subject to two separate quotas, one for each quota type. Limits for the Standard type are typically much lower than for Global Standard.

    If your production model (gpt-4o-default) was auto-updated to a new version of gpt-4o, it may have inherited the quota type (Standard), but newer versions may default to lower token allocations under that quota type.
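
    If you'd rather check this programmatically than in the portal, here is a minimal sketch using the azure-mgmt-cognitiveservices Python SDK. The subscription ID is a placeholder, and the exact usage-name format can vary by API version:

    ```python
    # pip install azure-identity azure-mgmt-cognitiveservices
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

    client = CognitiveServicesManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id="<your-subscription-id>",  # placeholder
    )

    # Quota is tracked per subscription, per region, per quota type, so
    # Standard and Global Standard appear as separate entries here.
    for usage in client.usages.list("swedencentral"):
        # Names look roughly like "OpenAI.Standard.gpt-4o" vs.
        # "OpenAI.GlobalStandard.gpt-4o" (format may vary by API version).
        if "gpt-4o" in usage.name.value:
            print(f"{usage.name.value}: {usage.current_value}/{usage.limit} {usage.unit}")
    ```

    Seeing two separate gpt-4o entries, one per quota type, confirms the pools are independent.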

    2. Impact of Auto-Update

    The auto-update mechanism replaces the model version but does not necessarily preserve the token limit unless it's explicitly managed through quota requests or reservations.

    Thus:

    • Your model was updated.
    • The new version of gpt-4o possibly consumes Standard quota.
    • If your Standard quota for Sweden Central was not increased, or is now shared across other models, that would explain the drop to 150K tokens (the sketch below confirms what the deployment is using now).
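
    To confirm what the updated deployment is actually using, a sketch along these lines reads the deployment's model version, quota type (SKU), and capacity. Resource group and resource names are placeholders, and property names can differ slightly across SDK versions:

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

    client = CognitiveServicesManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id="<your-subscription-id>",  # placeholder
    )

    dep = client.deployments.get(
        "<resource-group>", "<openai-resource>", "gpt-4o-default"  # placeholders
    )

    print("model version:", dep.properties.model.version)
    print("quota type:", dep.sku.name)    # "Standard" or "GlobalStandard"
    print("capacity:", dep.sku.capacity)  # in units of 1,000 TPM, so 150 = 150K
    print("auto-update:", dep.properties.version_upgrade_option)
    ```

    If the capacity prints 150 under the Standard SKU, that matches the 150K limit you are seeing.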

    To Resolve the Issue:

    1. Check Quota Allocation in the Azure Portal

    Go to:

    Azure Portal → Azure OpenAI Resource → Quota → Filter by Region (Sweden Central) → Check both Standard and Global Standard usage.

    2. Verify Which Models Are Using Which Quota

    In your resource:

    • Review each deployment.
    • Check under “Quota type” in each model's deployment metadata.

    You’ll likely see that:

    • One or more models are consuming the limited Standard quota.
    • Others may be under Global Standard.

    The sketch below lists every deployment grouped by quota type.
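
    Here is a sketch that groups every deployment in the resource by its SKU name, i.e. its quota type (same placeholder names as above):

    ```python
    from collections import defaultdict

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

    client = CognitiveServicesManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id="<your-subscription-id>",  # placeholder
    )

    # Group deployments by quota type (the deployment SKU name).
    by_quota_type = defaultdict(list)
    for dep in client.deployments.list("<resource-group>", "<openai-resource>"):
        by_quota_type[dep.sku.name].append(
            f"{dep.name}: {dep.properties.model.name} "
            f"{dep.properties.model.version} ({dep.sku.capacity}K TPM)"
        )

    for quota_type, deployments in by_quota_type.items():
        print(quota_type)
        for line in deployments:
            print(" ", line)
    ```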

    3. Pin to a Specific Model Version (Optional)

    To avoid auto-update issues in the future:

    • Deploy a specific version of gpt-4o, not the -default one.
    • This prevents unintentional shifts in quota consumption (a sketch follows this list).
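
    As a sketch of what pinning could look like with the same SDK (the deployment name, model version, and capacity below are illustrative; substitute the version you actually want to pin):

    ```python
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
    from azure.mgmt.cognitiveservices.models import (
        Deployment, DeploymentModel, DeploymentProperties, Sku,
    )

    client = CognitiveServicesManagementClient(
        credential=DefaultAzureCredential(),
        subscription_id="<your-subscription-id>",  # placeholder
    )

    pinned = Deployment(
        sku=Sku(name="Standard", capacity=150),  # capacity is in units of 1,000 TPM
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-08-06"),
            # "NoAutoUpgrade" keeps the deployment on the pinned version
            # instead of following the moving default.
            version_upgrade_option="NoAutoUpgrade",
        ),
    )

    client.deployments.begin_create_or_update(
        "<resource-group>", "<openai-resource>", "gpt-4o-pinned", pinned  # placeholders
    ).result()
    ```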

    Hope this helps; do let me know if you have any further queries.

    Thank you!

    1 person found this answer helpful.

1 additional answer

  1. Sina Salam 21,066 Reputation points Moderator
    2025-04-29T13:57:14.42+00:00

    Hello Marharyta Lapshykova,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having an issue with an unexpected decrease in token limits.

    @Prashanth Veeragoni has responded well. But it is a misconception that auto-updates preserve the existing token limits; that is not always the case. Everything you need to know to resolve the issue permanently is in the following links:

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

