I understand that you've observed a drop in token limit from 1M to 150K for your gpt-4o-default model deployed in Sweden Central under Standard quota type, even though you have:
· Multiple models deployed in the same region.
· Some models under Global Standard quota type.
· One model relocated to another EU region to ease the situation.
· Auto-update enabled on the production model, which has since updated to a new version.
Quota Splitting & Auto-Update Behaviour
Here’s a breakdown of what’s likely going on:
1. Azure OpenAI Quota Is Per Region and Quota Type
Each quota type (Standard vs. Global Standard) is managed independently per region.
If you have:
· Two models in Sweden Central, one under Standard and another under Global Standard,
· Then you are subject to two separate quotas:
o One for each quota type.
o Limits for Standard deployments are typically much lower than for Global Standard.
If your production deployment (gpt-4o-default) was auto-updated to a newer gpt-4o version, it keeps its quota type (Standard), but the newer version may default to a lower tokens-per-minute allocation under that quota type.
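As a rough illustration (the deployment names and TPM figures below are invented, not taken from a real subscription), quota can be thought of as pooled per (region, quota type), with every deployment in a pool drawing from the same limit:

```python
from collections import defaultdict

# Hypothetical deployments: (name, region, quota_type, tpm_capacity).
deployments = [
    ("gpt-4o-default", "swedencentral", "Standard", 150_000),
    ("gpt-4o-global", "swedencentral", "GlobalStandard", 1_000_000),
    ("gpt-4o-eu", "westeurope", "Standard", 450_000),
]

# Capacity is tracked per (region, quota type) pair; each pool is independent.
pools = defaultdict(int)
for name, region, quota_type, tpm in deployments:
    pools[(region, quota_type)] += tpm

for (region, quota_type), tpm in sorted(pools.items()):
    print(f"{region}/{quota_type}: {tpm:,} TPM")
```

The point of the sketch: the Standard pool in Sweden Central is sized independently of the Global Standard pool in the same region, so one can be exhausted while the other has headroom.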
2. Impact of Auto-Update
The auto-update mechanism replaces the model version but does not necessarily preserve the token limit unless it's explicitly managed through quota requests or reservations.
Thus:
· Your model was updated.
· The new version of gpt-4o possibly consumes Standard quota.
· If the Standard quota for Sweden Central was not increased, and it is shared with your other Standard deployments in that region, the effective limit for this deployment dropped to 150K tokens.
To Resolve the Issue:
1. Check Quota Allocation in Azure Portal
Go to:
Azure Portal → Azure OpenAI Resource → Quota → Filter by Region (Sweden Central) → Check both Standard and Global Standard usage.
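If you prefer the CLI, `az cognitiveservices usage list --location swedencentral` returns per-quota usage entries. A small sketch of reading that JSON follows; the sample payload is invented, and the field shape (`name.value`, `currentValue`, `limit`) is an assumption based on the usage API:

```python
import json

# Invented sample mimicking `az cognitiveservices usage list` output.
sample = json.loads("""
[
  {"name": {"value": "OpenAI.Standard.gpt-4o"},
   "currentValue": 150000, "limit": 150000},
  {"name": {"value": "OpenAI.GlobalStandard.gpt-4o"},
   "currentValue": 200000, "limit": 1000000}
]
""")

# Flag quota entries that are fully consumed.
exhausted = [u["name"]["value"] for u in sample if u["currentValue"] >= u["limit"]]
print(exhausted)
```

In this made-up sample, only the Standard gpt-4o pool is at its limit, which matches the symptom you describe.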
2. Verify Which Models Are Using Which Quota
In your resource:
· Review each deployment.
· Check under “Quota type” in each model's deployment metadata.
You’ll likely see that:
· One or more models are consuming the limited Standard quota.
· Others may be under Global Standard.
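For example, `az cognitiveservices account deployment list` includes each deployment's SKU. A sketch of grouping that output by quota type (the sample data is invented; that the quota type appears under `sku.name` is an assumption about the CLI output shape):

```python
import json

# Invented sample shaped like `az cognitiveservices account deployment list` output.
deployments = json.loads("""
[
  {"name": "gpt-4o-default", "sku": {"name": "Standard"},
   "properties": {"model": {"name": "gpt-4o", "version": "2024-08-06"}}},
  {"name": "gpt-4o-global", "sku": {"name": "GlobalStandard"},
   "properties": {"model": {"name": "gpt-4o", "version": "2024-08-06"}}}
]
""")

# Map quota type -> deployment names consuming it.
by_quota = {}
for d in deployments:
    by_quota.setdefault(d["sku"]["name"], []).append(d["name"])
print(by_quota)
```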
3. Pin to a Specific Model Version (Optional)
To avoid auto-update issues in the future:
· Deploy a specific version of gpt-4o rather than the -default alias.
· This prevents unintentional version changes and the shifts in quota consumption that come with them.
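In an ARM template or REST payload, the pin looks roughly like the fragment below (the version string is illustrative; setting `versionUpgradeOption` to `NoAutoUpgrade` disables automatic version updates for the deployment):

```json
{
  "sku": { "name": "Standard", "capacity": 150 },
  "properties": {
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-08-06"
    },
    "versionUpgradeOption": "NoAutoUpgrade"
  }
}
```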
Hope this helps, do let me know if you have any further queries.
Thank you!