I understand that you've observed a drop in token limit from 1M to 150K for your gpt-4o-default model deployed in Sweden Central under Standard quota type, even though you have:
· Multiple models deployed in the same region.
· Some models under Global Standard quota type.
· One model relocated to another EU region to ease the situation.
· Auto-update enabled on the production model, which has since updated to a new version.
Quota Splitting & Auto-Update Behaviour
Here’s a breakdown of what’s likely going on:
1. Azure OpenAI Quota Is Per Region and Quota Type
Each quota type (Standard vs. Global Standard) is managed independently per region.
If you have:
· Two models in Sweden Central, one under Standard and another under Global Standard,
· Then you are subject to two separate quotas:
o One for each quota type.
o Limits for Standard deployments are typically much lower than for Global Standard.
If your production deployment (gpt-4o-default) was auto-updated to a newer gpt-4o version, it keeps its quota type (Standard), but the newer version may default to a lower tokens-per-minute allocation under that quota type.
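As a rough illustration (the deployment names and TPM figures below are invented, not taken from a real subscription), quota can be thought of as pooled per (region, quota type), with every deployment in a pool drawing from the same limit:

```python
from collections import defaultdict

# Hypothetical deployments: (name, region, quota_type, tpm_capacity).
deployments = [
    ("gpt-4o-default", "swedencentral", "Standard", 150_000),
    ("gpt-4o-global", "swedencentral", "GlobalStandard", 1_000_000),
    ("gpt-4o-eu", "westeurope", "Standard", 450_000),
]

# Capacity is tracked per (region, quota type) pair; each pool is independent.
pools = defaultdict(int)
for name, region, quota_type, tpm in deployments:
    pools[(region, quota_type)] += tpm

for (region, quota_type), tpm in sorted(pools.items()):
    print(f"{region}/{quota_type}: {tpm:,} TPM")
```

The point of the sketch: the Standard pool in Sweden Central is sized independently of the Global Standard pool in the same region, so one can be exhausted while the other has headroom.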
2. Impact of Auto-Update
The auto-update mechanism replaces the model version but does not necessarily preserve the token limit unless it's explicitly managed through quota requests or reservations.
Thus:
· Your model was updated.
· The new version of gpt-4o possibly consumes Standard quota.
· If the Standard quota for Sweden Central was not increased, and it is shared with your other Standard deployments in that region, the effective limit for this deployment dropped to 150K tokens.
To Resolve the Issue:
1. Check Quota Allocation in Azure Portal
Go to:
Azure Portal → Azure OpenAI Resource → Quota → Filter by Region (Sweden Central) → Check both Standard and Global Standard usage.
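If you prefer the CLI, `az cognitiveservices usage list --location swedencentral` returns per-quota usage entries. A small sketch of reading that JSON follows; the sample payload is invented, and the field shape (`name.value`, `currentValue`, `limit`) is an assumption based on the usage API:

```python
import json

# Invented sample mimicking `az cognitiveservices usage list` output.
sample = json.loads("""
[
  {"name": {"value": "OpenAI.Standard.gpt-4o"},
   "currentValue": 150000, "limit": 150000},
  {"name": {"value": "OpenAI.GlobalStandard.gpt-4o"},
   "currentValue": 200000, "limit": 1000000}
]
""")

# Flag quota entries that are fully consumed.
exhausted = [u["name"]["value"] for u in sample if u["currentValue"] >= u["limit"]]
print(exhausted)
```

In this made-up sample, only the Standard gpt-4o pool is at its limit, which matches the symptom you describe.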
2. Verify Which Models Are Using Which Quota
In your resource:
· Review each deployment.
· Check under “Quota type” in each model's deployment metadata.
You’ll likely see that:
· One or more models are consuming the limited Standard quota.
· Others may be under Global Standard.
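For example, `az cognitiveservices account deployment list` includes each deployment's SKU. A sketch of grouping that output by quota type (the sample data is invented; that the quota type appears under `sku.name` is an assumption about the CLI output shape):

```python
import json

# Invented sample shaped like `az cognitiveservices account deployment list` output.
deployments = json.loads("""
[
  {"name": "gpt-4o-default", "sku": {"name": "Standard"},
   "properties": {"model": {"name": "gpt-4o", "version": "2024-08-06"}}},
  {"name": "gpt-4o-global", "sku": {"name": "GlobalStandard"},
   "properties": {"model": {"name": "gpt-4o", "version": "2024-08-06"}}}
]
""")

# Map quota type -> deployment names consuming it.
by_quota = {}
for d in deployments:
    by_quota.setdefault(d["sku"]["name"], []).append(d["name"])
print(by_quota)
```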
3. Pin to a Specific Model Version (Optional)
To avoid auto-update issues in the future:
· Deploy a specific version of gpt-4o rather than the -default alias.
· This prevents unintentional version changes and the shifts in quota consumption that come with them.
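In an ARM template or REST payload, the pin looks roughly like the fragment below (the version string is illustrative; setting `versionUpgradeOption` to `NoAutoUpgrade` disables automatic version updates for the deployment):

```json
{
  "sku": { "name": "Standard", "capacity": 150 },
  "properties": {
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-08-06"
    },
    "versionUpgradeOption": "NoAutoUpgrade"
  }
}
```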
Hope this helps, do let me know if you have any further queries.
Thank you!