Quota Limit/ Usage

Question

My account is a school account, and I'm experimenting with Azure OpenAI Service.
I'm about to use the GPT models, but when I checked, the quota page showed Limit - Tokens Per Minute at 100%.
I have deployed only once, and I cannot do anything else because it says no quota/tokens are available.

I understand the limit is per minute, so it is counted again every minute. Am I correct?

Answer

Hello Đặng Hoàn Mỹ, you may want to check whether you have the permissions to increase the quota on your subscription/service. To answer your question more specifically: TPM rate limits are based on the maximum number of tokens estimated to be processed at the time the request is received. This differs from the token count used for billing, which is computed after all processing is completed. Azure OpenAI calculates a maximum processed-token count per request using:

Prompt text and count
The max_tokens setting
The best_of setting
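To make the three inputs above concrete, here is a minimal sketch of how such an estimate could be formed. The exact formula Azure OpenAI uses is not stated here, so the combination below (prompt tokens plus `max_tokens` times `best_of`) and the rough 4-characters-per-token heuristic are assumptions for illustration only, and `estimate_request_tokens` is a hypothetical helper, not a service API.

```python
def estimate_request_tokens(prompt: str, max_tokens: int, best_of: int = 1) -> int:
    """Rough upper-bound estimate of tokens a request may consume.

    Assumptions (not the service's documented formula):
      * about 4 characters per token for the prompt
      * each of the `best_of` candidates may generate up to `max_tokens`
    """
    # Approximate the prompt size; a real tokenizer would be more accurate.
    prompt_tokens = max(1, len(prompt) // 4)
    # Worst case: every candidate completion uses the full max_tokens budget.
    return prompt_tokens + max_tokens * best_of


if __name__ == "__main__":
    estimate = estimate_request_tokens("Summarize this paragraph.", max_tokens=800)
    print(f"Estimated tokens counted against TPM: {estimate}")
```

The point of the sketch is that a short prompt with a large `max_tokens` can still count heavily against the limit, because the estimate is made before any output is generated.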

This estimated count is added to a running token count of all requests, which resets every minute. Once the TPM rate limit is reached within that minute, further requests receive a 429 response code until the count resets. This article may be a useful reference: https://techcommunity.microsoft.com/t5/fasttrack-for-azure/optimizing-azure-openai-a-guide-to-limits-quotas-and-best/ba-p/4076268
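The per-minute behavior described above can be pictured with a small toy model. This is only a sketch under assumptions: the class name `MinuteTokenWindow`, the fixed 60-second window, and the injectable `clock` are all hypothetical, and the real service may track the window differently.

```python
import time


class MinuteTokenWindow:
    """Toy model of a tokens-per-minute (TPM) limit.

    Assumption: a fixed 60-second window whose running count resets to zero
    when the window expires. This is not the service's actual implementation.
    """

    def __init__(self, tpm_limit: int, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.clock = clock              # injectable so the model can be tested
        self.window_start = clock()
        self.used = 0                   # running estimated-token count

    def try_admit(self, estimated_tokens: int) -> int:
        """Return 200 if the request fits in the current minute, else 429."""
        now = self.clock()
        if now - self.window_start >= 60:
            # A new minute has started: reset the running count.
            self.window_start = now
            self.used = 0
        if self.used + estimated_tokens > self.tpm_limit:
            return 429                  # limit reached within this minute
        self.used += estimated_tokens
        return 200
```

In this model, a request rejected with 429 succeeds again once the minute rolls over, so a client would typically wait and retry rather than treat the error as permanent.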

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.
