Why is my GPT-4o mini quota getting used up so fast?

Heet Sarju Shah 0

I currently have a student account, on which I have created an Azure OpenAI resource with two deployments:

GPT-4o mini
Dalle-3

The main issue is with GPT-4o mini. The quota limit it provides is extremely small: I can only get 2 or 3 requests through per day, which totals fewer than 600 to 800 tokens. I have the credits for a lot more. I even applied for a quota increase, but it was rejected. I don't understand why the quota runs out so fast.

Also, the cooldown every time is 86,400 seconds (24 hours).

Please let me know what I should do, and whether you require any additional information.

Deployment info: [screenshot not included]

Quota usage (it looks like this pretty much all the time): [screenshot not included]

Answer 1

AshokPeddakotla-MSFT 35,976 Moderator

Heet Sarju Shah, greetings and welcome to the Microsoft Q&A forum!

I understand that you are hitting quota limits.

To minimize issues related to rate limits, it's a good idea to use the following techniques:

Implement retry logic in your application.
Avoid sharp changes in the workload. Increase the workload gradually.
Test different load increase patterns.
Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
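
As a rough illustration of the first technique, here is a minimal retry sketch in Python. It is not tied to any particular SDK: `send` is a hypothetical zero-argument callable, assumed to return a `(status_code, headers, body)` tuple, and the `Retry-After` header is assumed to hold a number of seconds. The delays and attempt counts are arbitrary defaults, not values recommended by Azure.

```python
import random
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or attempts run out.

    send is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt

        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            # Assumption: the header is a number of seconds, not an HTTP date.
            delay = float(retry_after)
            if delay > max_delay:
                # A very long cooldown (for example 86400 s) is not worth
                # waiting out inside a retry loop, so fail fast instead.
                raise RuntimeError(f"rate limited; server asked to wait {delay:.0f}s")
        else:
            # Exponential backoff with a little jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay * 0.1)
        sleep(delay)

    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Note that retrying only helps with short-lived throttling. With a 24-hour cooldown like the one described in the question, this sketch raises immediately, which points to a quota problem rather than a traffic-shaping problem.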

RPM rate limits are based on the number of requests received over time. The rate limit expects requests to be evenly distributed over a one-minute period. If this average flow isn't maintained, requests may receive a 429 response even though the limit isn't met when measured over the course of a full minute.

To implement this behavior, Azure OpenAI Service evaluates the rate of incoming requests over a small period of time, typically 1 or 10 seconds. If the number of requests received during that period exceeds what would be expected at the set RPM limit, new requests receive a 429 response code until the next evaluation period. For example, if Azure OpenAI is monitoring the request rate on 1-second intervals, rate limiting will occur for a 600-RPM deployment if more than 10 requests are received during any 1-second period (600 requests per minute = 10 requests per second).
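
The arithmetic in that example can be written out as a small Python sketch. This is a simplified model of the check described above, not Azure's actual implementation, and the function names are made up for illustration.

```python
def allowed_requests_per_window(rpm_limit, window_seconds):
    """Requests expected in one evaluation window at a given RPM limit."""
    return rpm_limit * window_seconds / 60


def is_throttled(requests_in_window, rpm_limit, window_seconds=1):
    """True if a window received more requests than the RPM limit implies."""
    return requests_in_window > allowed_requests_per_window(rpm_limit, window_seconds)


# The 600-RPM example: 10 requests per 1-second window, 100 per 10-second window.
print(allowed_requests_per_window(600, 1))
print(allowed_requests_per_window(600, 10))
print(is_throttled(11, 600))  # 11 requests in one second exceeds the 10 allowed
```

So a burst of 11 requests in one second can be throttled on a 600-RPM deployment even if the rest of the minute is idle.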

See Understanding rate limits for more details.

The Azure for Students subscription has a limited quota on resources.

If you are not able to increase the quota on a student subscription, please contact customer service so that your limits can be adjusted appropriately.

You must upgrade your Azure for Students Starter subscription to a Pay-As-You-Go subscription to increase your quotas or limits. For more information, see Upgrade your Azure Free Trial subscription to a Pay-As-You-Go subscription.

Do let me know if that helps or if you have any other queries.

Heet Sarju Shah 0 Reputation points

2024-09-24T05:55:46.6+00:00

Thanks for the detailed response Ashok.

I have waited over 3 days at this point, but I am still getting the following error on my first request.

I have a 314-token prompt, which I have sent only once. It was my first request in 3 days, so there is no way I could have reached the quota limit.

Please let me know if there is something I can do to make this work, or whether upgrading to Pay-As-You-Go is the only option.
AshokPeddakotla-MSFT 35,976 Reputation points Moderator

2024-09-24T07:18:29.95+00:00

Heet Sarju Shah The error message indicates that you’ve exceeded the token rate limit of your current AI Services S0 pricing tier.

As you mentioned, you are using an Azure for Students subscription. For this subscription, the quota limit for all models is 1K tokens per minute (TPM).

Please see Azure OpenAI Service quotas and limits for more details.

If you have the quota and still see the issue, you can also adjust your TPM allocation after deployment by selecting Edit under Shared resources > Deployments in Azure OpenAI Studio.

Try moving the slider to the maximum available limit and see if that solves the issue.

If your issue is still not resolved, then, as suggested earlier, you might need to increase the quota.

To view your quota allocations across deployments in a given region, select Shared resources > Quota in Azure OpenAI Studio and click the link to increase the quota.

You must upgrade your Azure for Students Starter subscription to a Pay-As-You-Go subscription to increase your quotas or limits. For more information, see Upgrade your Azure Free Trial subscription to a Pay-As-You-Go subscription.

Do let me know if that helps or if you have any other queries.

If the response helped, please do click Accept Answer and Yes for was this answer helpful.

Doing so would help other community members with similar issue identify the solution. I highly appreciate your contribution to the community.
Heet Sarju Shah 0 Reputation points

2024-09-24T13:17:26.1866667+00:00

I think you have misunderstood. I have not used Azure in the past 3 days. The quota limit is 1K tokens, so a first request of 314 tokens should work, right? This is the error for my first request in 3 days. How can my quota be used up when I have not used it?

The quota is showing just 1. Not 1 of anything, no % value of how much is used.

Please solve this issue so I can go back to using this service.

Thanks

Update:

I think I have found the issue. I tried deleting and redeploying the model to see if that would help. When I try to redeploy the model, it shows 0 TPM as my rate limit, and I cannot increase it. It is the same for both GPT-4o and GPT-4o mini.
AshokPeddakotla-MSFT 35,976 Reputation points Moderator

2024-09-25T12:51:42.1966667+00:00

Heet Sarju Shah Please see private message.
