Why is my GPT-4o mini quota getting used up so fast?

Heet Sarju Shah 0 Reputation points
2024-09-22T11:23:15.1533333+00:00

I currently have a student account on which I have created an Azure OpenAI resource with two deployments:

  1. GPT-4o mini
  2. Dalle-3

The main issue is with GPT-4o mini. The quota limit it provides is extremely small: I can only get 2 or 3 requests out per day, which totals fewer than 600 to 800 tokens. I have the credits for a lot more. I even applied for a quota increase, but it was rejected. I don't understand why the quota runs out so fast.

Also, the cooldown every time is 86,400 seconds (24 hours).

Please let me know what I should do, and whether you need any additional information.

Deployment info: [screenshot]

Quota usage (it looks like this pretty much all the time): [screenshot]


1 answer

  1. AshokPeddakotla-MSFT 34,111 Reputation points
    2024-09-23T05:26:04.8466667+00:00

    Heet Sarju Shah, greetings and welcome to the Microsoft Q&A forum!

    I understand that you are hitting quota limits.

    To minimize issues related to rate limits, it's a good idea to use the following techniques:

    • Implement retry logic in your application.
    • Avoid sharp changes in the workload. Increase the workload gradually.
    • Test different load increase patterns.
    • Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
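The first bullet (retry logic) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the Azure OpenAI SDK's own mechanism: `RateLimited`, `call_with_retry`, and the `max_wait` cap are hypothetical names introduced here, and the code assumes your client surfaces a 429 response as an exception that may carry a retry-after value in seconds.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error raised by a client."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited (429)")
        self.retry_after = retry_after  # seconds, if the server supplied one


def call_with_retry(fn, max_attempts=5, base_delay=1.0, max_wait=60.0,
                    sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            if err.retry_after is not None:
                # Prefer the wait time the server asked for.
                delay = err.retry_after
            else:
                # base_delay * 1, 2, 4, ... plus up to 10% random jitter.
                delay = base_delay * (2 ** attempt)
                delay += random.uniform(0, 0.1 * delay)
            if delay > max_wait:
                # A very long cooldown (such as 86,400 seconds) is not
                # worth sleeping through in-process, so give up instead.
                raise
            sleep(delay)
```

Note that retrying only smooths over short-lived 429 responses. With a 24-hour cooldown like the one described in the question, this sketch re-raises immediately, because the underlying problem is the quota itself.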

    RPM rate limits are based on the number of requests received over time. The rate limit expects requests to be evenly distributed over a one-minute period. If this average flow isn't maintained, requests may receive a 429 response even though the limit hasn't been reached when measured over the full minute.

    To implement this behavior, Azure OpenAI Service evaluates the rate of incoming requests over a small period of time, typically 1 or 10 seconds. If the number of requests received during that period exceeds what would be expected at the set RPM limit, new requests receive a 429 response code until the next evaluation period. For example, if Azure OpenAI is monitoring the request rate at 1-second intervals, rate limiting will occur for a 600-RPM deployment if more than 10 requests are received during any 1-second period (600 requests per minute = 10 requests per second).
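The arithmetic in that example can be reproduced with a few lines of Python. This only illustrates the proportional calculation described above; the function name `requests_per_window` is made up for this sketch, and the service's actual evaluation logic is not shown in this thread.

```python
def requests_per_window(rpm, window_seconds):
    """Requests expected in one evaluation window at a given RPM limit.

    Assumes the limit is spread evenly over the minute, as described above.
    """
    return rpm * window_seconds / 60


# 600 RPM checked at 1-second intervals: about 10 requests per window.
print(requests_per_window(600, 1))
# The same deployment checked at 10-second intervals: about 100 per window.
print(requests_per_window(600, 10))
```

For a deployment with a very low RPM limit the per-window allowance is a fraction of one request, so even a couple of requests sent close together can trigger a 429.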

    See Understanding rate limits for more details.

    An Azure for Students subscription has limited quota on its resources.

    Also, if you are not able to increase the quota on a student subscription, please contact customer service so that your limits can be adjusted appropriately.

    You must upgrade your Azure for Students Starter subscription to a Pay-As-You-Go subscription to increase your quotas or limits. For more information, see "Upgrade your Azure Free Trial subscription to a Pay-As-You-Go subscription".

    Do let me know if that helps or if you have any other queries.

