OpenAI GPT-4 API rate limit error

highever highever 0 Reputation points
2023-11-27T02:58:50.8466667+00:00

I'm getting a ChatGPT error 429 message when calling the GPT-4 API for 10 consecutive requests. The error says that I exceeded the token rate limit of my current OpenAI S0 pricing tier. I tried to request a quota increase by submitting an application, but I didn't receive any response. Can someone suggest how to resolve this issue? Here's the exact error message:

ChatGPT error 429: {"error":{"code":"429","message": "Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 50 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit."}}
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. AshokPeddakotla-MSFT 30,076 Reputation points
    2023-11-27T04:20:27.43+00:00

    highever highever Greetings & Welcome to Microsoft Q&A forum!

    I'm getting a ChatGPT error 429 message when calling the GPT-4 API for 10 consecutive requests. The error says that I exceeded the token rate limit of my current OpenAI S0 pricing tier.

    To give more context: as each request is received, Azure OpenAI computes an estimated max processed-token count that includes the following:

    • Prompt text and count
    • The max_tokens parameter setting
    • The best_of parameter setting

    As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If at any time during that minute, the TPM rate limit value is reached, then further requests will receive a 429 response code until the counter resets. For more details, see Understanding rate limits.
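    The accounting described above can be sketched as follows. This is an illustrative model only: the 4-characters-per-token heuristic and the class/function names are assumptions for the sketch, not the exact algorithm Azure OpenAI uses.

    ```python
    def estimated_max_tokens(prompt: str, max_tokens: int, best_of: int = 1) -> int:
        """Rough upper bound on tokens a request may consume:
        prompt tokens plus max_tokens for each best_of candidate."""
        prompt_tokens = len(prompt) // 4  # crude heuristic: ~4 characters per token
        return prompt_tokens + max_tokens * best_of

    class MinuteTokenBucket:
        """Running per-minute token counter; once the TPM limit is reached,
        further requests are rejected (the service returns 429) until the
        counter resets at the start of the next minute."""

        def __init__(self, tpm_limit: int):
            self.tpm_limit = tpm_limit
            self.used = 0

        def try_admit(self, request_tokens: int) -> bool:
            if self.used + request_tokens > self.tpm_limit:
                return False  # over the TPM limit -> caller sees a 429
            self.used += request_tokens
            return True

        def reset(self) -> None:
            # Invoked once per minute by the service.
            self.used = 0
    ```

    Note that the count is an estimate made before the response is generated, which is why keeping `max_tokens` and `best_of` small matters even when actual completions are short.
    
    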

    I tried to adjust the credit limit by submitting an application, but I didn't receive any response.

    I understand that you have already submitted an application for quota increase through https://aka.ms/oai/quotaincrease  

    As mentioned in the form, Priority will be given to customers who generate traffic that consumes the existing quota allocation, and your request may be denied if this condition is not met.

    We will make every effort to accommodate your request; however, allocation is based on our current capacity and future deployments, and is subject to availability.

    I would suggest that you kindly wait for the request to be approved.

    Can someone suggest how to resolve this issue?

    To minimize issues related to rate limits, it's a good idea to use the following techniques:

    • Set max_tokens and best_of to the minimum values that serve the needs of your scenario. For example, don’t set a large max_tokens value if you expect your responses to be small.
    • Use quota management to increase TPM on deployments with high traffic, and to reduce TPM on deployments with limited needs.
    • Implement retry logic in your application.
    • Avoid sharp changes in the workload. Increase the workload gradually.
    • Test different load increase patterns.
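    The retry-logic point above can be sketched as a small helper that backs off on 429 responses. This is a generic sketch, not code from the Azure OpenAI SDK: `send_request`, `RateLimitError`, and `call_with_retries` are hypothetical names, and a real handler should honor the Retry-After hint the service returns (50 seconds in the error message above).

    ```python
    import random
    import time

    class RateLimitError(Exception):
        """Hypothetical exception representing an HTTP 429 response."""
        def __init__(self, retry_after=None):
            super().__init__("429: rate limit exceeded")
            self.retry_after = retry_after  # seconds suggested by the server, if any

    def call_with_retries(send_request, max_retries=5, base_delay=1.0):
        """Call send_request(), retrying on 429 with the server's Retry-After
        hint when available, else exponential backoff with jitter."""
        for attempt in range(max_retries):
            try:
                return send_request()
            except RateLimitError as err:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                delay = err.retry_after
                if delay is None:
                    delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
                time.sleep(delay)
    ```

    Combined with gradual load increases, this keeps short bursts from failing outright: a request that hits the per-minute cap simply waits for the counter to reset instead of surfacing an error to the user.
    
    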

    Hope this helps. Do let me know if you have any further queries.


    If the response helped, please click Accept Answer and Yes for "Was this answer helpful".

    Doing so would help other community members with a similar issue identify the solution. I highly appreciate your contribution to the community.
