OpenAI RateLimitError persists even after increasing request limit for gpt-35-turbo-16k model on Azure

Anonymous
2023-10-17T11:20:57.07+00:00

I'm currently working with the gpt-35-turbo-16k model from OpenAI, deployed on Azure. Initially, I encountered a RateLimitError due to hitting the rate limit of 6 requests per minute. I then changed the limit to 60 requests per minute and retried, expecting the issue to be resolved.

However, I'm still facing the same error:

"openai.error.RateLimitError: Requests to the Creates a completion for the chat message Operation under Azure OpenAI API version 2023-05-15 have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 36 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit."

I have confirmed that I updated the rate limit correctly, and I waited for a reasonable amount of time for the change to take effect.

Is there something I might be missing, or does it take more time for rate limit changes to propagate? Has anyone else encountered a similar issue, and if so, how did you resolve it?

Thanks in advance for any insights or suggestions!

Region = Switzerland North

api_type = "azure"
api_version = "2023-05-15"
engine = "lund-gpt-35-turbo-16k"
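
For now I am working around it with a simple retry-with-backoff wrapper. This is a minimal sketch using the pre-1.0 openai Python SDK (which the openai.error traceback above implies); the resource URL and the key environment variable are placeholders:

import os
import time

import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"  # placeholder
openai.api_version = "2023-05-15"
openai.api_key = os.environ["AZURE_OPENAI_KEY"]  # placeholder env var

def chat_with_backoff(messages, max_retries=5):
    # Retry with exponentially growing waits whenever the service
    # throttles us (the error above suggests up to ~36 s).
    delay = 2.0
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(
                engine="lund-gpt-35-turbo-16k",  # deployment name from above
                messages=messages,
            )
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay *= 2

response = chat_with_backoff([{"role": "user", "content": "Hello"}])
print(response["choices"][0]["message"]["content"])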

2 answers

  1. Anonymous
    2023-10-17T12:51:29.0466667+00:00

    Delete and redeploy the model


  2. Pramod Valavala 20,636 Reputation points Microsoft Employee
    2023-10-17T14:30:07.32+00:00

    @Ensar Kaya While you have control over the rate limits at the deployment level, there are hard limits at the regional level as well, which you could have hit. Note that these are shared limits at the subscription level across all Azure OpenAI Service resources.

    Also, even if you did not make that many requests, it is possible that you hit the token rate limit first.
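
    To see whether tokens rather than requests are the binding constraint, you can estimate prompt tokens with tiktoken before each call. A rough sketch (assuming gpt-35-turbo-16k uses the cl100k_base encoding, and Azure's usual default of 6 RPM per 1,000 TPM, so a 60 RPM deployment has roughly 10,000 tokens per minute to spend; verify the exact ratio for your subscription):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def estimate_prompt_tokens(messages):
        # Rough estimate; ignores the few extra tokens of chat framing.
        return sum(len(enc.encode(m["content"])) for m in messages)

    messages = [{"role": "user", "content": "some long prompt"}]
    print(estimate_prompt_tokens(messages))
    # On a ~10,000 TPM budget, a single request that uses most of the
    # 16K context window can exhaust more than a minute's worth of tokens.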

