Azure OpenAi GPT 4o returning "rate_limit_exceeded"

Arun Kumar Ganesh 5 Reputation points
Nov 21, 2024, 3:30 PM

Hello,

We have set up a pay-as-you-go account and are using the Azure OpenAI service for an internal use case. We have been using GPT-4o with the Assistants API and Azure AI Search for the last three months, and it was working properly; most of our use case is document Q&A.

Suddenly, since last week, the Assistants API with GPT-4o alone has been returning "rate_limit_exceeded", while Azure AI Search still works fine on the same use case.

Can someone suggest the possible root cause of this issue? Were there any updates to the GPT-4o models, and how can we track such updates? We need to make sure this does not happen again; because of this issue, we had to switch back to GPT-3.5-Turbo to keep things working.

Region - EastUS2

Error Log -

"last_error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit is exceeded. Try again in 54 seconds."
},
"model": "gpt-4o",

Even after retrying some time later, the results are the same, and sometimes it prompts us to try again after 24 hours.

Thanks in Advance!


1 answer

  1. Pavankumar Purilla 1,305 Reputation points Microsoft Vendor
    Nov 21, 2024, 11:27 PM

    Hi Arun Kumar Ganesh,

    Greetings and welcome to the Microsoft Q&A forum! Thank you for sharing your query.
    The error message is related to rate limits, a mechanism commonly used by APIs to prevent abuse and ensure fair usage.

    Azure OpenAI's quota feature enables assignment of rate limits to your deployments, up to a global limit called your "quota." Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM).
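    To illustrate how a per-minute token budget behaves, here is a minimal client-side sketch of a sliding-window TPM limiter. The `TpmThrottle` class and its numbers are hypothetical, not part of any SDK; the service enforces the real quota server-side, and this kind of helper only reduces the chance of hitting a 429:

    ```python
    import time
    from collections import deque

    class TpmThrottle:
        """Illustrative client-side Tokens-Per-Minute limiter.

        Tracks (timestamp, tokens) entries in a 60-second sliding window
        and refuses a request that would push usage over the limit.
        """

        def __init__(self, tpm_limit):
            self.tpm_limit = tpm_limit
            self.window = deque()  # entries of (timestamp, tokens)

        def acquire(self, tokens):
            """Return True if `tokens` fits in the current minute's budget."""
            now = time.monotonic()
            # Drop entries older than 60 seconds.
            while self.window and now - self.window[0][0] > 60:
                self.window.popleft()
            used = sum(t for _, t in self.window)
            if used + tokens > self.tpm_limit:
                return False  # caller should wait before retrying
            self.window.append((now, tokens))
            return True
    ```

    With a 100-TPM budget, two requests totaling 90 tokens succeed, a further 20-token request is refused, but a 10-token one still fits.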

    You can check this documentation for more details.
    To give more context, two rate limits apply to a deployment: Tokens-Per-Minute (TPM) and Requests-Per-Minute (RPM).

    TPM rate limits are based on the maximum number of tokens that are estimated to be processed by a request at the time the request is received.

    RPM rate limits are based on the number of requests received over time. The rate limit expects that requests be evenly distributed over a one-minute period. If this average flow isn't maintained, then requests may receive a 429 response even though the limit isn't met when measured over the course of a minute.
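    Since the error message already suggests a wait time, a common mitigation on the client side is retry with exponential backoff. Below is a minimal sketch; the `RateLimitError` class here is a local stand-in (with the `openai` 1.x Python SDK you would catch `openai.RateLimitError` instead), and the retry counts and delays are illustrative:

    ```python
    import time

    class RateLimitError(Exception):
        """Stand-in for the SDK's rate-limit exception."""

    def call_with_backoff(fn, max_retries=5, base_delay=1.0):
        """Call fn(); on RateLimitError, wait base_delay * 2**attempt and retry.

        Re-raises the error once max_retries attempts are exhausted.
        """
        for attempt in range(max_retries):
            try:
                return fn()
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(base_delay * (2 ** attempt))
    ```

    In practice you would wrap your Assistants API run-polling call in `call_with_backoff`, so a transient 429 is absorbed instead of failing the whole workflow.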

    Please see Manage Azure OpenAI Service quota for more details.
    To view your quota allocations across deployments in a given region, select Shared Resources > Quota in Azure OpenAI Studio, and click the link there to request a quota increase.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful."

