Azure OpenAi GPT 4o returning "rate_limit_exceeded"

Arun Kumar Ganesh 5 Reputation points
Nov 21, 2024, 3:30 PM

Hello,

We have set up a pay-as-you-go account and are using the Azure OpenAI service for an internal use case. We have been using GPT-4o with the Assistants API and Azure AI Search for the last three months, and it was working properly; most of our use case is document Q&A.

Suddenly, since last week, the Assistants API with GPT-4o alone has been returning "rate_limit_exceeded", while Azure AI Search still works fine on the same use case.

Can someone suggest the possible root cause of this issue? Were there any updates to the GPT-4o models, and how can we track such updates? We need to make sure this does not happen again; because of this issue, we had to switch back to GPT-3.5-Turbo to keep things working.

Region - EastUS2

Error Log -

"last_error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit is exceeded. Try again in 54 seconds."
},
"model": "gpt-4o",

Even after retrying some time later, the results are the same, and sometimes it prompts us to try again after 24 hours.

Thanks in Advance!


1 answer

  1. Pavankumar Purilla 1,305 Reputation points Microsoft Vendor
    Nov 21, 2024, 11:27 PM

    Hi Arun Kumar Ganesh,

    Greetings and welcome to the Microsoft Q&A forum! Thank you for sharing your query.
    The error message is related to rate limits, a mechanism commonly used by APIs to prevent abuse and ensure fair usage.

    Azure OpenAI's quota feature enables assignment of rate limits to your deployments, up to a global limit called your "quota." Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM).
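    To illustrate how a per-minute token budget behaves, here is a minimal client-side sketch of a sliding-window TPM limiter. The `TpmThrottle` class and its numbers are hypothetical, not part of any SDK; the service enforces the real quota server-side, and this kind of helper only reduces the chance of hitting a 429:

    ```python
    import time
    from collections import deque

    class TpmThrottle:
        """Illustrative client-side Tokens-Per-Minute limiter.

        Tracks (timestamp, tokens) entries in a 60-second sliding window
        and refuses a request that would push usage over the limit.
        """

        def __init__(self, tpm_limit):
            self.tpm_limit = tpm_limit
            self.window = deque()  # entries of (timestamp, tokens)

        def acquire(self, tokens):
            """Return True if `tokens` fits in the current minute's budget."""
            now = time.monotonic()
            # Drop entries older than 60 seconds.
            while self.window and now - self.window[0][0] > 60:
                self.window.popleft()
            used = sum(t for _, t in self.window)
            if used + tokens > self.tpm_limit:
                return False  # caller should wait before retrying
            self.window.append((now, tokens))
            return True
    ```

    With a 100-TPM budget, two requests totaling 90 tokens succeed, a further 20-token request is refused, but a 10-token one still fits.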

    You can check this documentation for more details.
    To give more context, two rate limits apply to a deployment: Tokens-Per-Minute (TPM) and Requests-Per-Minute (RPM).

    TPM rate limits are based on the maximum number of tokens that are estimated to be processed by a request at the time the request is received.

    RPM rate limits are based on the number of requests received over time. The rate limit expects that requests be evenly distributed over a one-minute period. If this average flow isn't maintained, then requests may receive a 429 response even though the limit isn't met when measured over the course of a minute.
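    Since the error message already suggests a wait time, a common mitigation on the client side is retry with exponential backoff. Below is a minimal sketch; the `RateLimitError` class here is a local stand-in (with the `openai` 1.x Python SDK you would catch `openai.RateLimitError` instead), and the retry counts and delays are illustrative:

    ```python
    import time

    class RateLimitError(Exception):
        """Stand-in for the SDK's rate-limit exception."""

    def call_with_backoff(fn, max_retries=5, base_delay=1.0):
        """Call fn(); on RateLimitError, wait base_delay * 2**attempt and retry.

        Re-raises the error once max_retries attempts are exhausted.
        """
        for attempt in range(max_retries):
            try:
                return fn()
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(base_delay * (2 ** attempt))
    ```

    In practice you would wrap your Assistants API run-polling call in `call_with_backoff`, so a transient 429 is absorbed instead of failing the whole workflow.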

    Please see Manage Azure OpenAI Service quota for more details.
    To view your quota allocations across deployments in a given region, select Shared Resources > Quota in Azure OpenAI Studio, and click the link there to request a quota increase.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful."

