I'm noticing rate limits and throttling issues during high usage.

adan ameen 0 Reputation points
2025-03-25T21:36:36.1866667+00:00

I'm working on integrating Azure OpenAI GPT-4o into my chatbot, but I'm noticing rate limits and throttling issues during high usage.

Even though I’ve checked my quota limits in the Azure portal, I still get 429 errors (Too Many Requests) when multiple users interact with the bot simultaneously. Would increasing my SKU tier help, or is there a way to optimize requests for better performance?

Azure AI services

2 answers

  1. Azar 29,520 Reputation points MVP Volunteer Moderator
    2025-03-25T22:32:38.98+00:00

    Hi there

    Upgrading to a higher SKU tier can help, but first check your quota limits in the Azure portal and request an increase if needed. Try batching requests, reducing unnecessary API calls, and implementing caching for frequently used responses (a small sketch is shown below). Also, use the Azure OpenAI rate-limit headers to monitor usage patterns and adjust accordingly. If traffic is unpredictable, a queueing mechanism can help distribute requests more efficiently.
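
    Here is a minimal caching sketch, assuming the openai Python package (v1.x); the deployment name "gpt-4o", the API version, and the environment variable names are placeholders for illustration, so adjust them to your own setup.

    ```python
    # Minimal sketch: cache completions for identical prompts so repeated
    # questions don't consume extra TPM/RPM quota.
    import os
    from functools import lru_cache

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",  # placeholder, use the version you target
    )

    @lru_cache(maxsize=512)
    def cached_answer(prompt: str) -> str:
        """Return a cached completion for identical prompts."""
        response = client.chat.completions.create(
            model="gpt-4o",  # your deployment name
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256,
        )
        return response.choices[0].message.content

    # Identical questions from different users now hit the cache, not the API.
    print(cached_answer("What are your opening hours?"))
    print(cached_answer("What are your opening hours?"))  # served from cache
    ```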

    If this helps, kindly accept the answer. Thanks.

    1 person found this answer helpful.

  2. VSawhney 800 Reputation points Microsoft External Staff Moderator
    2025-03-26T11:37:40.5766667+00:00

    Hello adan ameen,

    When a deployment is created, the assigned TPM will directly map to the tokens-per-minute rate limit enforced on its inferencing requests. A Requests-Per-Minute (RPM) rate limit will also be enforced whose value is set proportionally to the TPM assignment using the following ratio:

    6 RPM per 1000 TPM.
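
    For example, a deployment assigned 30,000 TPM would be enforced at roughly 30,000 / 1,000 × 6 = 180 RPM.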

    Because TPM can be distributed flexibly within a subscription and region, the Azure OpenAI Service has been able to loosen other restrictions. To reduce 429 errors, consider the following:

    1. Increase the TPM assigned to your model deployment to get a higher RPM limit and a higher threshold before requests start failing.
    2. Deploy to multiple regions to handle regional outages; you can create an outage alert from Azure Status and take remedial steps accordingly.
    3. Reduce the size of your input queries and lower max_tokens.
    4. Implement retry logic with exponential backoff in your code (see the sketch after this list).
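
    Below is a minimal retry-with-exponential-backoff sketch along the lines of the cookbook reference linked below, assuming the openai Python package (v1.x); the deployment name, API version, and backoff parameters are illustrative placeholders.

    ```python
    # Minimal sketch: retry throttled (429) requests with exponential backoff
    # and jitter instead of failing immediately.
    import os
    import random
    import time

    from openai import AzureOpenAI, RateLimitError

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",  # placeholder, use the version you target
    )

    def chat_with_backoff(messages, max_retries=5):
        """Call the deployment, backing off exponentially when throttled."""
        delay = 1.0
        for attempt in range(max_retries):
            try:
                return client.chat.completions.create(
                    model="gpt-4o",  # your deployment name
                    messages=messages,
                    max_tokens=256,
                )
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                # Sleep for an exponentially growing interval plus jitter.
                time.sleep(delay + random.uniform(0, delay))
                delay *= 2

    reply = chat_with_backoff([{"role": "user", "content": "Hello!"}])
    print(reply.choices[0].message.content)
    ```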

    Reference - https://cookbook.openai.com/examples/how_to_handle_rate_limits

    Please don’t forget to click Accept Answer and select Yes for "Was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

    Thank you!

    1 person found this answer helpful.
