429 rate limit error with max_tokens=800 in both East US and Sweden

Question

I am receiving 429 errors after making only a couple dozen requests in total today (and none in the days before). Over several hours, I have tried both the East US and Sweden data centers with GPT-4 and GPT-4o, using max_tokens=800.

As far as I can tell, none of the general recommendations or causes for a 429 error (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest#understanding-rate-limits) apply, unless both data centers have been overloaded for several hours. If that is the case, is there anywhere I can check?

Any hints on what else I could try?

Accepted Answer

Hello Lukas,

Thanks for sharing your solution to this issue. So that more users who run into the same problem can see the root cause, I am posting your workaround here:

The issue was that my .env implementation was not working as expected, so I was in fact always sending requests to the wrong deployment.
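For anyone who wants to rule out the same cause, here is a minimal sketch of a configuration check, assuming the endpoint and deployment name are read from environment variables. The variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_DEPLOYMENT` and the helper `resolve_azure_settings` are hypothetical and not taken from the original post; substitute whatever your .env file actually defines.

```python
import os

# Hypothetical variable names; replace them with the ones your .env file defines.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT")


def resolve_azure_settings(env=None):
    """Return the settings a request would use, or raise if any is missing or empty."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing or empty settings: " + ", ".join(missing))
    # Strip stray whitespace, which is easy to introduce in a hand-edited .env file.
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

Calling `resolve_azure_settings()` right before the first request and printing the result lets you compare the endpoint and deployment name the code really uses with the ones shown in the portal.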

Please feel free to accept this answer so that others can find it.

Regards,

Yutong

Answer

Hello Lukas, thanks for reaching out to us. I have seen the same issue from another user and have already escalated it for investigation.

There are two possible causes for this issue. We can skip the first one, exceeding the quota limit, since you confirmed that you are under the limit. (You can always check this in Azure OpenAI Studio.)

If your application receives a response code 429 (too many requests) while your workload is within the defined limits, then this is a transient error thrown while the Azure OpenAI service is scaling up to meet your demand and has not yet reached the required scale. As a result, the resource did not have sufficient capacity to serve the request.

To resolve the issue, wait some time before trying your request again.

Solutions

To resolve 429 errors caused by exceeding a quota limit:
- Implement exponential backoff retry logic in your application.
- Avoid sharp changes in the workload. Increase the workload gradually.

To resolve 429 errors caused by back-end scaling, wait some time before trying the request again. The retry logic mentioned above can be helpful here as well.
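The retry advice above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RateLimited` is a hypothetical stand-in for whatever exception your client library raises on an HTTP 429, and `call_with_backoff`, its default delays, and the 10% jitter are illustrative choices rather than values recommended by the service.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry with growing delays."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delay doubles on each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so concurrent clients
            # do not all retry at the same moment.
            sleep(delay + random.uniform(0, delay * 0.1))
```

To use it, wrap the call that sends the request in a function that raises `RateLimited` when the response status is 429, and pass that function as `request`. The `sleep` parameter exists only so the waiting can be replaced in tests.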

Please let us know how it goes. I hope this helps.

Regards,

Yutong

Please accept the answer if you find it helpful, to support the community. Thanks a lot.
