Hello Lukas, thanks for reaching out to us. I have seen the same issue from other users and have already escalated it for investigation.
There are two possible reasons for this issue. We can skip the first one, exceeding the quota limit, since you confirmed that you are under the limit. (You can always check this in Azure OpenAI Studio.)
If your application receives response code 429 (Too Many Requests) while your workload is within the defined limits, it is a transient error returned while the Azure OpenAI service is still scaling up to meet your demand and has not yet reached the required scale. As a result, the service did not have sufficient resources to serve the request.
To resolve the issue, wait some time before trying your request again.
Solutions
- To resolve 429 errors caused by exceeding a quota limit:
  - Implement exponential backoff retry logic in your application.
  - Avoid sharp changes in the workload. Increase the workload gradually.
- To resolve 429 errors caused by back-end scaling, wait some time before trying the request again. The retry logic mentioned above can also help here.
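As a rough illustration of the exponential backoff retry logic mentioned above, here is a minimal Python sketch. It is not taken from any Azure SDK: `RateLimitError`, `call_with_backoff`, and `send_request` are hypothetical names, and you would replace `RateLimitError` with whatever exception your client library raises for a 429 response. The delay values are assumptions to tune for your workload.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        # Seconds the server asked us to wait, if it said so (assumption).
        self.retry_after = retry_after


def call_with_backoff(send_request, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send_request(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            # Double the wait on each attempt (base, 2x, 4x, ...), capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # If the server suggested a longer wait, respect it.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            # Add up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

You would wrap your existing request in a function and pass it in, for example `call_with_backoff(lambda: my_client_call(prompt))`, where `my_client_call` is your own code. The `sleep` parameter exists only so the wait can be replaced in tests.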
Please let us know how it goes. I hope this helps.
Regards,
Yutong
- Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.