I created a support ticket and spoke with a Microsoft employee. Apparently, the region I deployed to was experiencing heavy use, so the default TPM and RPM had been lowered. She asked me to deploy to another region and listed some regions with more capacity. She also said to request a quota increase (the same info as in the previous comment); I did that and the request went right through. So, problem resolved.
Error code 429 - 'TooManyRequests'. Azure OpenAI - AI model deployed via AI Foundry.
In Azure AI Foundry, I have the gpt-4o model deployed. In the UI it is grouped under the Azure AI service “ai-sig6-azure-ai-services_aoai”, and in the Azure Portal I have an Azure AI Services resource called ai-sig6-azure-ai-services. The gpt-4o deployment has a TPM limit of 30K and an RPM limit of 180. When I send several requests in a row, one or two succeed and then the rest fail with HTTP status ‘TooManyRequests’. I should not be anywhere near those limits, so I think there must be another limit I am hitting, but I cannot find it in the Azure Portal or Azure AI Foundry.
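For illustration, here is a minimal sketch of the kind of request burst that triggers the error, using the openai Python SDK (the endpoint, API key, and API version below are placeholders, not the exact values from my setup):

# Minimal sketch: a short burst of chat-completion calls against the gpt-4o
# deployment. Endpoint, API key, and api_version are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://ai-sig6-azure-ai-services.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                              # placeholder
    api_version="2024-10-21",                                              # placeholder
    max_retries=0,  # disable the SDK's built-in retries so the raw 429 surfaces
)

for i in range(5):
    # In my case, after 1-2 successful calls the remaining ones raise a 429
    # ('TooManyRequests') even though the per-minute quota is not exhausted.
    response = client.chat.completions.create(
        model="gpt-4o",  # deployment name
        messages=[{"role": "user", "content": f"Test request {i}"}],
    )
    print(i, response.choices[0].message.content)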
Here are the response headers when I get the ‘TooManyRequests’ error:
Retry-After: 49
x-ratelimit-reset-tokens: 49
apim-request-id: 8ef18262-d6c3-4b3b-a2bf-7cf1ccdddfee
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
policy-id: DeploymentRatelimit-Token
x-ms-region: East US 2
x-ratelimit-remaining-requests: 24
Date: Wed, 12 Feb 2025 14:14:46 GMT
Request failed with status code: TooManyRequests
What do I need to change so I don’t get this error?
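For reference, the Retry-After and x-ratelimit-reset-tokens headers above say how long to wait before retrying. Below is a minimal sketch of a retry wrapper that honors Retry-After, assuming the openai Python SDK with placeholder endpoint, key, and API version; it only works around the symptom, while the actual fix was the region change and quota increase described in the first comment. (The SDK can also retry 429s on its own via the client's max_retries setting.)

# Minimal sketch: retry on 429 by honoring the Retry-After header.
# Assumes the openai Python SDK (>= 1.x); endpoint, key, and api_version are placeholders.
import time
from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint="https://ai-sig6-azure-ai-services.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                              # placeholder
    api_version="2024-10-21",                                              # placeholder
)

def chat_with_retry(messages, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except RateLimitError as e:
            # Wait the number of seconds the service asks for (49 in the headers
            # above), falling back to a short exponential backoff if absent.
            retry_after = e.response.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else 2 ** attempt
            time.sleep(delay)
    raise RuntimeError("Still rate limited after retries")

print(chat_with_retry([{"role": "user", "content": "Hello"}]).choices[0].message.content)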