I have an AI model deployed in Azure AI Foundry. When I call it via the API, I get 'TooManyRequests' after a couple of requests.

Stephen 85 Reputation points
2025-02-12T15:39:23.18+00:00

In Azure AI Foundry, I have the gpt-4o model deployed. In the UI, it is grouped under the Azure AI service “ai-sig6-azure-ai-services_aoai”. In the Azure Portal, I have an Azure AI Service called ai-sig6-azure-ai-services. The gpt-4o model has a TPM of 30K and an RPM of 180. When I send several requests in a row, 1 or 2 succeed and then I get the error HTTP Status Code ‘TooManyRequests’. I should not be anywhere close to those limits. I think there must be another limit that I am hitting, but I cannot find it in the Azure Portal or Azure AI Foundry.

The HTTP response headers when I get the ‘TooManyRequests’ error are:

Retry-After: 49

x-ratelimit-reset-tokens: 49

apim-request-id: 8ef18262-d6c3-4b3b-a2bf-7cf1ccdddfee

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

X-Content-Type-Options: nosniff

policy-id: DeploymentRatelimit-Token

x-ms-region: East US 2

x-ratelimit-remaining-requests: 24

Date: Wed, 12 Feb 2025 14:14:46 GMT

Request failed with status code: TooManyRequests
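
For readers debugging a similar response, the headers above can be read programmatically. The following is a minimal sketch, not an official Azure utility: the function names `parse_headers` and `describe_throttle` are hypothetical, and treating a `policy-id` value containing "Token" as a token-based limit is an assumption drawn only from the header name shown in this question.

```python
# Header block copied from the question above.
RAW = """Retry-After: 49
x-ratelimit-reset-tokens: 49
policy-id: DeploymentRatelimit-Token
x-ms-region: East US 2
x-ratelimit-remaining-requests: 24
Date: Wed, 12 Feb 2025 14:14:46 GMT"""


def parse_headers(raw: str) -> dict:
    """Split 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in raw.splitlines():
        # Partition on the first colon only, so values such as dates stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def _to_int(value):
    """Return an int for a purely numeric string, otherwise None."""
    return int(value) if value is not None and value.isdigit() else None


def describe_throttle(headers: dict) -> dict:
    """Summarise the throttling hints found in a 429 response.

    Assumption: a policy-id containing 'token' refers to a token-based limit.
    """
    policy = headers.get("policy-id", "")
    return {
        "limit_kind": "token" if "token" in policy.lower() else "other",
        "retry_after_s": _to_int(headers.get("retry-after")),
        "remaining_requests": _to_int(headers.get("x-ratelimit-remaining-requests")),
    }


if __name__ == "__main__":
    print(describe_throttle(parse_headers(RAW)))
```

In this response, requests are still available (`x-ratelimit-remaining-requests: 24`) while the policy name mentions tokens, which suggests the token limit rather than the request limit was the one reached.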

What do I need to change so I don’t get this error?


Accepted answer
  1. kothapally Snigdha 3,020 Reputation points Microsoft External Staff Moderator
    2025-02-13T19:13:56.3566667+00:00

    Hi Stephen,

    I understand that you're experiencing rate limit issues with your gpt-4o model deployed in Azure AI Foundry. The rate limits on your deployment (30K TPM and 180 RPM) are indeed lower than the limits specified for the Default Tier (450K TPM and 2.7K RPM).

    • The rate limits you see are likely tied to the specific Azure OpenAI Service resource you are using. Since you mentioned that the connected resource is ai-sig6-azure-ai-services_aoai, you should verify the quotas assigned to this resource.
    • In the Azure Portal, navigate to the Azure OpenAI section and check if there are any quotas or limits set specifically for this resource. If nothing appears, it may indicate that the resource is not configured to utilize the higher limits available for the gpt-4o model.
    • If you want to increase your rate limits to match those specified for the Default Tier, you may need to submit a quota increase request. This can be done through the quota increase request form. Keep in mind that priority is given to customers who generate traffic that consumes existing quota allocations.
    • Request for Quota Increase https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR4xPXO648sJKt4GoXAed-0pUMFE1Rk9CU084RjA0TUlVSUlMWEQzVkJDNCQlQCN0PWcu
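
Until a quota increase is granted, a client can also wait for the period given in the `Retry-After` header before retrying. The following is a minimal sketch under stated assumptions, not the Azure SDK's own retry logic: `call_with_retry` and the `send` callable are hypothetical names, `send` is assumed to return a `(status, headers, body)` tuple, and only the integer-seconds form of `Retry-After` is handled.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]


def _retry_after_seconds(headers: Mapping[str, str], fallback: float) -> float:
    """Return Retry-After as seconds, or the fallback if absent or not numeric."""
    for name, value in headers.items():
        if name.lower() == "retry-after" and value.strip().isdigit():
            return float(value.strip())
    return fallback


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on HTTP 429, honouring Retry-After when present."""
    response = send()
    for attempt in range(1, max_attempts):
        if response[0] != 429:
            break
        # Fall back to exponential backoff when the server gives no usable hint.
        wait = _retry_after_seconds(response[1], default_wait * 2 ** (attempt - 1))
        sleep(min(wait, max_wait))
        response = send()
    return response


if __name__ == "__main__":
    # Simulated exchange: one throttled response, then a success.
    replies = iter([(429, {"Retry-After": "49"}, ""), (200, {}, "ok")])
    waits = []
    print(call_with_retry(lambda: next(replies), sleep=waits.append), waits)
```

The `sleep` parameter is injectable so the behaviour can be checked without actually waiting; in real use, `send` would wrap whatever HTTP client issues the request to the deployment.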

    I hope this helps. Thank you!


3 additional answers

  1. Ketsha 325 Reputation points Microsoft Employee
    2025-02-12T17:13:08.14+00:00

    Hi, please see the following table and the links below; they cover the quotas and limits that you are hitting.

    [Image: quotas and limits table, not reproduced here]

    Here is the quota and limits information: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quotas-limits#quotas-and-limits-reference

    Also, GPT-4 model details: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models#gpt-4-models


  2. Stephen 85 Reputation points
    2025-02-14T13:46:12.6066667+00:00

    I created a support ticket and spoke to a Microsoft employee. Apparently, the region I deployed to was experiencing heavy use, so the default TPM and RPM were lowered. She asked me to deploy to another region and suggested some regions with more capacity. She also said to request a quota increase (the same information as in the previous comment). I did that and the request just went through. So, problem resolved.


