I have an AI model deployed in Azure AI Foundry. When I call it via the API, I get 'TooManyRequests' after a couple of requests.

Question

Stephen 85

In Azure AI Foundry, I have the gpt-4o model deployed. In the UI, it is grouped under the Azure AI service “ai-sig6-azure-ai-services_aoai”. In the Azure Portal, I have an Azure AI Service called ai-sig6-azure-ai-services. The gpt-4o deployment has a rate limit of 30K TPM (tokens per minute) and 180 RPM (requests per minute). When I send several requests in a row, 1 or 2 succeed and then I get HTTP status code ‘TooManyRequests’. I should not be anywhere close to those limits. I think there must be another limit that I am hitting, but I cannot find it in the Azure Portal or Azure AI Foundry.
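One likely reason why only 1 or 2 requests succeed can be sketched with simple arithmetic. Assuming, as the Azure OpenAI quota guidance suggests, that each request is counted against the TPM limit on arrival using an estimate that includes the requested `max_tokens` (not just the tokens actually generated), a large `max_tokens` uses up a 30K TPM budget after very few calls. The helper below is a simplified model for illustration only: the service computes its estimate differently, and the request sizes in the demo are hypothetical.

```python
# Simplified model (assumption): each request reserves
# prompt_tokens + max_tokens against the tokens-per-minute (TPM) limit.
# The real service estimate is computed differently; this is illustrative.


def estimated_tokens(prompt_tokens: int, max_tokens: int) -> int:
    """Tokens one request is assumed to count against the TPM limit."""
    return prompt_tokens + max_tokens


def requests_per_minute(tpm_limit: int, prompt_tokens: int, max_tokens: int) -> int:
    """Number of such requests that fit in one minute under the TPM limit."""
    per_request = estimated_tokens(prompt_tokens, max_tokens)
    if per_request <= 0:
        raise ValueError("each request must count at least one token")
    return tpm_limit // per_request


if __name__ == "__main__":
    # Hypothetical request shapes against the 30K TPM deployment in the question.
    print(requests_per_minute(30_000, prompt_tokens=2_000, max_tokens=16_000))  # 1
    print(requests_per_minute(30_000, prompt_tokens=2_000, max_tokens=500))     # 12
```

With a hypothetical 2,000-token prompt, a `max_tokens` of 16,000 leaves room for a single request per minute, while 500 allows twelve. Setting `max_tokens` no higher than the response actually needs is a cheap first thing to try.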

These are the response headers returned with the ‘TooManyRequests’ error:

Retry-After: 49
x-ratelimit-reset-tokens: 49
apim-request-id: 8ef18262-d6c3-4b3b-a2bf-7cf1ccdddfee
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
policy-id: DeploymentRatelimit-Token
x-ms-region: East US 2
x-ratelimit-remaining-requests: 24
Date: Wed, 12 Feb 2025 14:14:46 GMT

Request failed with status code: TooManyRequests
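The headers already hint at which limit was hit. `policy-id: DeploymentRatelimit-Token` suggests the deployment's token (TPM) limit, `x-ratelimit-remaining-requests: 24` shows request capacity was still available, and `Retry-After` gives the wait in seconds. A small helper that pulls these out of a 429 response might look like the sketch below; the returned dictionary keys are made up for illustration, and reading the policy id as "TPM limit exceeded" is an interpretation of its name, not documented behavior.

```python
from typing import Mapping, Optional


def _as_int(value: Optional[str]) -> Optional[int]:
    """Parse a non-negative integer header value, or return None."""
    if value is not None and value.strip().isdigit():
        return int(value.strip())
    return None


def diagnose_429(headers: Mapping[str, str]) -> dict:
    """Summarize which limit a 429 response points at and how long to wait."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    policy = h.get("policy-id", "")
    return {
        # Interpretation: a policy id mentioning "Token" suggests the TPM limit.
        "token_limit_hit": "token" in policy.lower(),
        # Retry-After may also be an HTTP date; only the seconds form is handled.
        "wait_seconds": _as_int(h.get("retry-after")),
        "requests_remaining": _as_int(h.get("x-ratelimit-remaining-requests")),
        "region": h.get("x-ms-region"),
    }


if __name__ == "__main__":
    # Values copied from the response headers in the question.
    sample = {
        "Retry-After": "49",
        "x-ratelimit-reset-tokens": "49",
        "policy-id": "DeploymentRatelimit-Token",
        "x-ms-region": "East US 2",
        "x-ratelimit-remaining-requests": "24",
    }
    print(diagnose_429(sample))
```

Run on the headers from the question, it reports a token-limit hit with a 49-second wait and 24 requests still available, which points at TPM rather than RPM as the limit being exceeded.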

What do I need to change so I don’t get this error?

kothapally Snigdha 3,020 Reputation points Microsoft External Staff Moderator

2025-02-12T22:39:16.49+00:00

Hi Stephen,

Could you kindly provide the subscription details as requested in the Private feature?

Thank you!
Stephen 85 Reputation points

2025-02-13T16:15:58.0833333+00:00

I am still having issues with the rate limits.

I have the gpt-4o model deployed in Azure AI Foundry. The Deployment Type is Global Standard. The Tokens per minute rate limit is 30,000 and the Requests per minute rate limit is 180. When I click Edit, it shows that the connected Azure OpenAI Service resource is ai-sig6-azure-ai-services_aoai.

At https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits#quotas-and-limits-reference it says for the gpt-4o model on the Default Tier it should be 450K for TPM and 2.7K for RPM.

In the Azure AI Foundry, the maximum TPM is 30K (not 450K) and the maximum RPM is 180 (not 2.7K).
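The two pairs of numbers are consistent with each other. The Azure OpenAI quota documentation states that a deployment's RPM limit is set in proportion to its TPM at 6 RPM per 1,000 TPM, so 180 RPM is simply what 30K TPM implies, and raising the TPM quota raises RPM with it. A quick check, with the ratio taken from the docs at the time of this thread (it may change, so verify it for your model and deployment type):

```python
# RPM is derived from TPM: 6 requests per minute per 1,000 tokens per minute
# (ratio from the Azure OpenAI quota documentation; verify it for your SKU).
RPM_PER_1000_TPM = 6


def rpm_for_tpm(tpm: int) -> int:
    """Requests-per-minute limit implied by a tokens-per-minute quota."""
    return tpm * RPM_PER_1000_TPM // 1000


if __name__ == "__main__":
    for tpm in (30_000, 450_000):  # this deployment vs. the documented default
        print(f"{tpm:,} TPM -> {rpm_for_tpm(tpm):,} RPM")
```

This prints 180 RPM for 30,000 TPM and 2,700 RPM for 450,000 TPM, matching both the deployment settings and the documented default.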

So, I would assume the Connected Azure OpenAI Service resource, ai-sig6-azure-ai-services_aoai, is what the rate limits are being assigned to. When I go to the Azure Portal >> Azure AI services >> Azure OpenAI, there is nothing listed. How do I find where the quotas are set for the Azure OpenAI Service resource “ai-sig6-azure-ai-services_aoai”?

How can I get the rate limits set to what it says at https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits#quotas-and-limits-reference or higher for my gpt-4o model?
Stephen 85 Reputation points

2025-02-13T19:20:18.1833333+00:00

I would like to verify the quotas assigned to the connected resource. However, when I go to Azure Portal >> Azure AI services >> Azure OpenAI, nothing is listed. Perhaps that is because it was created from Azure AI Foundry, but I'm not sure.
SriLakshmi C 6,250 Reputation points Microsoft External Staff Moderator

2025-02-14T01:23:24.0866667+00:00
Hi Stephen,

To verify the quotas assigned to your Azure OpenAI model deployed in Azure AI Foundry, you can check the quotas directly in the Azure AI Foundry portal. Follow these steps:

1. Go to the Azure AI Foundry portal.
2. Select Management from the left menu.
3. Select Quota to view your quota allocations across deployments in the selected region.

If you do not see any quotas listed in the Azure Portal, it may be because the model was created in Azure AI Foundry; quotas for Azure OpenAI models deployed there are managed within the Foundry portal rather than the general Azure Portal.

If you continue to experience issues or need to request an increase in your quota, you can submit a quota increase request through the appropriate form provided in the portal.

Please refer to "View and request quotas in Azure AI Foundry portal" and "View and request quota".

Thank you!

Accepted answer

Answer 1

Hi Stephen,

I understand that you're experiencing rate limit issues with your gpt-4o model deployed in Azure AI Foundry. The rate limits you are encountering (30K TPM and 180 RPM) are indeed lower than the limits specified for the Default Tier (450K TPM and 2.7K RPM).
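Besides asking for more quota, a client can pace itself against the deployment's limits so that it rarely triggers a 429 at all. Below is a minimal sliding 60-second window limiter, assuming the caller supplies an estimated token cost per request (for example prompt tokens plus `max_tokens`). The class name and the cost estimate are illustrative; the service's own metering may use shorter windows and a different token estimate, so treat this as a soft guard rather than an exact mirror of the server.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Client-side guard for a tokens-per-minute and requests-per-minute budget.

    Illustrative sketch: costs are caller-supplied estimates, and the window is
    a plain 60 seconds, which may not match how the service meters usage.
    """

    def __init__(self, tpm: int, rpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm = tpm
        self.rpm = rpm
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Forget requests that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits in both budgets."""
        now = self.clock()
        self._prune(now)
        used = sum(cost for _, cost in self.events)
        if len(self.events) >= self.rpm or used + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    limiter = SlidingWindowLimiter(tpm=30_000, rpm=180, clock=lambda: fake_now[0])
    print(limiter.try_acquire(18_000))  # True: 18K of 30K used
    print(limiter.try_acquire(18_000))  # False: would reach 36K
    fake_now[0] = 61.0
    print(limiter.try_acquire(18_000))  # True: the first request aged out
```

Before each call, the client would estimate the cost, call `try_acquire`, and wait briefly (or queue the request) whenever it returns False.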

The rate limits you see are likely tied to the specific Azure OpenAI Service resource you are using. Since you mentioned that the connected resource is ai-sig6-azure-ai-services_aoai, you should verify the quotas assigned to this resource.
In the Azure Portal, navigate to the Azure OpenAI section and check if there are any quotas or limits set specifically for this resource. If nothing appears, it may indicate that the resource is not configured to utilize the higher limits available for the gpt-4o model.
If you want to increase your rate limits to match those specified for the Default Tier, you may need to submit a quota increase request. This can be done through the quota increase request form. Keep in mind that priority is given to customers who generate traffic that consumes existing quota allocations.
Request for Quota Increase https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR4xPXO648sJKt4GoXAed-0pUMFE1Rk9CU084RjA0TUlVSUlMWEQzVkJDNCQlQCN0PWcu
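While a quota increase is pending, the client can at least wait out a 429 instead of failing. The sketch below retries on HTTP 429, sleeping for the `Retry-After` value when the server sends one (49 seconds in the question's headers) and otherwise falling back to capped exponential backoff with jitter. `send` is a hypothetical stand-in for whatever function performs one API call and returns `(status_code, headers, body)`; `sleep` is injectable so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers, body) as returned by the hypothetical send() callable.
Response = Tuple[int, Mapping[str, str], object]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on HTTP 429 and preferring the Retry-After hint."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Return successes, non-429 errors, and the final attempt unchanged.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after", "")
        if retry_after.strip().isdigit():
            # The server states how many seconds to wait.
            delay = float(retry_after)
        else:
            # Fallback: capped exponential backoff with random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    replies = iter([(429, {"Retry-After": "2"}, None), (200, {}, "ok")])
    waits = []
    result = call_with_backoff(lambda: next(replies), sleep=waits.append)
    print(result[0], waits)  # 200 [2.0]
```

If you call the service through the official `openai` Python package, its client already retries 429 responses on its own (tunable with the `max_retries` option), so a hand-rolled loop like this is mainly useful for raw HTTP calls. Either way, retries only smooth over bursts; they do not raise the underlying 30K TPM quota.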

I hope this helps. Thank you!

Answer 2

Ketsha 325 Microsoft Employee

Hi, please use the following table and the link below; they cover the quotas and limits that you are hitting.

[Image: table of quotas and limits]

Here is the quota and limits information: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quotas-limits#quotas-and-limits-reference

Also, GPT-4 model details: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models#gpt-4-models

Answer 3

Stephen 85

I created a support ticket and spoke to a Microsoft employee. Apparently, the region I deployed to was experiencing heavy use, so the default TPM and RPM limits had been lowered. She suggested that I deploy to another region and named some regions with more capacity. She also advised submitting a quota increase request (the same form linked in an earlier answer); I did, and the increase just went through. So, problem resolved.

SriLakshmi C 6,250 Reputation points Microsoft External Staff Moderator

2025-02-17T01:22:49.1233333+00:00

Hi Stephen,

Glad to hear that your issue has been resolved, and thanks for sharing the solution; it may benefit other community members reading this thread.
