Hi @Rajnish Soni,
Thank you for the detailed follow-up; your observation is valid, and I can clarify how this works.
Azure OpenAI model deployments are provisioned in terms of Tokens Per Minute (TPM), and the Requests Per Minute (RPM) limit is derived from that value according to the model's characteristics. For the gpt-35-turbo-0125 model, the standard formula Azure applies is:
RPM = (TPM × 6) / 1000
In your scenario, you configured a capacity of 20, which equates to 20,000 TPM. Applying the formula:
(20,000 × 6) / 1000 = 120 RPM
This explains why the Azure Portal correctly shows the RPM as 120 for your deployment.
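If it helps, here is a minimal Python sketch of that same arithmetic; the capacity value and the 6-RPM-per-1,000-TPM ratio for gpt-35-turbo-0125 are taken straight from the formula above, and the function name is just for illustration:

```python
# Derive TPM and RPM from a deployment's configured capacity,
# using the 6 RPM per 1,000 TPM ratio that applies to gpt-35-turbo-0125.
def limits_from_capacity(capacity: int, rpm_per_1k_tpm: int = 6) -> tuple[int, int]:
    tpm = capacity * 1_000                  # each unit of capacity = 1,000 TPM
    rpm = tpm * rpm_per_1k_tpm // 1_000     # RPM = (TPM x 6) / 1000
    return tpm, rpm

tpm, rpm = limits_from_capacity(20)
print(tpm, rpm)  # 20000 120
```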
Now, regarding the x-ratelimit-limit-requests: 20 value you see in the cURL response: this reflects the per-second request limit, not the per-minute limit. Multiplied by 60 seconds, it suggests a potential burst capability of up to 1,200 requests per minute:
20 requests/sec × 60 sec = 1,200 requests/min
However, Azure enforces both token-based and request-based rate limits, and the lower of the two becomes the effective cap. So even though the per-second limit appears to allow up to 1,200 RPM, your deployment's configuration (based on 20,000 TPM) imposes an effective limit of 120 RPM, which is what ultimately throttles your traffic.
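If you want to see both views side by side, a rough Python sketch like the one below reproduces what your cURL call shows by printing the rate-limit headers returned with a chat completion. The endpoint, deployment name, API key, and api-version here are placeholders/assumptions you would substitute with your own values:

```python
import requests

# Placeholder values - replace with your own resource details.
ENDPOINT = "https://<your-resource>.openai.azure.com"
DEPLOYMENT = "gpt-35-turbo-0125"
API_KEY = "<your-api-key>"

url = f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/chat/completions?api-version=2024-02-01"
resp = requests.post(
    url,
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
)

# Print whatever rate-limit headers come back; x-ratelimit-limit-requests is the
# per-second figure discussed above, distinct from the 120 RPM shown in the portal.
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")
```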
Therefore, for configuring throttling in Azure API Management (APIM), you should rely on the RPM value shown in the Azure Portal or retrieved programmatically via the Azure Resource Manager (ARM) API using:
GET /accounts/{account}/deployments/{name}?api-version=2024-10-01
In the response, the properties.callRateLimit.count
field will give you the actual RPM value for the deployment.
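For reference, a small Python sketch along these lines can pull that value for you. The subscription, resource group, and account names are placeholders, and I am assuming the standard ARM host and Microsoft.CognitiveServices resource path in front of the relative URL shown above:

```python
import requests
from azure.identity import DefaultAzureCredential

# Placeholder identifiers - substitute your own subscription, resource group,
# Azure OpenAI account, and deployment names.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
ACCOUNT = "<aoai-account-name>"
DEPLOYMENT = "gpt-35-turbo-0125"

# Acquire an ARM token and call the deployments GET endpoint shown above.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    "https://management.azure.com"
    f"/subscriptions/{SUBSCRIPTION}/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.CognitiveServices/accounts/{ACCOUNT}"
    f"/deployments/{DEPLOYMENT}?api-version=2024-10-01"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# properties.callRateLimit.count carries the RPM figure to use for APIM throttling.
props = resp.json()["properties"]
print("RPM limit:", props["callRateLimit"]["count"])
```

That RPM value is the number to feed into your APIM throttling policy, since it is the limit Azure will actually enforce for the deployment.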
I hope this helps. Do let me know if you have further queries.
Thank you!