Calculate Azure OpenAI RPM value

Rajnish Soni 60 Reputation points
2025-05-13T09:52:30.53+00:00

I want to apply rate limits via APIM on my OpenAI instances, for which I need to calculate the RPM value programmatically. As far as I know, the RPM value is different for each OpenAI model. Is there any way to fetch and calculate the OpenAI RPM value for each model?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. SriLakshmi C 6,010 Reputation points Microsoft External Staff Moderator
    2025-05-14T11:33:05.16+00:00

    Hi @Rajnish Soni,

    Thank you for the detailed follow-up. Your observation is valid, and I can clarify how this works.

    Azure OpenAI model deployments are provisioned based on Tokens Per Minute (TPM), and the Requests Per Minute (RPM) limit is derived from that based on the model’s characteristics. For the gpt-35-turbo-0125 model, the standard formula used by Azure is:

    RPM = (TPM × 6) / 1000
    

    In your scenario, you configured a capacity of 20, which equates to 20,000 TPM. Applying the formula:

    (20,000 × 6) / 1000 = 120 RPM
    

    This explains why the Azure Portal correctly shows the RPM as 120 for your deployment.
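    The derivation above can be sketched as a small helper. This is a minimal illustration of the formula from this answer, assuming the 6 RPM per 1,000 TPM ratio stated for gpt-35-turbo-0125; other models may use a different ratio.

    ```python
    def rpm_from_tpm(tpm: int) -> int:
        """Derive the requests-per-minute limit from a deployment's TPM quota,
        using the ratio of 6 RPM per 1,000 TPM."""
        return tpm * 6 // 1000

    # A deployment capacity of 20 corresponds to 20,000 TPM:
    print(rpm_from_tpm(20_000))  # 120
    ```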

    Now, regarding the x-ratelimit-limit-requests: 20 value you see in the cURL response: this reflects the per-second request limit, not the per-minute limit. Multiplied by 60 seconds, it suggests a potential burst throughput of up to 1,200 requests per minute:

    20 requests/sec × 60 sec = 1200 requests/min
    

    However, Azure enforces both token-based and request-based rate limits, and the lower of the two becomes the effective cap. So even if the per-second limit appears to allow for 1,200 RPM, your deployment's configuration (based on 20,000 TPM) imposes a real RPM limit of 120, which is what ultimately throttles your traffic.
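    The "lower of the two" rule can be expressed directly. This is a sketch of the reasoning in the paragraph above, again assuming the 6-RPM-per-1,000-TPM ratio; the per-second value is the one reported in the x-ratelimit-limit-requests header.

    ```python
    def effective_rpm(tpm: int, per_second_requests: int) -> int:
        """The effective cap is the lower of the token-derived RPM
        and the burst capability implied by the per-second limit."""
        token_based = tpm * 6 // 1000           # e.g. 20,000 TPM -> 120 RPM
        burst_based = per_second_requests * 60  # e.g. 20/sec -> 1,200/min
        return min(token_based, burst_based)

    print(effective_rpm(20_000, 20))  # 120
    ```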

    Therefore, for configuring throttling in Azure API Management (APIM), you should rely on the RPM value shown in the Azure Portal or retrieved programmatically via the Azure Resource Manager (ARM) API using:

    GET /accounts/{account}/deployments/{name}?api-version=2024-10-01
    

    In the response, the properties.callRateLimit.count field will give you the actual RPM value for the deployment.
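    A sketch of that programmatic lookup, using only the Python standard library. The full ARM URL requires your subscription ID and resource group (placeholders below); the api-version and the properties.callRateLimit.count field are the ones named in this answer, and the sample response shape in the comments is an assumption you should verify against your own deployment.

    ```python
    import json
    import urllib.request

    ARM = "https://management.azure.com"

    def deployment_url(subscription_id: str, resource_group: str,
                       account: str, deployment: str,
                       api_version: str = "2024-10-01") -> str:
        """Build the ARM URL for a Cognitive Services deployment."""
        return (f"{ARM}/subscriptions/{subscription_id}"
                f"/resourceGroups/{resource_group}"
                f"/providers/Microsoft.CognitiveServices/accounts/{account}"
                f"/deployments/{deployment}?api-version={api_version}")

    def extract_rpm(body: dict) -> int:
        """Pull the RPM value out of a deployment response, assuming a shape
        like {"properties": {"callRateLimit": {"count": 120, ...}, ...}}."""
        return int(body["properties"]["callRateLimit"]["count"])

    def fetch_rpm(url: str, bearer_token: str) -> int:
        """GET the deployment with an ARM bearer token and return its RPM."""
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {bearer_token}"})
        with urllib.request.urlopen(req) as resp:
            return extract_rpm(json.load(resp))
    ```

    The bearer token can come from any ARM-scoped credential (for example, `az account get-access-token`).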

    I hope this helps. Do let me know if you have further queries.

    Thank you!


1 additional answer

  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2025-05-13T13:49:39.3+00:00

    Hello Rajnish Soni,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you're looking for a method to calculate the RPM value programmatically.

    1. Yes, OpenAI provides rate limits per model in their official documentation, and these limits can vary depending on:
      • The model (e.g., gpt-4, gpt-3.5, dall-e, etc.)
      • The subscription tier (e.g., free, pay-as-you-go, or enterprise)
      • The endpoint (e.g., chat, completions, embeddings, image generation)
      For example (as of 2025): for the gpt-4-turbo model, RPM (Requests/Min) is 10,000 and TPM (Tokens/Min) is 1,000,000. You can read more in the OpenAI Rate Limits Guide: https://platform.openai.com/docs/guides/rate-limits. These values must be manually maintained in a config file or key-value store.
    2. In Azure APIM, you can extract the model name from the request body using a policy expression:
         <set-variable name="modelName" value="@(context.Request.Body.As<JObject>()["model"]?.ToString())" />
      
    3. Use a lookup table in APIM policies or an external service (e.g., Azure Function or Redis) to map modelName to its RPM. For an example in APIM policy:
         <choose>
           <when condition="@(context.Variables["modelName"] == "gpt-4-turbo")">
             <set-variable name="rpmLimit" value="10000" />
           </when>
           <when condition="@(context.Variables["modelName"] == "gpt-3.5-turbo")">
             <set-variable name="rpmLimit" value="20000" />
           </when>
         </choose>
      
    4. Use the rate-limit-by-key policy with a composite key like subscriptionId + modelName to apply the rate limit in APIM:
         <rate-limit-by-key calls="@(context.Variables["rpmLimit"])" renewal-period="60" counter-key="@(context.Subscription.Id + '-' + context.Variables["modelName"])" />
      

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it was helpful.

