Tokens per Minute Rate Limit

Kris B 0 Reputation points
2024-05-29T19:47:04.81+00:00

The training doesn't say what the Tokens per Minute Rate Limit should be.

This question is related to the following Learning Module

Azure Training
Azure: A cloud computing platform and infrastructure for building, deploying and managing applications and services through a worldwide network of Microsoft-managed datacenters.
Training: Instruction to develop new skills.

2 answers

  1. Mehmet Akar 80 Reputation points
    2024-05-29T20:13:59.44+00:00

    Are you sure? It gives all details: For example, if Azure OpenAI is monitoring request rate on 1-second intervals, then rate limiting will occur for a 600-RPM deployment if more than 10 requests are received during each 1-second period (600 requests per minute = 10 requests per second). Source: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest
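
To make the arithmetic in that quoted example concrete, here is a minimal Python sketch. The function name `per_interval_limit` is made up for illustration, and it assumes the per-minute limit is spread evenly across evaluation windows, as the example describes. The interval the service actually uses is not stated in this thread.

```python
def per_interval_limit(rpm: int, interval_seconds: float = 1.0) -> float:
    """Requests allowed in one evaluation window, assuming an even spread of the RPM limit."""
    return rpm * interval_seconds / 60.0


# 600 requests per minute evaluated on 1-second windows
print(per_interval_limit(600))  # 10.0
```

So a burst of more than 10 requests inside a single second could be throttled even if the total for the minute stays under 600.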


  2. VarunTha 5,980 Reputation points Microsoft Vendor
    2024-05-30T02:50:15.99+00:00

    Hi Kris B,
    Thank you for reaching out to us on the Microsoft Q&A forum.

    I understand your concern regarding the Tokens per Minute (TPM) rate limit in the "Get started with Azure OpenAI Service" module. While this specific detail might not be covered in that unit, I can provide some clarification:

    1. TPM Quota: Your quota defines the maximum tokens per minute available to your subscription for a given model in a specific region. For example, if you have a quota of 240,000 TPM for GPT-3.5 Turbo in East US, you can create deployments that consume a total of up to 240,000 TPM.

    2. TPM Rate Limit: The TPM you assign to a deployment out of that quota becomes the deployment's rate limit. A deployment cannot consume more tokens per minute than it has been assigned. For instance, if a deployment is assigned 1,000 TPM, it is limited to 1,000 tokens per minute.
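
As a rough illustration of how a regional quota is shared, the following Python sketch checks a set of deployments against a quota. The function `remaining_quota` and the deployment names are hypothetical and only model the bookkeeping; they are not part of any Azure SDK.

```python
def remaining_quota(quota_tpm: int, deployment_tpm: dict[str, int]) -> int:
    """Return the unassigned TPM, or raise if the deployments exceed the quota."""
    allocated = sum(deployment_tpm.values())
    if allocated > quota_tpm:
        raise ValueError(
            f"Deployments use {allocated} TPM but the quota is {quota_tpm} TPM"
        )
    return quota_tpm - allocated


# Hypothetical deployments sharing a 240,000 TPM quota in one region
deployments = {"chat-prod": 150_000, "chat-dev": 50_000}
print(remaining_quota(240_000, deployments))  # 40000
```

Here 40,000 TPM would still be available for another deployment of the same model in that region.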

    There's an additional layer to consider:

    • Requests per Minute (RPM) Limit: Azure OpenAI Service also enforces an RPM limit that is proportional to your TPM limit. At the time of writing, the ratio is 6 RPM per 1,000 TPM, so a deployment with a 1,000 TPM limit has an RPM limit of 6 requests per minute.
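
The ratio above can be worked through with a short Python sketch. The constant and function names are made up for illustration, and rounding down for TPM values that are not multiples of 1,000 is an assumption, not documented behavior.

```python
RPM_PER_1000_TPM = 6  # ratio quoted in this answer; it may change over time


def rpm_limit(tpm: int) -> int:
    """Derive the RPM limit from a TPM limit using the 6-per-1,000 ratio."""
    # Integer division rounds down; that rounding is an assumption.
    return tpm * RPM_PER_1000_TPM // 1000


print(rpm_limit(1_000))    # 6
print(rpm_limit(240_000))  # 1440
```

With the 240,000 TPM quota from the earlier example assigned to a single deployment, that deployment would be limited to 1,440 requests per minute.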


    I hope this helps clarify the TPM rate limits for you. If you have any further questions or need additional assistance, please feel free to reach out.

    If you found this response helpful, please click "Accept Answer" and "Upvote" so that others in the community can easily find the solution. Your contribution is highly appreciated.