Difference between Tokens per Minute Rate Limit (thousands) & Rate limit (Tokens per minute)

Panda, Parimesh 0 Reputation points
2024-01-28T11:51:38.6266667+00:00

In a GPT-Vision model deployment in the Switzerland North region, I noticed the following specifications:

  • Tokens per Minute Rate Limit (thousands): 10
  • Rate limit (Tokens per minute): 30000

While I understand from this matrix that the default quota limit for gpt-4 (vision-preview), GPT-4 Turbo with Vision, is 30K TPM, I don't understand the specification 'Tokens per Minute Rate Limit (thousands): 10'.

Please help me understand these specifications better.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. Azar 26,585 Reputation points MVP
    2024-01-28T12:17:03.9533333+00:00

    Hey Parimesh Panda

    The specs you mentioned refer to rate limiting for API usage on the GPT model deployment. Let me clear that up.

    Rate Limit (Tokens per minute): 30,000

    • This represents the maximum number of tokens that the model can process per minute. In your case, it's set to 30,000 tokens per minute for GPT-4 Turbo with Vision.

    Tokens per Minute Rate Limit (thousands): 10

    • This value is a rate limit expressed in thousands of tokens per minute. In this case, it's set to 10,000 tokens per minute (10 * 1,000).

    So both specs are tokens-per-minute rate limits, presented in different units. It's common to express large numbers in thousands or millions for readability. Note, though, that the two values you quoted do not come out the same once the units are aligned: 10 (thousands) is 10,000 TPM, while the other field shows 30,000 TPM, so the unit difference alone does not explain the gap.
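
To compare the two fields on the same scale, a minimal Python sketch can convert the thousands-based value to plain tokens per minute. The function and variable names here are illustrative only and are not part of any Azure API; the numbers are the ones quoted in the question.

```python
def thousands_to_tpm(value_in_thousands: int) -> int:
    """Convert a limit expressed in thousands of tokens per minute to plain TPM."""
    return value_in_thousands * 1_000


# Values quoted in the question (hypothetical variable names).
tpm_limit_thousands = 10   # "Tokens per Minute Rate Limit (thousands)"
rate_limit_tpm = 30_000    # "Rate limit (Tokens per minute)"

# Put both fields on the same scale and compare them.
converted_tpm = thousands_to_tpm(tpm_limit_thousands)
difference = rate_limit_tpm - converted_tpm

print(f"Thousands field as TPM: {converted_tpm}")
print(f"Rate limit field (TPM): {rate_limit_tpm}")
print(f"Same value: {converted_tpm == rate_limit_tpm}, difference: {difference}")
```

With the quoted values, the thousands field works out to 10,000 TPM, which leaves a 20,000 TPM difference from the other field.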

    If this helps, kindly accept the answer. Thanks much.

