Difference between Tokens per Minute Rate Limit (thousands) & Rate limit (Tokens per minute)

Question

Panda, Parimesh 0

In a GPT-Vision model version deployment in Switzerland North region, I noticed the following specification:

Tokens per Minute Rate Limit (thousands): 10
Rate limit (Tokens per minute): 30000

While I understand from this matrix that the default quota limit for gpt-4 (vision-preview), i.e. GPT-4 Turbo with Vision, is 30K TPM, I don't understand the specification 'Tokens per Minute Rate Limit (thousands): 10'.

Please help me understand these specifications better.

1 answer

Answer 1

Azar 29,520 MVP Volunteer Moderator

Hey Parimesh Panda

The specs you mentioned refer to rate limiting for API usage in the GPT model deployment. Lemme clear that up.

Rate Limit (Tokens per minute): 30,000

This represents the maximum number of tokens that the model can process per minute. In your case, it's set to 30,000 tokens per minute for the GPT-4 Turbo with Vision.

Tokens per Minute Rate Limit (thousands): 10

This value, expressed in thousands, is an alternative representation of the rate limit. In this case, it's set to 10,000 tokens per minute (10 * 1,000).

So both specs convey the same information but are presented in different units. It's common to express large numbers in thousands or millions for readability. So, to answer your question: the outcome is the same for both, and only the units differ.

If this helps, kindly accept the answer. Thanks much!

Panda, Parimesh 0 Reputation points

2024-01-28T12:24:21.68+00:00

While I understand your explanation that both specifications are different ways of expressing rate limits, my underlying doubt still remains: how can 30K TPM and 10K TPM represent the same information? I mean, how is it practically possible for the same GPT-4 Turbo with Vision deployment to have a rate limit of 30K tokens per minute and also 10K tokens per minute? The rate limit should ideally be one or the other.
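To make the mismatch concrete, here is a minimal sketch of the arithmetic, assuming the "(thousands)" field is simply multiplied by 1,000 as described in the answer. The function and variable names are hypothetical and not part of any Azure API.

```python
def thousands_to_tpm(value_in_thousands: int) -> int:
    """Convert a value displayed in thousands of tokens per minute to plain TPM."""
    return value_in_thousands * 1_000


# Values copied from the deployment specification in the question.
tpm_rate_limit_thousands = 10   # "Tokens per Minute Rate Limit (thousands)"
rate_limit_tpm = 30_000         # "Rate limit (Tokens per minute)"

converted = thousands_to_tpm(tpm_rate_limit_thousands)
print(converted)                    # 10000
print(converted == rate_limit_tpm)  # False: the two fields do not match
```

Under that reading the two fields come out to 10,000 and 30,000, so they cannot be the same number in different units.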
Azar 29,520 Reputation points MVP Volunteer Moderator

2024-01-28T12:29:20.53+00:00
Hey again, lemme try and clarify.

Rate Limit (Tokens per minute): 30,000

This is the actual rate limit for the GPT-4 Turbo with Vision model, and it means that the model can process up to 30,000 tokens per minute.

Tokens per Minute Rate Limit (thousands): 10

This is not directly related to the rate limit. I guess it is a separate specification.
Panda, Parimesh 0 Reputation points

2024-01-28T12:49:47.5033333+00:00

I understand now and I am clear about the Rate Limit. I am still curious about what 'Tokens per Minute Rate Limit (thousands): 10' really means.
Azar 29,520 Reputation points MVP Volunteer Moderator

2024-01-28T14:27:18.95+00:00

Hey, have a look at this doc: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest
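For context, that doc describes TPM quota as being assigned to deployments in units of 1,000, with a requests-per-minute (RPM) limit set in proportion to the assigned TPM. A minimal sketch of that relationship, assuming the ratio of 6 RPM per 1,000 TPM given in the doc at the time of this thread (the constant and function names are hypothetical, not an Azure API, and the ratio may have changed since):

```python
# Assumption: ratio from the quota doc at the time of this thread.
RPM_PER_1000_TPM = 6


def rpm_for_tpm(tpm: int) -> int:
    """Estimate the requests-per-minute limit implied by a TPM assignment."""
    return (tpm // 1_000) * RPM_PER_1000_TPM


print(rpm_for_tpm(10_000))  # 60
print(rpm_for_tpm(30_000))  # 180
```

This only illustrates how a value entered in thousands of TPM would translate into limits. It does not settle which of the two fields in the question is the one actually enforced.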
Azar 29,520 Reputation points MVP Volunteer Moderator

2024-01-30T14:19:31.4133333+00:00

Checking if this helped. If it did, kindly accept the answer. Thanks much!
Azar 29,520 Reputation points MVP Volunteer Moderator

2024-02-05T15:46:06.04+00:00

Checking if this helped. If it did, kindly accept the answer. Thanks much!
