Hi Parimesh Panda,
The specs you mentioned refer to the rate limiting applied to API usage for your GPT model deployment. Let me clear that up.
Rate Limit (Tokens per minute): 30,000
- This represents the maximum number of tokens that the model can process per minute. In your case, it's set to 30,000 tokens per minute for the GPT-4 Turbo with Vision deployment.
Tokens per Minute Rate Limit (thousands): 10
- This value, expressed in thousands, is an alternative representation of the rate limit. In this case, it's set to 10,000 tokens per minute (10 * 1,000).
So both specs describe the same thing, the tokens-per-minute quota, just expressed in different units; it's common to express large numbers in thousands or millions for readability. To answer your question: the outcome is the same, only the units differ.
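If you do exceed the quota at runtime, the service responds with a 429 error. Below is a minimal sketch of one way to back off and retry, assuming the openai Python SDK v1.x; the endpoint, API key, API version, and deployment name (`gpt-4-vision-deployment`) are placeholders you would replace with your own values:

```python
import time
from openai import AzureOpenAI, RateLimitError

# Placeholders: substitute your own endpoint, key, API version, and deployment name.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

def chat_with_backoff(messages, deployment="gpt-4-vision-deployment", max_retries=5):
    """Call the deployment and back off when the tokens-per-minute quota is hit."""
    delay = 2.0
    for _ in range(max_retries):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError:
            # 429: the deployment's tokens-per-minute quota was exceeded; wait and retry.
            time.sleep(delay)
            delay *= 2  # exponential backoff
    raise RuntimeError("Still rate limited after retries")

# Example usage:
# response = chat_with_backoff([{"role": "user", "content": "Describe this image."}])
```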
If this helps, kindly accept the answer. Thanks!