Thanks for the question. The ‘x-ratelimit-remaining-tokens’ header reflects the number of tokens remaining in your rate limit window, not the number of tokens used by a specific request.
When you make a request to OpenAI’s API, both input and output tokens count toward your rate limit. For example, if your prompt is 10 tokens and you request a maximum of 20 tokens in the response, that request counts as roughly 30 tokens against your limit (billing, by contrast, is based on the tokens actually generated).
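If you want to see roughly how a single request is counted, here’s a minimal sketch using the tiktoken package to estimate that sum; the model name and prompt are just examples, and chat requests add a few tokens of per-message overhead that this ignores.

```python
# Rough estimate of how many tokens one request counts against the TPM limit:
# prompt tokens plus the requested max_tokens. (Chat requests add a few extra
# tokens of message-formatting overhead that this sketch ignores.)
import tiktoken

prompt = "Translate 'hello world' into French."
max_tokens = 20  # completion budget requested in the API call

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # example model name
prompt_tokens = len(enc.encode(prompt))

print(f"prompt tokens: {prompt_tokens}")
print(f"counts roughly {prompt_tokens + max_tokens} tokens against the rate limit")
```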
However, the ‘x-ratelimit-remaining-tokens’ header doesn’t simply decrease by the exact number of tokens in your request. It shows how much of your tokens-per-minute (TPM) quota is left in the current window, and that window refills over time: if your TPM limit is 120K, the value climbs back toward 120K as the minute passes, independent of any single request. The per-request deduction is also an estimate (roughly prompt tokens plus max_tokens), not the final usage reported in the response.
So, if you’re seeing ‘x-ratelimit-remaining-tokens’ drop by only 16 tokens after a request, the most likely explanation is that the window had partially refilled by the time you read the header, so the net change you observe is much smaller than the number of tokens the request actually consumed.
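You can check this yourself by calling the API over plain HTTP and comparing the rate-limit headers with the usage block in the response body. This is only a sketch: it assumes the requests package is installed, an OPENAI_API_KEY environment variable, and an example model name.

```python
# Minimal sketch: call the chat completions endpoint directly and compare the
# rate-limit headers against the actual token usage reported in the body.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",  # example model name
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 20,
    },
    timeout=30,
)

print("limit (TPM):     ", resp.headers.get("x-ratelimit-limit-tokens"))
print("remaining tokens:", resp.headers.get("x-ratelimit-remaining-tokens"))
print("reset in:        ", resp.headers.get("x-ratelimit-reset-tokens"))
print("actual usage:    ", resp.json().get("usage"))
```

Running it twice with a short pause in between should show the remaining-tokens value climbing back toward your limit, even though the usage block keeps reporting the real per-request consumption.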