Thanks for the question. The ‘x-ratelimit-remaining-tokens’ header reflects the number of tokens remaining in your rate limit window, not the number of tokens used by a specific request.
When you make a request to OpenAI’s API, both input and output tokens count toward your rate limit. For example, if your prompt is 10 tokens and you request a maximum of 20 tokens in the response, that request counts as roughly 30 tokens against your limit (billing, by contrast, is based on the tokens actually generated).
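If you want to see roughly how a single request is counted, here’s a minimal sketch using the tiktoken package to estimate that sum; the model name and prompt are just examples, and chat requests add a few tokens of per-message overhead that this ignores.

```python
# Rough estimate of how many tokens one request counts against the TPM limit:
# prompt tokens plus the requested max_tokens. (Chat requests add a few extra
# tokens of message-formatting overhead that this sketch ignores.)
import tiktoken

prompt = "Translate 'hello world' into French."
max_tokens = 20  # completion budget requested in the API call

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # example model name
prompt_tokens = len(enc.encode(prompt))

print(f"prompt tokens: {prompt_tokens}")
print(f"counts roughly {prompt_tokens + max_tokens} tokens against the rate limit")
```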
However, the ‘x-ratelimit-remaining-tokens’ header doesn’t simply decrease by the exact number of tokens in your request. It shows how much of your tokens-per-minute (TPM) quota is left in the current window, and that window refills over time: if your TPM limit is 120K, the value climbs back toward 120K as the minute passes, independent of any single request. The per-request deduction is also an estimate (roughly prompt tokens plus max_tokens), not the final usage reported in the response.
So, if you’re seeing ‘x-ratelimit-remaining-tokens’ drop by only 16 tokens after a request, the most likely explanation is that the window had partially refilled by the time you read the header, so the net change you observe is much smaller than the number of tokens the request actually consumed.
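You can check this yourself by calling the API over plain HTTP and comparing the rate-limit headers with the usage block in the response body. This is only a sketch: it assumes the requests package is installed, an OPENAI_API_KEY environment variable, and an example model name.

```python
# Minimal sketch: call the chat completions endpoint directly and compare the
# rate-limit headers against the actual token usage reported in the body.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",  # example model name
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 20,
    },
    timeout=30,
)

print("limit (TPM):     ", resp.headers.get("x-ratelimit-limit-tokens"))
print("remaining tokens:", resp.headers.get("x-ratelimit-remaining-tokens"))
print("reset in:        ", resp.headers.get("x-ratelimit-reset-tokens"))
print("actual usage:    ", resp.json().get("usage"))
```

Running it twice with a short pause in between should show the remaining-tokens value climbing back toward your limit, even though the usage block keeps reporting the real per-request consumption.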