What is the relation between TPM, x-ratelimit-remaining-tokens, and prompt_tokens?

Nguyen Thuy, Lien
2024-02-01T08:58:59.5333333+00:00

I'm using GPT-3.5-turbo, and the 'prompt_tokens' value in the response correctly reflects my input prompt's length (around 1,000 tokens). However, the 'x-ratelimit-remaining-tokens' header decreases by only 16 tokens. Is this expected behavior, and could you clarify how Azure OpenAI calculates token usage for rate limiting? My TPM setting is 120K.

Azure AI services

Accepted answer
Ramr-msft
2024-02-05T05:49:50.4066667+00:00

Thanks for the question. The ‘x-ratelimit-remaining-tokens’ header reports how many tokens remain in your current rate-limit window, not how many tokens a specific request used.

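When you send a request to Azure OpenAI, the tokens counted toward your rate limit are estimated when the request is received and include both the input (prompt) tokens and the maximum number of output tokens you request (max_tokens). For example, if your prompt is 10 tokens and you set max_tokens to 20, the request counts as 30 tokens against your TPM limit. Billing is calculated separately, after processing, from the tokens actually used: if the model returns only 5 tokens, you are billed for 15.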
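
The arithmetic above can be sketched as two small functions. This is a simplified model, assuming the rate-limit estimate is just prompt tokens plus max_tokens (it ignores other parameters that can raise the estimate, such as best_of); the function names are hypothetical and not part of any SDK.

```python
# Hypothetical helpers illustrating the 10 + 20 example; not part of any SDK.


def rate_limit_estimate(prompt_tokens: int, max_tokens: int) -> int:
    """Tokens counted against TPM when the request arrives (assumed: prompt + max_tokens)."""
    return prompt_tokens + max_tokens


def billed_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    """Tokens billed after processing: the prompt plus the tokens actually generated."""
    return prompt_tokens + completion_tokens


# 10-token prompt, max_tokens=20, and (hypothetically) the model returns only 5 tokens.
print(rate_limit_estimate(10, 20))  # 30
print(billed_tokens(10, 5))         # 15
```

The gap between the two numbers is why setting max_tokens much higher than you need can use up your TPM budget faster than your actual usage suggests.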

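However, because the header shows a remaining balance, the difference between two readings does not necessarily equal the tokens used by one request. Your TPM limit is a budget for a one-minute window: as each request arrives, its estimated token count is added to a running total, and that total resets every minute. With a 120K TPM limit, the remaining tokens go back to the full 120K budget when the window resets and decrease as requests are counted within the window. If the budget is used up before the reset, further requests receive an HTTP 429 response until the counter resets.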
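
A minimal sketch of the fixed one-minute window described above, assuming the counter resets on 60-second boundaries and each request is charged its estimated tokens on arrival. The class and method names are hypothetical, and this is an illustration of the idea, not the actual Azure OpenAI implementation.

```python
from dataclasses import dataclass


@dataclass
class MinuteWindowLimiter:
    """Hypothetical fixed-window TPM counter: a running total that resets every 60 seconds."""

    tpm_limit: int
    window_start: float = 0.0  # start time (seconds) of the current window
    used: int = 0  # tokens counted so far in the current window

    def _roll_window(self, now: float) -> None:
        # Once 60 s have elapsed, jump to the window containing `now` and clear the total.
        if now - self.window_start >= 60.0:
            self.window_start = now - ((now - self.window_start) % 60.0)
            self.used = 0

    def try_consume(self, now: float, tokens: int) -> bool:
        """Count a request's estimated tokens; return False (think HTTP 429) if over the limit."""
        self._roll_window(now)
        if self.used + tokens > self.tpm_limit:
            return False
        self.used += tokens
        return True

    def remaining(self, now: float) -> int:
        """Roughly what a remaining-tokens header would report under this model."""
        self._roll_window(now)
        return self.tpm_limit - self.used


limiter = MinuteWindowLimiter(tpm_limit=120_000)
limiter.try_consume(now=0.0, tokens=1_000)
print(limiter.remaining(now=1.0))   # 119000
limiter.try_consume(now=59.0, tokens=1_000)
print(limiter.remaining(now=59.5))  # 118000
print(limiter.remaining(now=60.5))  # 120000 (new window, counter reset)

small = MinuteWindowLimiter(tpm_limit=1_500)
print(small.try_consume(now=0.0, tokens=1_000))   # True
print(small.try_consume(now=1.0, tokens=1_000))   # False (would exceed 1,500)
print(small.try_consume(now=61.0, tokens=1_000))  # True (window reset)
```

A fixed window is the simplest model consistent with the description above. The real service may track usage differently (for example, over shorter sub-minute intervals), so treat these numbers as illustrative.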

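So if ‘x-ratelimit-remaining-tokens’ is only 16 tokens lower after a request whose prompt_tokens is around 1,000, the most likely explanation is timing. The budget was replenished between the two readings (for example, the minute window reset), so the difference you observed combines the request's cost with that replenishment and does not measure the request's token usage.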
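
A quick back-of-the-envelope check makes the timing explanation concrete. The header values below are hypothetical (the question reports only the 16-token drop), the function name is made up for illustration, and the sketch assumes this request was the only consumption between the two readings.

```python
def implied_replenishment(remaining_before: int, remaining_after: int, request_estimate: int) -> int:
    """Tokens that must have been added back to the budget between two header readings,
    assuming the only consumption in between was this one request."""
    observed_drop = remaining_before - remaining_after
    return request_estimate - observed_drop


# Hypothetical readings: a ~1,000-token request, but the header fell by only 16.
print(implied_replenishment(remaining_before=118_000,
                            remaining_after=117_984,
                            request_estimate=1_000))  # 984
```

If max_tokens is also counted in the request's estimate, the implied replenishment is larger still. Other traffic on the same deployment between the two readings would also shift the result.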
