How can I accurately count tokens used in OpenAI services?

Simone Gallo 0 Reputation points
2023-11-13T15:35:54.71+00:00

Hello,

I am having trouble understanding how tokens are actually counted when using the "function calling" feature in OpenAI services.

In the response returned by the OpenAI service, the total token count (prompt plus completion tokens) is reported under the "usage" property. However, when I check the tokens used (and the corresponding cost) on the Azure monitoring page, a much higher number is displayed.

For example, the API response JSON reports 5k tokens, while the monitoring page shows about 21k tokens.
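
For reference, here is a minimal sketch of where that 5k figure is read from, assuming the pre-1.0 openai Python package configured for Azure (the endpoint, key, and deployment name below are placeholders):

    import openai

    openai.api_type = "azure"
    openai.api_base = "https://<your-resource>.openai.azure.com/"
    openai.api_version = "2023-07-01-preview"
    openai.api_key = "<your-key>"

    response = openai.ChatCompletion.create(
        engine="<your-deployment>",
        messages=[{"role": "user", "content": "Hello"}],
    )

    # The usage block reports this single call's counts only
    usage = response["usage"]
    print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])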

It seems that this discrepancy only occurs when the model decides to call functions. Is there a way to get real-time tracking of the tokens actually used?

Thank you.

Azure OpenAI Service
An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.

2 answers

  1. Pramod Valavala 20,656 Reputation points Microsoft Employee Moderator
    2023-11-13T18:47:41.89+00:00

@Simone Gallo With function calling specifically, there is an intermediate step on the service side that suggests which functions to call, so more tokens are processed than just those in your input prompt.

Unfortunately, this is not documented at the moment, since the intermediate prompt is part of the service itself. There is an open discussion about this on the OpenAI forums as well, which includes some third-party libraries that have approximated these extra tokens through multiple trials.
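
    For illustration, those third-party approximations generally serialize the function definitions and encode them with tiktoken plus a fixed overhead. A rough sketch along those lines (the JSON serialization and the overhead constant are assumptions, not documented values):

    import json
    import tiktoken

    def approx_function_definition_tokens(functions, model="gpt-3.5-turbo-0613"):
        """Rough estimate of the hidden tokens consumed by function definitions.

        The real count comes from an undocumented service-side prompt, so this
        is an approximation calibrated by trial, not an exact value.
        """
        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            encoding = tiktoken.get_encoding("cl100k_base")

        num_tokens = 0
        for function in functions:
            # Encode each definition as JSON; the service's actual rendering differs
            num_tokens += len(encoding.encode(json.dumps(function)))

        return num_tokens + 12  # assumed fixed overhead for the injected text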

As for the discrepancy itself, it is not something I have observed myself; the reported metrics have matched the exact numbers in the API response. It would be best to make sure nobody else is making calls against the same instance, and to compare the exact metric that looks wrong (Processed Prompt Tokens -> prompt_tokens; Generated Completion Tokens -> completion_tokens; Processed Inference Tokens -> total_tokens).
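
    If it helps with that comparison, the same metrics can be pulled programmatically. A sketch using the azure-monitor-query package (the metric IDs behind the display names above are my assumption, so list the definitions first to confirm them):

    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import MetricsQueryClient

    client = MetricsQueryClient(DefaultAzureCredential())

    # Resource ID of the Azure OpenAI account (placeholder values)
    resource_id = (
        "/subscriptions/<sub>/resourceGroups/<rg>"
        "/providers/Microsoft.CognitiveServices/accounts/<account>"
    )

    # Confirm the exact metric names exposed by the resource
    for definition in client.list_metric_definitions(resource_id):
        print(definition.name)

    # Query the token metrics over the window being compared
    result = client.query_resource(
        resource_id,
        metric_names=["ProcessedPromptTokens", "GeneratedTokens"],  # assumed IDs
        timespan=timedelta(days=1),
        aggregations=["Total"],
    )
    for metric in result.metrics:
        for series in metric.timeseries:
            print(metric.name, sum(point.total or 0 for point in series.data))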

If you consistently see incorrect values and you are the only caller on the resource, it would be best to open a support ticket to investigate further.


  2. Simone Gallo 0 Reputation points
    2023-11-15T19:02:14.8466667+00:00

@Pramod Valavala Thank you for your response; it was helpful, but it doesn't address my question. Unfortunately, I realized I had omitted a crucial detail: the model is used in multi-turn conversations, so the tokens from previous calls need to be added to the tokens of each new call (as explained in more detail in this post: https://community.openai.com/t/how-can-we-count-the-used-tokens-in-a-conversation/213389).
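
    To see why the numbers compound, with made-up figures: if turn 1 sends a 1,000-token prompt, turn 2 resends those 1,000 tokens plus the first reply plus the new user message, and so on, so after a handful of turns the billed prompt tokens are several times what any single response's usage field shows. The simplest real-time check is to sum the usage block of every call as the conversation runs, for example (assuming response is the raw Chat Completions response of each call):

    running = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}

    def track_usage(response):
        """Add one call's usage block to the running conversation total."""
        for key in running:
            running[key] += response["usage"][key]
        return running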

    I resolved the issue by implementing a couple of Python functions that use tiktoken.

    These are my results from some tests:

    Prompt tokens: 31.18k (Azure) // 31,199 (script)

    Completion tokens: 1.12k (Azure) // 1,112 (script)

    Total tokens: 32.30k (Azure) // 32.31k (script)

    import tiktoken
    def tokens_count_for_message(message, encoding):
        """Return the number of tokens used by a single message."""
        tokens_per_message = 3  # fixed per-message overhead

        num_tokens = tokens_per_message
        for key, value in message.items():
            if key == "function_call":
                # Function calls are counted by their encoded name and arguments
                num_tokens += len(encoding.encode(value["name"]))
                num_tokens += len(encoding.encode(value["arguments"]))
            elif key in ("content", "name") and value:
                # Guard against None content, as in function-call responses
                num_tokens += len(encoding.encode(value))

        return num_tokens
    
    def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
        """Return the number of tokens used by a list of messages for both user and assistant."""
        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            print("Warning: model not found. Using cl100k_base encoding.")
            encoding = tiktoken.get_encoding("cl100k_base")
    
        user_tokens = 0
        assistant_tokens = 0
        for i, message in enumerate(messages):
            # Check if the current message involves a service call
            is_service_call = message["role"] == "assistant"
    
            # Include tokens from previous messages only when a service call is made
            if is_service_call:
                assistant_tokens += tokens_count_for_message(message, encoding)
                for j in range(i):
                    user_tokens += tokens_count_for_message(messages[j], encoding)
    
            # Count tokens for the current message
            user_tokens += tokens_count_for_message(message, encoding)
    
        assistant_tokens += 3  # every reply is primed with assistant
        
        return user_tokens, assistant_tokens, user_tokens+assistant_tokens
    
    # Usage: num_tokens_from_messages(<messages list>, <model name>)
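
    For anyone reusing the snippet, a call could look like this (the conversation below is made up, including a function_call turn in the shape the counting code expects):

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Milan?"},
        {"role": "assistant", "content": None,
         "function_call": {"name": "get_weather",
                           "arguments": '{"city": "Milan"}'}},
    ]

    user_tokens, assistant_tokens, total_tokens = num_tokens_from_messages(messages)
    print(user_tokens, assistant_tokens, total_tokens)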
    
