High token consumption in Azure OpenAI With Your Data

Filipa Castro 0 Reputation points
2024-10-15T08:03:34.98+00:00

When using the Chat Playground in Azure OpenAI connected to my own data (a search index created from a small 10-page .pdf), the prompt token consumption is ~4k per question even though:

  • I'm testing with no system prompt
  • my testing prompt is super small. Something like "extract name of the customer"
  • chunk size=256 and top_n_documents=3
  • I'm using query_type="simple"

I've also tried running the sample code provided by the playground.

Checking the metrics for the number of requests and prompt tokens used, I also noticed that each question sent actually results in 3 requests to Azure OpenAI.

For retrieving 3 chunks of size 256, plus a few tokens for the prompt, I would expect to use ~800 tokens, not 4k.

Is there any additional flow running under the hood which might be causing this?

Here is the code:

import os

from openai import AzureOpenAI

# Client setup (placeholder values; api_version must be one that supports
# data_sources, e.g. 2024-02-01)
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Extract name of the customer"},
    ],
    max_tokens=20,
    temperature=0.0,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
    stream=False,
    extra_body={
        # "On Your Data" configuration: ground the completion on my
        # Azure AI Search index
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "<endpoint>",
                    "index_name": "<index name>",
                    "query_type": "simple",
                    "in_scope": True,
                    "strictness": 3,
                    "top_n_documents": 3,
                    "authentication": {
                        "type": "api_key",
                        "key": "<api-key>",
                    },
                },
            }
        ]
    },
)

1 answer

  1. santoshkc 9,565 Reputation points Microsoft Vendor
    2024-10-15T13:13:07.5966667+00:00

    Hi @Filipa Castro,

    Thank you for your question! The higher token usage you're seeing when using the Chat Playground with Azure OpenAI connected to your own data is expected, and here's why.

    Even though your prompt is small, the interaction with Azure Search introduces extra tokens. When you send a query like "extract name of the customer," the system first retrieves 3 document chunks (based on your top_n_documents setting) from your search index. Each chunk is about 256 tokens, adding to the overall prompt size.

    After retrieving the chunks, the service combines your original query with the retrieved document content, together with its own internal grounding instructions (a system message it injects to tell the model how to answer from, and cite, the retrieved documents). This injected scaffolding, on top of the ~768 tokens of chunks, is why the prompt grows well beyond what you expected.
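
    You can see the full accounting in the usage object returned with the completion, for example:

    # Token accounting reported by the service for the grounded call;
    # prompt_tokens includes the injected instructions and retrieved chunks.
    print(completion.usage.prompt_tokens)      # ~4k in your case
    print(completion.usage.completion_tokens)  # capped at 20 by max_tokens
    print(completion.usage.total_tokens)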

    As for the multiple requests: the On Your Data flow doesn't simply forward your message. It typically makes a separate model call first to generate a search intent (rewriting your question into one or more search queries), then runs the retrieval, and finally makes the grounded completion call. Those extra stages show up in your metrics as additional requests and additional tokens.
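
    With recent API versions, the response also exposes what the retrieval stage did. The exact field names can vary by API version, so treat this as a sketch:

    # The grounded response carries a "context" payload with the retrieved
    # citations and the generated search intent (a sketch; fields may vary
    # by API version):
    message = completion.choices[0].message
    context = getattr(message, "context", None) or {}
    for citation in context.get("citations", []):
        print(citation.get("title"), len(citation.get("content", "")))
    print(context.get("intent"))  # search query/queries generated from your question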

    To reduce token usage, you can lower top_n_documents so fewer chunks are sent to the model, or re-create the index with a smaller chunk size (chunk size is fixed at ingestion time, so it can't be changed per request). Both lower the overall token count while still giving the model relevant context to answer accurately.
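
    For instance, here is a minimal variant of your call that retrieves a single chunk per question, with everything else as in your original snippet:

    # Same request as before, with top_n_documents lowered to 1 so only
    # one ~256-token chunk is added to the prompt:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Extract name of the customer"}],
        max_tokens=20,
        extra_body={
            "data_sources": [
                {
                    "type": "azure_search",
                    "parameters": {
                        "endpoint": "<endpoint>",
                        "index_name": "<index name>",
                        "query_type": "simple",
                        "in_scope": True,
                        "strictness": 3,
                        "top_n_documents": 1,  # was 3
                        "authentication": {"type": "api_key", "key": "<api-key>"},
                    },
                }
            ]
        },
    )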

    If you have any further questions or need assistance with specific adjustments, please don’t hesitate to reach out! Thank you.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".

