Hi @Filipa Castro,
Thank you for your question! The higher token usage you're seeing when using the Chat Playground with Azure OpenAI connected to your own data is expected, and here's why.
Even though your prompt is small, the interaction with Azure Search introduces extra tokens. When you send a query like "extract name of the customer," the system first retrieves 3 document chunks from your search index (based on your top_n_documents setting). Each chunk is roughly 256 tokens, which adds substantially to the overall prompt size.
After retrieving the chunks, the system combines your original query with these document tokens before sending everything to the GPT model. This concatenation explains why the number of tokens grows beyond what you initially expected.
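To make the arithmetic concrete, here is a rough back-of-the-envelope estimate. The numbers (fixed system-prompt overhead, exact chunk sizes) are illustrative assumptions, not exact Azure values:

```python
# Rough estimate of prompt tokens when "on your data" is enabled.
# system_overhead is an assumed figure for the injected system
# instructions; actual overhead varies by configuration.
def estimate_prompt_tokens(query_tokens, n_chunks=3, chunk_tokens=256,
                           system_overhead=200):
    """Estimated total prompt size: user query + retrieved chunks
    + fixed system/instruction overhead."""
    return query_tokens + n_chunks * chunk_tokens + system_overhead

# A ~10-token query still yields a prompt of nearly 1,000 tokens:
print(estimate_prompt_tokens(10))  # 10 + 3*256 + 200 = 978
```

So even a one-line question can consume close to a thousand prompt tokens once the retrieved context is attached.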
As for the multiple requests, these likely correspond to the separate stages of document retrieval, processing, and ranking, each of which runs as its own call. More calls means more tokens consumed overall.
To optimize token usage, you can reduce the chunk size or the number of documents retrieved. This lowers the overall token count while still providing the model with enough relevant context to answer accurately.
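As a sketch of how that adjustment looks in practice, here is an example request body for the "on your data" chat completions call. The endpoint and index values are placeholders, and you should verify the exact field names against the current Azure OpenAI API reference for your API version:

```python
# Illustrative "on your data" request body (placeholder values).
# Lowering top_n_documents reduces how many chunks are injected
# into the prompt, and therefore the prompt token count.
payload = {
    "messages": [
        {"role": "user", "content": "extract name of the customer"}
    ],
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://<your-search>.search.windows.net",
                "index_name": "<your-index>",
                # Fewer retrieved chunks -> fewer prompt tokens.
                "top_n_documents": 2,
            },
        }
    ],
}
```

Dropping top_n_documents from 3 to 2 saves roughly one chunk's worth of tokens (about 256 in the example above) per request.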
If you have any further questions or need assistance with specific adjustments, please don’t hesitate to reach out! Thank you.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful."