Hi Rahul kumar
Welcome to Microsoft Q&A Forum, thank you for posting your query here!
Input token usage is high because of the internal operations the file search tool performs to produce accurate answers. These include splitting complex queries into simpler ones and running both keyword and semantic searches across the vector store. See how-it-works for details.
You can try adjusting the following parameters to reduce token usage:
- the chunk size and overlap settings,
- the maximum number of chunks added to the context,
- the number of results returned, and
- the wording of your queries: more concise queries are less likely to be split into several simpler queries.
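To get a feel for how these settings interact, here is a rough back-of-the-envelope sketch. It assumes each returned chunk can be as large as the configured chunk size, and that the defaults are 800 tokens per chunk and up to 20 results (the values I recall from the file search documentation, so please verify them for your model). The function name and the `num_searches` parameter are hypothetical, used only for illustration.

```python
def estimate_retrieved_tokens(max_num_results: int,
                              max_chunk_size_tokens: int,
                              num_searches: int = 1) -> int:
    """Rough upper bound on tokens that retrieved chunks can add to the input.

    Assumption: every search returns max_num_results chunks and each chunk
    is as large as max_chunk_size_tokens. Real usage is usually lower.
    """
    return max_num_results * max_chunk_size_tokens * num_searches


# Assumed defaults: 20 results of up to 800 tokens each.
default_bound = estimate_retrieved_tokens(20, 800)

# Same chunk size, but with max_num_results lowered to 2.
reduced_bound = estimate_retrieved_tokens(2, 800)

print(default_bound, reduced_bound)
```

This is only an upper bound on the retrieved text. The instructions, conversation history, and the query itself also count toward input tokens.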
A detailed guide on chunking and overlap can be found here: chunking-examples
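As a minimal sketch of what a custom chunking configuration could look like, the helper below builds a static chunking strategy payload. The helper name is hypothetical. The limits it checks (chunk size between 100 and 4096 tokens, overlap at most half the chunk size) are what I recall from the documentation, so treat them as assumptions and confirm them in the guide above.

```python
def build_static_chunking_strategy(max_chunk_size_tokens: int,
                                   chunk_overlap_tokens: int) -> dict:
    """Build a static chunking strategy payload (hypothetical helper).

    The limits below are assumptions based on the documented constraints;
    check the chunking guide before relying on them.
    """
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens < 0:
        raise ValueError("chunk_overlap_tokens must be non-negative")
    if chunk_overlap_tokens > max_chunk_size_tokens // 2:
        raise ValueError("chunk_overlap_tokens must not exceed half the chunk size")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


# Smaller chunks with less overlap than the assumed defaults (800 / 400).
strategy = build_static_chunking_strategy(400, 100)
print(strategy)
```

The resulting dictionary would typically be passed as the `chunking_strategy` argument when you create the vector store or attach files to it. Note that chunking is applied when files are ingested, so existing files may need to be re-added for a new setting to take effect.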
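Sample code for adjusting the maximum number of results:
# Assumes `client` is an already initialized OpenAI / Azure OpenAI client.
# max_num_results caps how many retrieved chunks file_search adds to the context.
assistant = client.beta.assistants.create(name="Financial Analyst Assistant", instructions="You are an expert financial analyst.",
    model="gpt-4-turbo", tools=[{"type": "file_search", "file_search": {"max_num_results": 2}}])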
Thank You.