hi vishav!
thanks for throwing this question out here, super detailed and super helpful for others wrestling with the same stuff ))
alright, let’s break it down. u’re hitting some real pain points with azure openai assistants and agents, especially around scaling and accuracy.
file size & token limits
yeah, the file size thing is a headache. right now, assistants choke if docs are too big or messy. u’re already doing the smart thing with custom truncation. next step: pre-split ur docs into smaller chunks before ingestion, using something like the text splitter in langchain (or a simple python script) to break ’em down by paragraphs or sections. that way, u avoid the line-level chaos.
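here’s a minimal sketch of the “simple python script” route: a greedy paragraph packer. everything here (the function name, the max_chars budget) is made up for illustration, and for real docs u’d probably want token counting instead of character counting:

```python
def split_by_paragraphs(text, max_chars=2000):
    """greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # start a new chunk if adding this paragraph would blow the budget
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

then upload each chunk as its own file so no single doc trips the limits.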
vector store capacity
10k files per store is tight, no lie. for now, u gotta juggle multiple stores if u’re over that limit. but! u can automate the linking part with the api: spin up new stores dynamically and attach ’em as needed. it’s clunky, but it works. microsoft’s working on scaling this (fingers crossed), but no ETA yet.
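to show what “automate the linking” could look like, here’s a tiny planning helper. the name plan_stores and the injected create_store callback are both made up; in real code, create_store would wrap the sdk’s vector-store create + file upload calls:

```python
MAX_FILES_PER_STORE = 10_000  # current per-store cap

def plan_stores(file_ids, create_store, max_files=MAX_FILES_PER_STORE):
    """split file_ids into batches that each fit one vector store,
    calling create_store(batch) once per batch and collecting results."""
    stores = []
    for i in range(0, len(file_ids), max_files):
        stores.append(create_store(file_ids[i:i + max_files]))
    return stores
```

u’d then attach the returned store ids to ur assistant as needed.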
agent going rogue
ugh, the agent pulling answers outta thin air is frustrating. to lock it down, u gotta hammer the instructions. like, really specific. try something like: “only use info from the linked index. if it’s not there, say ‘i don’t know’.” also, check the strict_mode flag in the agent config; it’s in preview, but might help.
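for reference, here’s roughly how i’d phrase the grounding instructions (just a sample string, tweak the exact wording for ur domain):

```python
GROUNDED_INSTRUCTIONS = (
    "only answer using information retrieved from the attached index. "
    "if the retrieved chunks do not contain the answer, reply exactly "
    "'i don't know'. never use general knowledge, and never guess."
)
```

pass this as the instructions field when u create the assistant/agent.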
llm context limits
this one’s a beast. even with RAG, the llm’s attention span is… short. to squeeze more in, try tweaking the chunk_size and overlap in ur index settings. smaller chunks + overlap can help it “see” more connections. for summarization across tons of docs, u might need a hybrid approach: first, use cognitive search to pull the top relevant chunks, then feed those into the llm with a prompt like “summarize these records, not ur general knowledge.”
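the “pull top chunks, then summarize” step can be sketched like this. build_summary_prompt is a made-up helper, and the max_chars budget is a stand-in for whatever context window u actually have:

```python
def build_summary_prompt(chunks, max_chars=12_000):
    """join top-ranked chunks (e.g. from cognitive search) into one
    grounded summarization prompt, trimming to a rough context budget."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # drop lower-ranked chunks once the budget is spent
        kept.append(chunk)
        used += len(chunk)
    return (
        "summarize the following records only. do not add outside knowledge.\n\n"
        + "\n---\n".join(kept)
    )
```

feed the result to the llm as the user message; it keeps the model pointed at the retrieved records instead of its general knowledge.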
upcoming fixes?
microsoft’s been pretty hush-hush, but the agent/assistant stuff is evolving fast. keep an eye on the azure updates blog; they drop surprises there.
hang in there! u’re already ahead of the curve by mixing assistants and agents. if u nail the pre-processing and instructions, u can brute-force ur way to something workable for now....
rgds,
Alex