The delay between updating files in Azure Blob Storage and the model being able to use that information comes from a fixed sequence of processing steps. First, thread creation includes a mandatory 60-second wait so that file processing has started before the agent uses the files. During this time, the system automatically runs three stages: document parsing and chunking (800 tokens per chunk), embedding generation using text-embedding-3-large, and vector store indexing.
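To make the chunking stage concrete, here is a minimal Python sketch of splitting a token sequence into 800-token chunks. It only illustrates the idea and is not the service's actual implementation: the function name chunk_tokens is hypothetical, the overlap parameter is an assumption (only the 800-token chunk size is stated above), and plain list items stand in for real tokenizer output.

```python
def chunk_tokens(tokens, chunk_size=800, overlap=0):
    """Split a token list into chunks of at most chunk_size tokens.

    chunk_size=800 mirrors the figure quoted above; overlap is an
    assumed, optional parameter and not something this answer confirms.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    fake_tokens = list(range(2000))  # stand-in for tokenizer output
    print([len(c) for c in chunk_tokens(fake_tokens)])
```

Each chunk would then be passed to the embedding model and the resulting vectors written to the vector store. In the managed service all of this happens automatically.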
The Vector Store acts as an intermediary between Blob Storage and the Index Service and handles chunking and embedding automatically. Once processing begins, index creation runs in the background while queries continue to function, so updates involve no explicit downtime. The system balances ongoing queries against indexing jobs so that query performance is maintained.
After initial processing completes, the index keeps updating automatically, and queries continue to execute during indexing operations. Background deletion processes run periodically to free space. To verify that new content is available, you can monitor processing status with the SDK polling helpers. The entire process is automated, so you don't need to trigger any manual updates: the system handles synchronization between Azure Blob Storage, vector stores, and the search index, which keeps your agent available while maintaining data freshness.
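If you want to see what polling for content availability looks like in principle, the following is a generic Python sketch. It does not use the real SDK helpers. The get_status callable is a hypothetical placeholder for whatever SDK call returns the processing status of your vector store or file, and the status strings ("in_progress", "completed", "failed", "cancelled", "expired") are assumptions you should check against the SDK version you use.

```python
import time


def wait_until_ready(get_status, timeout_s=120.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports completion or the timeout passes.

    get_status is a hypothetical zero-argument callable returning a status
    string; the status names checked below are assumptions, not confirmed
    SDK values.
    Returns True when processing completed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "completed":
            return True
        if status in ("failed", "cancelled", "expired"):
            raise RuntimeError(f"processing ended with status: {status}")
        if clock() >= deadline:
            return False
        sleep(interval_s)  # wait before asking again


if __name__ == "__main__":
    # Simulated status sequence standing in for a real SDK call.
    simulated = iter(["in_progress", "in_progress", "completed"])
    ready = wait_until_ready(lambda: next(simulated), interval_s=0.1)
    print("content available:", ready)
```

In practice you would pass a small function that fetches the current status from the service, and only send queries that depend on the new files once it returns True.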
Reference:
Managing Concurrency in Blob Storage
Blob Versioning in Azure Storage
Thanks