Copilot Studio knowledge sources are indexed asynchronously and can remain in an “In progress” state for an extended period while content is being crawled, chunked, and vector-indexed in Dataverse. This behavior is expected, especially for larger or changing data sets, because:
- Files and external content are ingested, chunked, and converted into semantic indexes and vector embeddings before they can be used for grounding responses.
- SharePoint, OneDrive, and other unstructured sources are not indexed in real time; there can be noticeable delay between upload/changes and when the agent can fully use them.
- For SharePoint-based knowledge, content is cached; removing or changing files does not immediately update what the agent “sees.” There is no way to force immediate re-indexing or purge the cache per session.
If knowledge items appear stuck “In progress,” the practical implications are:
- Until indexing completes, the agent may use only a partial extract of the data. For unpublished agents, only a limited subset of the data may be processed.
- For dynamic or per-session content, Copilot Studio knowledge ingestion is not suitable because indexing and cache invalidation are not instantaneous.
Recommended actions based on the documented behavior:
- Allow time for indexing
  - For websites and other unstructured data, Copilot Studio refreshes and re-indexes content periodically (for websites, typically within 24 hours). A persistent “In progress” state shortly after adding or updating a source can be normal during this window.
- Publish the agent after configuring knowledge
  - Ensure the agent is published after adding or updating knowledge sources. Unpublished agents may only process a small extract of the data, which can look like incomplete or stuck processing.
- Avoid per-session or rapidly changing files as knowledge
  - Knowledge ingestion is optimized for static or semi-static content. For scenarios where files must be uploaded and removed per session, and answers must only use that session’s files, Copilot Studio knowledge sources are not appropriate because:
    - Indexing is asynchronous and not real time.
    - Cached content from previous files can still be used even after deletion.
- Use alternative patterns for dynamic data
  - For highly dynamic or per-session data, consider:
    - Using real-time connectors or tools (for example, Power Platform connectors) to fetch data at query time instead of indexing it as knowledge.
    - Designing the solution so that knowledge sources are relatively stable, and dynamic aspects are handled via tools/actions rather than uploaded knowledge.
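
The "fetch at query time" pattern recommended above can be sketched in plain Python. This is only an illustration of the idea, not a Copilot Studio or Power Platform API: `SessionFileStore` and its methods are hypothetical names standing in for whatever backend a tool or connector would read. Because every lookup reads live, session-scoped data, a removed file stops being visible immediately, with no index or cache to wait on.

```python
from dataclasses import dataclass, field


@dataclass
class SessionFileStore:
    """Hypothetical per-session store that a tool/connector could read at query time."""

    # Maps session id -> {file name -> file text}. Nothing is indexed or cached.
    _files: dict = field(default_factory=dict)

    def upload(self, session_id: str, name: str, text: str) -> None:
        self._files.setdefault(session_id, {})[name] = text

    def remove(self, session_id: str, name: str) -> None:
        self._files.get(session_id, {}).pop(name, None)

    def fetch(self, session_id: str, query: str) -> list:
        """Return the names of this session's files whose text mentions the query."""
        needle = query.lower()
        session_files = self._files.get(session_id, {})
        return sorted(
            name for name, text in session_files.items() if needle in text.lower()
        )


# Placeholder data for two independent sessions.
store = SessionFileStore()
store.upload("s1", "rates.txt", "Tariff rate is 5%")
store.upload("s2", "other.txt", "Tariff schedule draft")
```

In this sketch, `store.fetch("s1", "tariff")` only ever sees files uploaded under `"s1"`, and calling `store.remove("s1", "rates.txt")` takes effect on the very next fetch. That is the property asynchronous knowledge indexing cannot offer.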
If the “In progress” status persists well beyond expected indexing windows (for example, more than a day for static content) and the agent remains unable to use the knowledge, raise a support ticket with full details (tenant, environment, type and size of knowledge source, and timestamps) so the service team can investigate backend indexing issues.
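
As a rough aid for the escalation rule above, the following sketch checks whether a source has been “In progress” longer than the expected window and gathers the details suggested for the ticket. It assumes the status and timestamps are noted manually from the Copilot Studio UI; the function names, the one-day threshold constant, and the sample values are illustrative assumptions, not part of any Microsoft API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one day, mirroring "more than a day for static content".
EXPECTED_WINDOW = timedelta(days=1)


def should_escalate(status: str, added_at: datetime, now: datetime,
                    window: timedelta = EXPECTED_WINDOW) -> bool:
    """True when a source is still "In progress" after the expected window."""
    return status == "In progress" and now - added_at > window


def ticket_details(tenant: str, environment: str, source_type: str,
                   size_mb: float, added_at: datetime, checked_at: datetime) -> dict:
    """Collect the details the answer suggests including in a support ticket."""
    return {
        "tenant": tenant,
        "environment": environment,
        "source_type": source_type,
        "size_mb": size_mb,
        "added_at": added_at.isoformat(),
        "checked_at": checked_at.isoformat(),
    }


# Placeholder values: a source added 30 hours before the status was checked.
added = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
checked = added + timedelta(hours=30)
details = (
    ticket_details("contoso", "prod", "SharePoint", 120.0, added, checked)
    if should_escalate("In progress", added, checked)
    else None
)
```

Using timezone-aware timestamps keeps the elapsed-time comparison unambiguous when the person filing the ticket and the service team are in different time zones.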