Hi Mahesh,
Thanks for raising this.
It's a common concern when working with larger datasets in Copilot Agents.
From our experience and discussions with the engineering team, it appears you're currently using a declarative Copilot Studio connector, which is subject to certain platform-imposed constraints, such as the 2,048-row retrieval limit you've encountered. Our team has not used Copilot Studio; we have been using the Teams Toolkit (TTK) instead. Based on your scale needs, you may want to explore the pro-dev approach with TTK in Visual Studio Code. We have not seen any dataset row-count restrictions there, since we own and control the API "plugin".
A Pro Dev Approach with API Plug-ins in M365 Declarative Agents
Taking the pro-dev approach, our team integrated several techniques: Semantic Kernel, an API plugin, Retrieval-Augmented Generation (RAG), and traditional SQL provider code. Each of these layers plays a role:
- Semantic Kernel: natural language processing (NLP) for query understanding.
- API Plugin & RAG: these allow dynamic retrieval and enrichment of data. The API plugin pulls in external data while RAG augments it in real time, enabling our agent to deliver a polished, aggregated view.
- Traditional SQL Provider: this layer manages the data queries and aggregation, ensuring that even without a hard limit on the number of rows, performance stays aligned with the expectations of a Teams agent.
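As a rough sketch of how these layers fit together (the schema and function names here are hypothetical, with in-memory SQLite standing in for our actual SQL provider): the API plugin exposes a function the agent calls with parameters extracted from the user's question, and the SQL layer performs the aggregation before anything is returned.

```python
import sqlite3

def setup_demo_db() -> sqlite3.Connection:
    # In-memory stand-in for the real SQL provider (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("East", 250.0), ("West", 75.0)],
    )
    return conn

def sales_summary(conn: sqlite3.Connection, region: str) -> dict:
    """API-plugin-style entry point: the agent passes parameters it
    extracted from the user's question; aggregation happens in SQL."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()
    return {"region": region, "orders": row[0], "total": row[1]}
```

The agent never sees raw rows; it receives a small aggregated payload sized for a Teams response, which is why the row count of the underlying dataset stops mattering.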
Some Best Practices for Managing Dataset Sizes
Even though our systems haven't shown limitations regarding the number of rows, Teams Agents are designed to provide an aggregated snapshot rather than a deep dive into raw data. To align with this vision and ensure a responsive, user-centric experience, consider these best practices:
- Filter Early, Filter Often: Design queries to fetch only the subset of data that’s relevant to the aggregation view needed for the user question. Use SQL filtering techniques (WHERE clauses, parameterized queries) to eliminate unnecessary rows at the source.
- Implement Pagination & Lazy Loading: Even if a dataset is large, only a small “window” of data should be served at any one time. Incorporate pagination or lazy loading techniques to gradually load data, especially if the aggregated view might eventually need to drill down into specifics.
- Pre-Aggregate Data: Perform aggregation operations on the server side. Summarize or compute necessary metrics before sending the result to the Teams agent. This ensures that the UI displays digestible insights rather than raw rows.
- Indexing and Query Optimization: Optimize your SQL queries with the right indexes and query plans. This not only speeds up data retrieval but also minimizes the data footprint that’s required for the agent’s view.
- Caching Frequently Used Aggregations: If certain aggregated data is requested frequently, implement caching to avoid repeated heavy computation. By caching results, you reduce the load on the backend and improve responsiveness in the Agent.
- Limit Data Exposure in UI: Adopt a design philosophy where the Agent is seen as an "aggregation dashboard" rather than a data exploration tool. The UI should center on key metrics and trends, with options to drill down only when necessary.
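To make the "filter early" and pagination points concrete, here is a minimal sketch (hypothetical table and column names, with in-memory SQLite standing in for the backend) of a parameterized, windowed query: the WHERE clause trims rows at the source, and LIMIT/OFFSET serve only one page at a time.

```python
import sqlite3

def make_demo_conn() -> sqlite3.Connection:
    # Hypothetical tickets table standing in for the real data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO tickets (status) VALUES (?)",
        [("open",)] * 120 + [("closed",)] * 30,
    )
    return conn

def fetch_page(conn: sqlite3.Connection, status: str,
               page: int, page_size: int = 50) -> list:
    """Return one window of filtered rows: filtering and paging both
    happen in SQL, so unneeded rows never leave the database."""
    cur = conn.execute(
        "SELECT id, status FROM tickets "
        "WHERE status = ? ORDER BY id LIMIT ? OFFSET ?",
        (status, page_size, page * page_size),
    )
    return cur.fetchall()
```

In practice keyset pagination (WHERE id > last_seen_id) scales better than OFFSET on very large tables, but the principle is the same: the agent only ever handles a small window.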
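Pre-aggregation and caching can be combined in a small layer like the following sketch (hypothetical names and schema; a real deployment would typically use a shared cache such as Redis rather than this in-process dict):

```python
import sqlite3
import time

_cache: dict = {}
CACHE_TTL_SECONDS = 300  # assumption: 5-minute freshness is acceptable

def make_sales_conn() -> sqlite3.Connection:
    # In-memory stand-in for the real SQL provider (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("West", 40.0), ("East", 60.0)],
    )
    return conn

def region_totals(conn: sqlite3.Connection) -> dict:
    """Server-side aggregation: summarize before anything reaches the
    agent, and reuse the result while it is still fresh."""
    now = time.monotonic()
    hit = _cache.get("region_totals")
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    result = {region: total for region, total in rows}
    _cache["region_totals"] = (now, result)
    return result
```

The agent receives a handful of summary numbers instead of raw rows, and repeated questions within the TTL window skip the database entirely.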
Thanks,
Keshav Keshari
*************************************************************************
If this response is helpful, please click the "Upvote" button. You can share your feedback via the Microsoft Copilot Developer Feedback link. Click here to escalate.