Persistent Limitations in Azure OpenAI for Multi-Document RAG Analysis — Follow-Up on Assistant & Agent Recommendations

Vishav Singh | 20 Reputation points
2025-05-28T05:25:25.66+00:00

Hello MS support team,

**Issue Summary:** We were building a large-scale feedback analysis solution using Azure OpenAI + Cognitive Search (RAG architecture) to extract insights from ~10,000 structured survey documents.

While the MS Support team's previous responses recommended migrating from Chat Completion RAG to Azure OpenAI Assistant, and more recently to AI Agent, we continue to face critical limitations that affect accuracy, reliability, and scalability.


What We’ve Implemented Based on Your Recommendations

1. Azure OpenAI Assistant: We tested Assistant-based retrieval and agree it delivers better response quality compared to the Chat Completion RAG model.

However, several blocking issues persist:

🔴 File Size and Token Limitations: We are forced to build custom logic to validate and truncate content at the line level before file ingestion. Failing to do so causes indexing failures and breaks search accuracy.

🔴 Vector Store Capacity Limit: The current 10,000-file cap per vector store is not scalable for our needs. Since an Assistant can link to only one vector store, creating and managing multiple stores is inefficient and impractical for projects with large datasets and frequent updates.

2. AI Agent (Preview): The support team subsequently recommended the Agent feature because it can link to indexes. However, it has the following issues:

🔴 Search Scope Issue: The Agent appears to return information beyond the linked index content — for example, answering with generalized or fabricated data when it should be restricted to the linked documents.

➤ Example: When asked, “How many questions do we have in Japan that has parking space?” — the Agent responded with "thousands," even though the index contains only one document with this detail. (We solved this by giving the Agent explicit instructions.)

3. LLM Context Limitations Still Persist: Even after linking an index (Agent), the LLM still appears to use only a very small portion of the retrieved documents/content to formulate a response. This prevents meaningful summarization or insight generation across even modest datasets (e.g., 20–50 feedback records).

What We Need Support On:

  1. Agent Configuration: Is there a way to strictly confine the Agent’s answers to the linked index content (like a closed-book RAG model)? Are there flags/settings to disable its default "open-book" behavior?
  2. Assistant Scaling: Are there any upcoming changes or workarounds to:
    • Raise or bypass the 10k file limit on vector stores?
    • Link multiple vector stores to a single Assistant?
    • Improve file ingestion flexibility without extensive manual pre-processing?
  3. Cross-Chunk & Multi-Record Context Handling: Is there a recommended architecture for enabling LLMs to reason across multiple retrieved records (e.g., summarizing 10k documents at once)? Current Assistant/Agent implementations still exhibit GPT's default behavior of responding to only the top 4–5 chunks.

Regards

Vishav Deep Singh

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
1 answer

  1. Alex Burlachenko 9,780 Reputation points
    2025-05-28T07:51:01.8633333+00:00

    hi vishav!

    thanks for throwing this question out here, super detailed and super helpful for others wrestling with the same stuff ))

    alright, let’s break it down. you’re hitting some real pain points with azure openai assistants and agents, especially around scaling and accuracy.

    file size & token limits

    yeah, the file size thing is a headache. right now, assistants choke if docs are too big or messy. you’re already doing the smart thing with custom truncation; pre-split your docs into smaller chunks before ingestion. use something like the text splitter in langchain (or a simple python script, see the sketch below) to break them down by paragraphs or sections. that way, you avoid the line-level chaos.
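
    here’s a minimal sketch of the "simple python script" route (paragraph-aligned splitting); `max_chars` is just an illustrative budget, not an azure-documented limit, so tune it against whatever file/token cap you’re actually hitting:

    ```python
    # split one big survey file into paragraph-aligned chunks before upload.
    # max_chars is a made-up budget for illustration, not an official azure limit.
    def split_by_paragraph(text: str, max_chars: int = 4000) -> list[str]:
        chunks, current = [], ""
        for para in text.split("\n\n"):
            # start a new chunk when adding this paragraph would blow the budget
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current.strip())
                current = ""
            current += para + "\n\n"
        if current.strip():
            chunks.append(current.strip())
        return chunks

    # usage: write each chunk out as its own file, then upload those instead of the original
    with open("survey_0001.txt", encoding="utf-8") as f:
        for i, chunk in enumerate(split_by_paragraph(f.read())):
            with open(f"survey_0001_part{i:03d}.txt", "w", encoding="utf-8") as out:
                out.write(chunk)
    ```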

    vector store capacity

    10k files per store is tight, no lie. for now, you’ve gotta juggle multiple stores if you’re over that limit. but! you can automate the linking part with the api: spin up new stores dynamically and attach them as needed (rough sketch below). it’s clunky, but it works. microsoft’s working on scaling this (fingers crossed), but no ETA yet.
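
    rough sketch of that automation, assuming the openai python sdk (v1.x) pointed at azure openai; the endpoint, key, api version, and "one store per batch" strategy are placeholders, and newer sdk releases expose `client.vector_stores` instead of `client.beta.vector_stores`:

    ```python
    # create a fresh vector store per batch of files, then swap which store the
    # assistant's file_search tool points at (since only one can be linked at a time).
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
        api_key="<aoai-key>",                                        # placeholder
        api_version="2024-05-01-preview",                            # use a version your resource supports
    )

    def new_store_for_batch(name: str, paths: list[str]) -> str:
        """Create a vector store and push one batch of files into it."""
        store = client.beta.vector_stores.create(name=name)
        client.beta.vector_stores.file_batches.upload_and_poll(
            vector_store_id=store.id,
            files=[open(p, "rb") for p in paths],
        )
        return store.id

    def point_assistant_at(assistant_id: str, store_id: str) -> None:
        """Re-link the assistant's file_search tool to a different vector store."""
        client.beta.assistants.update(
            assistant_id,
            tool_resources={"file_search": {"vector_store_ids": [store_id]}},
        )
    ```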

    agent going rogue

    ugh, the agent pulling answers outta thin air is frustrating. to lock it down, you’ve gotta hammer the instructions. like, really specific. try something like: “only use info from the linked index. if it’s not there, say ‘i don’t know’.” (a fuller example is below.) also, check the strict_mode flag in the agent config; it’s in preview, but might help.
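
    for reference, a fuller version of that kind of grounding instruction; the wording is only an illustration, paste something like it into the agent’s instructions field (portal or sdk) and adapt it to your data:

    ```python
    # illustrative grounding-only instructions; this text is an assumption, not an official template.
    GROUNDED_INSTRUCTIONS = """\
    You are a survey-feedback analyst. Answer ONLY from documents returned by the
    connected Azure AI Search index for this conversation.

    Rules:
    - Do not use outside or general knowledge, and never estimate or extrapolate counts.
    - If the retrieved documents do not contain the answer, reply exactly:
      "I don't know based on the indexed documents."
    - Any number you state (e.g., how many records mention parking space) must be
      countable in the retrieved documents, and cite the document(s) it came from.
    """
    ```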

    llm context limits

    this one’s a beast. even with RAG, the llm’s attention span is… short. to squeeze more in, try tweaking the chunk_size and overlap in your index settings; smaller chunks + overlap can help it “see” more connections. for summarization across tons of docs, you might need a hybrid approach: first, use cognitive search to pull the top relevant chunks, then feed those into the llm with a prompt like “summarize these records, not your general knowledge.” there’s a rough sketch of that flow below.
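
    a rough sketch of that hybrid flow, assuming the `azure-search-documents` and `openai` python packages; the index name, the `content` field, and `top=50` are assumptions about your setup, not fixed values:

    ```python
    # pull a wider slice of records straight from the search index, then make the
    # model summarize only what was pasted into the prompt.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from openai import AzureOpenAI

    search = SearchClient(
        endpoint="https://<your-search>.search.windows.net",    # placeholder
        index_name="survey-feedback",                            # placeholder index name
        credential=AzureKeyCredential("<search-key>"),
    )
    aoai = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<aoai-key>",
        api_version="2024-06-01",
    )

    def summarize(query: str, top: int = 50) -> str:
        # 1) retrieve more records than the assistant/agent would surface on its own
        hits = search.search(search_text=query, top=top)
        records = "\n\n".join(f"[{i}] {doc['content']}" for i, doc in enumerate(hits))

        # 2) force the model to work only from the pasted records
        resp = aoai.chat.completions.create(
            model="gpt-4o",  # your deployment name
            temperature=0,
            messages=[
                {"role": "system", "content": "Summarize ONLY the records provided. "
                                              "Do not add outside or general knowledge."},
                {"role": "user", "content": f"Question: {query}\n\nRecords:\n{records}"},
            ],
        )
        return resp.choices[0].message.content
    ```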

    upcoming fixes?

    microsoft’s been pretty hush-hush, but the agent/assistant stuff is evolving fast. check the azure updates blog; they drop surprises there.

    hang in there! you’re already ahead of the curve by mixing assistants and agents. if you nail the pre-processing and instructions, you can brute-force your way to something workable for now.

    rgds,

    Alex

    https://ctrlaltdel.blog/

