Query a knowledge base using the retrieve action or MCP endpoint

Note

This agentic retrieval feature is generally available in the 2026-04-01 REST API via programmatic access. The Azure portal and Microsoft Foundry portal continue to provide preview-only access to all agentic retrieval features. For migration guidance, see Migrate agentic retrieval code to the latest version.

If you choose to use a preview REST API, you can access capabilities that aren't yet generally available for this feature. Preview features are provided without a service-level agreement and aren't recommended for production workloads. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Important

These features and functionality are part of the 2026-05-01-preview REST API. The 2026-05-01-preview is licensed to you as part of your Azure subscription and is subject to the terms applicable to "Previews" in the Microsoft Product Terms, the Microsoft Products and Services Data Protection Addendum ("DPA"), and the Supplemental Terms of Use for Microsoft Azure Previews.

The 2026-05-01-preview supports connections to other Microsoft services and third-party services. Use of these services is subject to their respective terms and might result in data processing or storage outside of the Azure compliance boundary, as well as data flowing into the Azure compliance boundary.

It's your responsibility to manage whether your data will flow outside of your organization's compliance and geographic boundaries and any related implications, and that appropriate permissions, boundaries, and approvals are provisioned.

You're responsible for carefully reviewing and testing applications you build in the context of your specific use cases and making all appropriate decisions and customizations. This includes implementing your own responsible AI mitigations, such as metaprompts, content filters, or other safety systems, and ensuring your applications meet appropriate quality, reliability, security, and trustworthiness standards. For more information, see the Azure AI Search Transparency Note.

In an agentic retrieval pipeline, the retrieve action invokes parallel query processing from a knowledge base. You can call the retrieve action directly using the Search Service REST APIs or an Azure SDK. Each knowledge base also exposes a Model Context Protocol (MCP) endpoint for consumption by MCP-compatible agents.

This article explains how to call both retrieval methods with optional permissions enforcement and how to interpret the three-part response. To set up a pipeline that connects Azure AI Search to Foundry Agent Service via MCP, see Tutorial: Build an end-to-end agentic retrieval solution.

Prerequisites

An Azure AI Search service with a knowledge base.
Permissions to query knowledge bases. Configure keyless authentication with the Search Index Data Reader role assigned to your user account (recommended) or use an API key.
If the knowledge base specifies an LLM, the search service must have a managed identity with Cognitive Services User permissions on the Microsoft Foundry resource.

Required Azure.Search.Documents package:
- For 2026-05-01-preview features, the latest preview package: dotnet add package Azure.Search.Documents --prerelease
- For 2026-04-01 features, the latest stable package: dotnet add package Azure.Search.Documents

Required azure-search-documents package:
- For 2026-05-01-preview features, the latest preview package: pip install --pre azure-search-documents
- For 2026-04-01 features, the latest stable package: pip install azure-search-documents

Required REST API version:
- For preview features: Search Service 2026-05-01-preview
- For generally available features: Search Service 2026-04-01

Call the retrieve action

You call the retrieve action on a knowledge base. The request body includes the query input (messages in the 2026-05-01-preview examples, intents in the 2026-04-01 examples) and an optional list of knowledge sources to target.

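The first C# example targets the 2026-05-01-preview and sends a messages input. The second targets 2026-04-01 and sends an intents input.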

using Azure;
using Azure.Identity;
using Azure.Search.Documents.KnowledgeBases;
using Azure.Search.Documents.KnowledgeBases.Models;

// Create knowledge base retrieval client
var kbClient = new KnowledgeBaseRetrievalClient(
    endpoint: new Uri("<YOUR SEARCH SERVICE URL>"),
    knowledgeBaseName: "<YOUR KNOWLEDGE BASE NAME>",
    tokenCredential: new DefaultAzureCredential()
);

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "You can answer questions about the Earth at night. "
                + "Sources have a JSON format with a ref_id that must be cited in the answer. "
                + "If you do not have the answer, respond with 'I do not know'."
            )
        }
    ) { Role = "assistant" }
);
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "Why is the Phoenix nighttime street grid so sharply visible from space, "
                + "whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
            )
        }
    ) { Role = "user" }
);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

using Azure;
using Azure.Identity;
using Azure.Search.Documents.KnowledgeBases;
using Azure.Search.Documents.KnowledgeBases.Models;

// Create knowledge base retrieval client
var kbClient = new KnowledgeBaseRetrievalClient(
    endpoint: new Uri("<YOUR SEARCH SERVICE URL>"),
    knowledgeBaseName: "<YOUR KNOWLEDGE BASE NAME>",
    tokenCredential: new DefaultAzureCredential()
);

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Intents.Add(
    new KnowledgeRetrievalSemanticIntent(
        "Why is the Phoenix nighttime street grid so sharply visible from space, "
        + "whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
    )
);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

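The first Python example targets the 2026-05-01-preview and sends a messages input. The second targets 2026-04-01 and sends an intents input.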

from azure.identity import DefaultAzureCredential
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    KnowledgeBaseRetrievalRequest,
    SearchIndexKnowledgeSourceParams,
)

# Create knowledge base retrieval client
kb_client = KnowledgeBaseRetrievalClient(
    endpoint="<YOUR SEARCH SERVICE URL>",
    knowledge_base_name="<YOUR KNOWLEDGE BASE NAME>",
    credential=DefaultAzureCredential(),
)

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="assistant",
            content=[
                KnowledgeBaseMessageTextContent(
                    text="You can answer questions about the Earth at night. "
                    "Sources have a JSON format with a ref_id that must be cited in the answer. "
                    "If you do not have the answer, respond with 'I do not know'."
                )
            ],
        ),
        KnowledgeBaseMessage(
            role="user",
            content=[
                KnowledgeBaseMessageTextContent(
                    text="Why is the Phoenix nighttime street grid so sharply visible from space, "
                    "whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
                )
            ],
        ),
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="earth-at-night-blob-ks",
        )
    ],
)

result = kb_client.retrieve(request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.identity import DefaultAzureCredential
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeRetrievalSemanticIntent,
    KnowledgeBaseRetrievalRequest,
    SearchIndexKnowledgeSourceParams,
)

# Create knowledge base retrieval client
kb_client = KnowledgeBaseRetrievalClient(
    endpoint="<YOUR SEARCH SERVICE URL>",
    knowledge_base_name="<YOUR KNOWLEDGE BASE NAME>",
    credential=DefaultAzureCredential(),
)

request = KnowledgeBaseRetrievalRequest(
    intents=[
        KnowledgeRetrievalSemanticIntent(
            search="Why is the Phoenix nighttime street grid so sharply visible from space, "
            "whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
        )
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="earth-at-night-blob-ks",
        )
    ],
)

result = kb_client.retrieve(request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

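The first REST example targets the 2026-05-01-preview and sends a messages input. The second targets 2026-04-01 and sends an intents input.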

// Example: https://my-service.search.windows.net
@search-url = <YOUR SEARCH SERVICE URL>
@knowledge-base-name = <YOUR KNOWLEDGE BASE NAME>
// Run: az account get-access-token --scope https://search.azure.com/.default --query accessToken -o tsv
@accessToken = <YOUR ACCESS TOKEN>

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2026-05-01-preview
Content-Type: application/json
Authorization: Bearer {{accessToken}}

{
    "messages": [
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "You can answer questions about the Earth at night. Sources have a JSON format with a ref_id that must be cited in the answer. If you do not have the answer, respond with 'I do not know'."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Why is the Phoenix nighttime street grid so sharply visible from space, whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
                }
            ]
        }
    ],
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "earth-at-night-blob-ks",
            "kind": "searchIndex"
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

// Example: https://my-service.search.windows.net
@search-url = <YOUR SEARCH SERVICE URL>
@knowledge-base-name = <YOUR KNOWLEDGE BASE NAME>
// Run: az account get-access-token --scope https://search.azure.com/.default --query accessToken -o tsv
@accessToken = <YOUR ACCESS TOKEN>

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2026-04-01
Content-Type: application/json
Authorization: Bearer {{accessToken}}

{
    "intents": [
        {
            "type": "semantic",
            "search": "Why is the Phoenix nighttime street grid so sharply visible from space, whereas large stretches of the interstate between midwestern cities remain comparatively dim?"
        }
    ],
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "earth-at-night-blob-ks",
            "kind": "searchIndex"
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve
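If you call the REST API without an SDK, the two request shapes above can also be built as plain dictionaries. The following minimal sketch mirrors the JSON in the REST examples. The helper name `build_retrieve_body` and its parameters are hypothetical, and `kind` defaults to `searchIndex` on the assumption that the target is a search index knowledge source.

```python
import json

PREVIEW_API = "2026-05-01-preview"
GA_API = "2026-04-01"


def build_retrieve_body(question, api_version, knowledge_source_name=None,
                        kind="searchIndex", instructions=None):
    """Build a retrieve request body in the shape used by the REST examples."""
    if api_version == PREVIEW_API:
        messages = []
        if instructions:
            # The preview examples send instructions as an assistant message.
            messages.append({"role": "assistant",
                             "content": [{"type": "text", "text": instructions}]})
        messages.append({"role": "user",
                         "content": [{"type": "text", "text": question}]})
        body = {"messages": messages}
    elif api_version == GA_API:
        # 2026-04-01 accepts only the intents input; "semantic" is the only type.
        body = {"intents": [{"type": "semantic", "search": question}]}
    else:
        raise ValueError(f"Unsupported api-version: {api_version}")

    if knowledge_source_name:
        body["knowledgeSourceParams"] = [
            {"knowledgeSourceName": knowledge_source_name, "kind": kind}
        ]
    return body


if __name__ == "__main__":
    body = build_retrieve_body("Why is Phoenix so bright at night?", GA_API,
                               knowledge_source_name="earth-at-night-blob-ks")
    print(json.dumps(body, indent=2))
```

Serialize the result with `json.dumps` and send it as the request body with any HTTP client.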

Important

The 2026-04-01 API version supports only the intents input and minimal, extractive retrieval. Preview-only capabilities, including the messages input, query planning, answer synthesis, and configurable reasoning effort, aren't supported. For full functionality, use the 2026-05-01-preview.

Request parameters

Pass the following parameters to call the retrieve action.

The first table lists request parameters for the 2026-05-01-preview. The second table lists request parameters for 2026-04-01.

Name	Description	Type	Editable	Required
`messages`	Contains the chat conversation history sent to the agentic retrieval pipeline. The LLM determines the query from the conversation history. The message format is similar to Azure OpenAI APIs. Supported only if the retrieval reasoning effort is low or medium.	Array	Yes	No
`messages.role`	Defines where the message came from, such as `assistant` or `user`. The model you use determines which roles are valid.	String	Yes	No
`messages.content`	The message or prompt sent to the LLM. Must be text.	Array	Yes	No
`includeActivity`	When `true`, the response includes an `activity` array that describes the steps the pipeline ran, such as query planning, search index calls, and answer synthesis. Defaults to `false`. For a usage example, see Inspect model names in activity logs.	Boolean	Yes	No
`maxOutputDocuments`	Caps the number of grounding documents returned by the retrieve call. Applies after per-source candidate selection. If `maxOutputSize` is also set, both constraints apply, and whichever limit is reached first wins. The service can return fewer documents than this parameter's value if fewer results survive ranking, thresholding, or deduplication. For a usage example and a table of setting combinations, see Limit final grounding documents.	Integer	Yes	No
`maxOutputSize`	Limits the size, in tokens, of the grounded response payload. Documents that don't fit under the limit are omitted from the response. If `maxOutputDocuments` is also set, both constraints apply, and whichever limit is reached first wins. For a usage example and a table of setting combinations, see Limit final grounding documents.	Integer	Yes	No
`retrievalReasoningEffort`	Sets the retrieval reasoning effort for the request and overrides the knowledge base default. For valid values and tradeoffs, see Set the retrieval reasoning effort.	Object	Yes	No
`knowledgeSourceParams`	Overrides default retrieval settings per knowledge source. Useful for customizing the query or response at query time.	Array	Yes	No
`knowledgeSourceParams.knowledgeSourceName`	Name of the knowledge source the entry applies to. The knowledge source must already be attached to the knowledge base.	String	Yes	Yes
`knowledgeSourceParams.kind`	Discriminator for the knowledge source type, such as `searchIndex`, `web`, `azureBlob`, or `sharepoint`. Must match the underlying knowledge source kind.	String	Yes	Yes
`knowledgeSourceParams.alwaysQuerySource`	When `true`, the pipeline always queries this knowledge source instead of relying on the planner to decide. Useful when a source must always participate in the response. This parameter is independent of `failOnError`. To require a source to always run and fail the request if it errors, set both to `true`.	Boolean	Yes	No
`knowledgeSourceParams.failOnError`	When `true`, the retrieve request fails with `502 Bad Gateway` and an error message that identifies the knowledge source that couldn't be queried, instead of returning a partial response from the remaining sources. Defaults to `false`, which means the pipeline favors availability and returns results from other sources when one fails. Independent of `alwaysQuerySource`, which controls whether the source is attempted at all; `failOnError` controls what happens when that attempt fails. For a usage example, see Require a knowledge source to succeed.	Boolean	Yes	No
`knowledgeSourceParams.maxOutputDocuments`	Caps the number of candidate documents this knowledge source contributes before the final result selection. Use `50` for cross-region compatibility because some preview regions cap this per-source parameter at 50. Doesn't control the final number of grounding documents returned to the caller. The service can return fewer documents when fewer matches are available or when internal limits apply. For a usage example, see Tune candidate documents per knowledge source.	Integer	Yes	No
`knowledgeSourceParams.includeReferences`	When `true`, the response includes a `references` array that identifies the documents that contributed to the answer for this source. For a usage example, see Set references for each knowledge source.	Boolean	Yes	No
`knowledgeSourceParams.includeReferenceSourceData`	When `true`, references include the source data fields configured on the knowledge source. For a usage example, see Set references for each knowledge source.	Boolean	Yes	No
`knowledgeSourceParams.rerankerThreshold`	Minimum reranker score that a candidate document must have to be included in the result set for this source.	Number	Yes	No
`knowledgeSourceParams.filterAddOn`	OData filter appended to the persisted `baseFilter` (if any) for search index knowledge sources, narrowing the source query at request time. For filter syntax and examples, see Filter search index knowledge sources at query time.	String	Yes	No
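To reason about how `maxOutputDocuments` and `maxOutputSize` interact, the following sketch applies both caps to a ranked candidate list and stops at whichever limit is reached first. It's an illustrative model, not the service's algorithm: the `(doc_id, token_count)` tuples are a hypothetical input shape, and this sketch stops at the first document that would exceed the size budget, which might differ from how the service actually packs results.

```python
def apply_output_limits(ranked_docs, max_output_documents=None, max_output_size=None):
    """Illustrative model: apply a document cap and a token cap to ranked candidates.

    ranked_docs is a hypothetical list of (doc_id, token_count) tuples, best first.
    """
    selected = []
    used_tokens = 0
    for doc_id, tokens in ranked_docs:
        if max_output_documents is not None and len(selected) >= max_output_documents:
            break  # The document cap was reached first.
        if max_output_size is not None and used_tokens + tokens > max_output_size:
            break  # The size cap was reached first.
        selected.append(doc_id)
        used_tokens += tokens
    return selected
```

For example, `apply_output_limits([("a", 400), ("b", 300), ("c", 500)], 3, 800)` returns `["a", "b"]`: the size cap stops selection before the document cap does.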

Name	Description	Type	Editable	Required
`intents`	A list of search intents sent to the agentic retrieval pipeline. Each intent specifies a query type and a search string.	Array	Yes	Yes
`intents.type`	The query type. The only valid value is `semantic`.	String	Yes	Yes
`intents.search`	The search string for the query.	String	Yes	Yes
`knowledgeSourceParams`	Overrides default retrieval settings per knowledge source. Useful for customizing the query or response at query time.	Array	Yes	No
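Before you send a request, you might check the body against the tables above. The following sketch is a hypothetical client-side check built only from the parameters listed in this article, so it can be stricter than the service. It reports any top-level key the tables don't list for the chosen API version, enforces the 2026-04-01 `intents` requirements, and requires `knowledgeSourceName` and `kind` on each `knowledgeSourceParams` entry.

```python
# Top-level keys listed in this article's parameter tables.
ALLOWED_TOP_LEVEL = {
    "2026-05-01-preview": {"messages", "includeActivity", "maxOutputDocuments",
                           "maxOutputSize", "retrievalReasoningEffort",
                           "knowledgeSourceParams"},
    "2026-04-01": {"intents", "knowledgeSourceParams"},
}


def validate_retrieve_body(body, api_version):
    """Return a list of problems found in a retrieve request body (empty if none)."""
    if api_version not in ALLOWED_TOP_LEVEL:
        raise ValueError(f"Unsupported api-version: {api_version}")
    problems = []

    if api_version == "2026-04-01":
        intents = body.get("intents")
        if not intents:
            problems.append("intents is required for 2026-04-01")
        for i, intent in enumerate(intents or []):
            if intent.get("type") != "semantic":
                problems.append(f"intents[{i}].type must be 'semantic'")
            if not intent.get("search"):
                problems.append(f"intents[{i}].search is required")

    for key in body:
        if key not in ALLOWED_TOP_LEVEL[api_version]:
            problems.append(f"{key} isn't listed for {api_version}")

    for i, entry in enumerate(body.get("knowledgeSourceParams", [])):
        for required in ("knowledgeSourceName", "kind"):
            if not entry.get(required):
                problems.append(f"knowledgeSourceParams[{i}].{required} is required")
    return problems
```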

Include images in retrieve responses (preview)

For blob, indexed OneLake, and indexed SharePoint knowledge sources configured with an asset store, you can return document-embedded images alongside text and inject them into the answer synthesis prompt. Set enableImageServing on the matching entry in knowledgeSourceParams to override the default that's set on the knowledge base definition.

Image serving runs only when outputMode is answerSynthesis and requires the 2026-05-01-preview REST API or an equivalent Azure SDK preview package. For setup steps, the precedence table, and how to inspect image serving statistics, see Surface document-embedded images in agentic retrieval (preview).
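The override rule described above can be modeled as follows. This is a simplified sketch, assuming that a per-source `enableImageServing` value always takes precedence over the knowledge base default and that image serving applies only when `outputMode` is `answerSynthesis`; the precedence table in the linked article is authoritative. Both helper names are hypothetical.

```python
def image_serving_entry(knowledge_source_name, kind, enable_image_serving):
    """Build a knowledgeSourceParams entry that overrides the image serving default."""
    return {
        "knowledgeSourceName": knowledge_source_name,
        "kind": kind,
        "enableImageServing": enable_image_serving,
    }


def image_serving_active(kb_default, source_override, output_mode):
    """Simplified precedence model: the per-source override wins when it's set."""
    if output_mode != "answerSynthesis":
        return False  # Image serving runs only with answer synthesis.
    return bool(kb_default if source_override is None else source_override)
```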

Search index behavior

For knowledge sources that target a search index, all searchable fields are in scope for query execution. The implied query type is semantic, and there's no search mode.

If the index includes vector fields, you need a valid vectorizer definition so the agentic retrieval engine can vectorize query inputs. Otherwise, vector fields are ignored.
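To find vector fields that agentic retrieval would ignore, you can inspect an index definition (for example, the JSON returned by the Indexes - Get REST API) for vector fields whose vector search profile doesn't reference a defined vectorizer. This sketch checks top-level fields only and assumes the standard index schema, where a field's `vectorSearchProfile` names a profile in `vectorSearch.profiles` and that profile's `vectorizer` names an entry in `vectorSearch.vectorizers`. The function name is hypothetical.

```python
def vector_fields_without_vectorizer(index):
    """Return names of top-level vector fields whose profile lacks a valid vectorizer."""
    vector_search = index.get("vectorSearch", {})
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}
    vectorizer_names = {v["name"] for v in vector_search.get("vectorizers", [])}

    missing = []
    for field in index.get("fields", []):
        profile_name = field.get("vectorSearchProfile")
        if not profile_name:
            continue  # Not a vector field.
        vectorizer = profiles.get(profile_name, {}).get("vectorizer")
        if vectorizer not in vectorizer_names:
            # No vectorizer (or a dangling reference): query inputs can't be
            # vectorized for this field, so agentic retrieval ignores it.
            missing.append(field["name"])
    return missing
```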

For more information, see Create an index for agentic retrieval.

Call the MCP endpoint

Important

MCP implementations are susceptible to risks, such as attacks, cascading failures, and loss of human oversight. You can mitigate these risks by vetting MCP servers for security and reliability, following Microsoft's recommended practices and industry best practices, and implementing approval mechanisms and monitoring cascading behaviors.

MCP is an open protocol that standardizes how AI applications connect to external data sources and tools.

In Azure AI Search, each knowledge base is a standalone MCP server that exposes the knowledge_base_retrieve tool. Any MCP-compatible client, including Foundry Agent Service, GitHub Copilot, Claude, and Cursor, can invoke this tool to query the knowledge base.
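MCP messages are JSON-RPC 2.0 requests. The following sketch shows the shape of a `tools/list` request and how a client might locate the `knowledge_base_retrieve` tool in the result, which, per the MCP specification, contains a `tools` array whose entries have a `name` and an `inputSchema`. It's a message-shape illustration only: a real MCP client also performs the `initialize` handshake and handles the HTTP transport, so prefer an MCP client library in practice. The helper names are hypothetical.

```python
import itertools

_request_ids = itertools.count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object with an auto-incrementing id."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def find_tool(tools_list_result, name="knowledge_base_retrieve"):
    """Return the tool entry with the given name from a tools/list result, or None."""
    for tool in tools_list_result.get("tools", []):
        if tool.get("name") == name:
            return tool
    return None
```

A client would read the tool's `inputSchema` from the `tools/list` result rather than hard-coding the tool's arguments.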

MCP endpoint format

Each knowledge base has an MCP endpoint at the following URL.

https://<your-service-name>.search.windows.net/knowledgebases/<your-knowledge-base-name>/mcp?api-version=<api-version>

The API version you specify determines what the connection returns. With 2026-05-01-preview, the knowledge base can return synthesized answers when the underlying knowledge base is configured with an LLM and a compatible reasoning effort. With 2026-04-01, retrieval is always minimal and extractive, and the connection returns grounding data only.
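The following sketch builds the endpoint URL from its parts and rejects API versions other than the two this article describes. The function name is hypothetical, and URL-encoding the knowledge base name is a defensive choice rather than a documented requirement.

```python
from urllib.parse import quote, urlencode

SUPPORTED_API_VERSIONS = {"2026-05-01-preview", "2026-04-01"}


def mcp_endpoint(service_name, knowledge_base_name, api_version):
    """Return the MCP endpoint URL for a knowledge base."""
    if api_version not in SUPPORTED_API_VERSIONS:
        raise ValueError(f"Unsupported api-version: {api_version}")
    kb = quote(knowledge_base_name, safe="")
    query = urlencode({"api-version": api_version})
    return f"https://{service_name}.search.windows.net/knowledgebases/{kb}/mcp?{query}"
```

Use `2026-05-01-preview` when you want synthesized answers from a knowledge base configured for them, and `2026-04-01` when grounding data alone is enough.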

Authenticate to the MCP endpoint

The MCP endpoint requires authentication via custom headers. You have two options:

(Recommended) Pass a bearer token in the Authorization header. The identity behind the token must have the Search Index Data Reader role assigned on the search service. This approach avoids storing keys in configuration files. For more information, see Connect your app to Azure AI Search using identities.
Pass an admin key in the api-key header. An admin key provides full read-write access to the search service, so use it with caution. For more information, see Connect to Azure AI Search using API keys.
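Both options come down to a single request header. The following sketch, with a hypothetical helper name, returns the header for whichever option you choose and refuses to send both.

```python
def mcp_auth_headers(bearer_token=None, admin_key=None):
    """Return the authentication header for the MCP endpoint."""
    if (bearer_token is None) == (admin_key is None):
        raise ValueError("Pass exactly one of bearer_token or admin_key")
    if bearer_token is not None:
        # Recommended: the identity needs the Search Index Data Reader role.
        return {"Authorization": f"Bearer {bearer_token}"}
    # Admin keys grant full read-write access to the service; use with caution.
    return {"api-key": admin_key}
```

With the `azure-identity` package, `DefaultAzureCredential().get_token("https://search.azure.com/.default").token` is one way to obtain the bearer token.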

Tip

Each MCP client configures custom headers differently. For example:

In Foundry Agent Service, you configure authentication via a project connection and add the MCP tool to an agent. The service automatically injects the required headers on MCP requests.
In GitHub Copilot and similar clients, you configure headers in the MCP server JSON, such as mcp.json.

Filter search index knowledge sources at query time

When retrieving from a search index knowledge source, you can apply an OData filter at query time to narrow the results to documents whose field values match the expression. Pass the filter in the filterAddOn parameter. It's appended to the knowledge source's persisted baseFilter, if one exists.

Filter syntax and examples

The filterAddOn parameter accepts OData filter expressions. Example patterns include:

Metadata fields: city eq 'Phoenix', status eq 'active'
Date ranges: publishDate ge 2024-01-01T00:00:00Z and publishDate lt 2025-01-01T00:00:00Z
Numeric ranges: price ge 100 and price le 5000
Full-text matching: search.ismatch('climate', 'description'), search.ismatch('urgent', 'title')
Logical operators: (category eq 'News' or category eq 'Analysis') and status eq 'published'
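When filter values come from user input, quote them as OData string literals rather than concatenating raw text. In OData, a single quote inside a string literal is escaped by doubling it. The following sketch shows that rule plus a UTC `DateTimeOffset` formatter. The helper names are hypothetical, and field names are assumed to be trusted constants.

```python
from datetime import datetime, timezone


def odata_string(value):
    """Quote a Python string as an OData string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def odata_datetime(value):
    """Format a timezone-aware datetime as an OData DateTimeOffset literal in UTC."""
    if value.tzinfo is None:
        raise ValueError("Use a timezone-aware datetime")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def eq(field, value):
    """Build a 'field eq value' clause with a safely quoted string value."""
    return f"{field} eq {odata_string(value)}"


if __name__ == "__main__":
    print(eq("city", "Phoenix"))
    print("publishDate ge " + odata_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```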

using Azure;
using Azure.Identity;
using Azure.Search.Documents.KnowledgeBases;
using Azure.Search.Documents.KnowledgeBases.Models;

var kbClient = new KnowledgeBaseRetrievalClient(
    endpoint: new Uri("<YOUR SEARCH SERVICE URL>"),
    knowledgeBaseName: "<YOUR KNOWLEDGE BASE NAME>",
    tokenCredential: new DefaultAzureCredential()
);

var retrievalRequest = new KnowledgeBaseRetrievalRequest();

retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "You are a support agent. Answer questions based on published documentation. "
                + "If you don't know the answer, say so."
            )
        }
    ) { Role = "assistant" }
);

retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "What is the process for submitting an expense report?"
            )
        }
    ) { Role = "user" }
);

// Apply a filter to search only published documents
var searchIndexParams = new SearchIndexKnowledgeSourceParams(
    knowledgeSourceName: "internal-documentation-ks"
);
searchIndexParams.FilterAddOn = "status eq 'published'";

retrievalRequest.KnowledgeSourceParams.Add(searchIndexParams);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

from azure.identity import DefaultAzureCredential
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    KnowledgeBaseRetrievalRequest,
    SearchIndexKnowledgeSourceParams,
)

kb_client = KnowledgeBaseRetrievalClient(
    endpoint="<YOUR SEARCH SERVICE URL>",
    knowledge_base_name="<YOUR KNOWLEDGE BASE NAME>",
    credential=DefaultAzureCredential(),
)

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="assistant",
            content=[
                KnowledgeBaseMessageTextContent(
                    text="You are a support agent. Answer questions based on published documentation. "
                    "If you don't know the answer, say so."
                )
            ],
        ),
        KnowledgeBaseMessage(
            role="user",
            content=[
                KnowledgeBaseMessageTextContent(
                    text="What is the process for submitting an expense report?"
                )
            ],
        ),
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="internal-documentation-ks",
            # Apply a filter to search only published documents
            filter_add_on="status eq 'published'",
        )
    ],
)

result = kb_client.retrieve(request)
print(result.response[0].content[0].text)

POST https://<YOUR SEARCH SERVICE>.search.windows.net/knowledgebases/<YOUR KNOWLEDGE BASE NAME>/retrieve?api-version=2026-05-01-preview
Content-Type: application/json
Authorization: Bearer <YOUR ACCESS TOKEN>

{
    "messages": [
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "You are a support agent. Answer questions based on published documentation. If you don't know the answer, say so."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is the process for submitting an expense report?"
                }
            ]
        }
    ],
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "internal-documentation-ks",
            "kind": "searchIndex",
            "filterAddOn": "status eq 'published'"
        }
    ]
}

Multi-filter example

You can combine multiple filters to further refine results.

searchIndexParams.FilterAddOn = "(status eq 'published' or status eq 'internal') and created ge 2025-01-01T00:00:00Z";

filter_add_on="(status eq 'published' or status eq 'internal') and created ge 2025-01-01T00:00:00Z"

{
    "knowledgeSourceName": "internal-documentation-ks",
    "kind": "searchIndex",
    "filterAddOn": "(status eq 'published' or status eq 'internal') and created ge 2025-01-01T00:00:00Z"
}
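Combined filters are easier to keep correct when clauses are assembled programmatically. In OData, `and` binds more tightly than `or`, so an `or` group needs parentheses when it's combined with `and`. The following sketch, with hypothetical helper names, rebuilds the multi-filter example above.

```python
def quote_string(value):
    """Quote an OData string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def any_of(*clauses):
    """Join clauses with 'or' and wrap the group in parentheses."""
    return "(" + " or ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with 'and'. Groups from any_of() keep their meaning
    because they're already parenthesized."""
    return " and ".join(clauses)


status_clause = any_of(
    f"status eq {quote_string('published')}",
    f"status eq {quote_string('internal')}",
)
filter_add_on = all_of(status_clause, "created ge 2025-01-01T00:00:00Z")
```

For many alternative values of one field, Azure AI Search also supports `search.in`, for example `search.in(status, 'published,internal')`.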

Enforce permissions at query time (preview)

Important

The 2026-05-01-preview can't modify access permissions that were set outside of the 2026-05-01-preview. If you use the 2026-05-01-preview with access- or permission-restricted content, a timing lag will occur before the 2026-05-01-preview recognizes changes to those access or permission restrictions.

If your knowledge sources contain permission-protected content, the retrieval engine can filter results so that each user only sees the documents they're authorized to access. You enable this filtering by passing the end user's identity on the retrieve request. Without the identity token, results from permission-enabled knowledge sources are returned unfiltered.

Permissions enforcement has two parts:

Ingestion time: For indexed knowledge sources only, set ingestionPermissionOptions to ingest permission metadata alongside content.
Query time: Pass the user's access token in the x-ms-query-source-authorization header.

Ingestion-time configuration

The following table shows which knowledge sources require ingestion-time configuration and how each source enforces permissions.

Knowledge source	Requires `ingestionPermissionOptions`	How permissions are enforced
Blob or ADLS Gen2	✅	Ingested RBAC scopes, ACLs, or Microsoft Purview sensitivity labels matched against user identity.
OneLake	✅	Ingested document Microsoft Purview sensitivity labels matched against user identity.
Indexed SharePoint	✅	Ingested SharePoint ACLs or Microsoft Purview sensitivity labels matched against user identity.
Remote SharePoint	❌	Copilot Retrieval API queries SharePoint directly using the user's token.
Fabric Data Agent	❌	The retrieval engine exchanges the user's token for a Microsoft Fabric–scoped token and queries the data agent on their behalf.
Fabric Ontology	❌	The retrieval engine exchanges the user's token for a Microsoft Fabric–scoped token and queries the ontology item on their behalf.
Work IQ	❌	The retrieval engine exchanges the user's token for a Work IQ–scoped token and queries Work IQ on their behalf.

Important

If ingestionPermissionOptions wasn't configured when the indexed knowledge source was created, no permission metadata exists in the index. Results are returned unfiltered, regardless of the header. To fix this, recreate the knowledge source with the appropriate ingestionPermissionOptions values.

Query-time authorization

To pass the end user's identity, include an access token scoped to https://search.azure.com/.default on the retrieve request. This token is separate from the service credential used to access the search service. It doesn't need search service permissions and only represents the user whose content access is evaluated. For more information, see Query-time ACL and RBAC enforcement.

In the .NET SDK, pass the token as the xMsQuerySourceAuthorization parameter on RetrieveAsync:

using Azure;
using Azure.Identity;
using Azure.Search.Documents.KnowledgeBases;
using Azure.Search.Documents.KnowledgeBases.Models;

// Service credential: Authenticates to the search service
var serviceCredential = new DefaultAzureCredential();

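// User identity token: Represents the end user for document-level permissions filtering.
// For simplicity, this sample gets the token from the same credential (your own identity).
// In a production app, pass the signed-in end user's token instead.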
var userTokenContext = new Azure.Core.TokenRequestContext(
    new[] { "https://search.azure.com/.default" }
);
string userToken = (await serviceCredential.GetTokenAsync(userTokenContext)).Token;

// Create the retrieval client with the service credential
var kbClient = new KnowledgeBaseRetrievalClient(
    endpoint: new Uri("<YOUR SEARCH SERVICE URL>"),
    knowledgeBaseName: "<YOUR KNOWLEDGE BASE NAME>",
    tokenCredential: serviceCredential
);

var request = new KnowledgeBaseRetrievalRequest();
request.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent(
                "What companies are in the financial sector?")
        }
    ) { Role = "user" }
);

// Pass the user identity token for permissions filtering
var result = await kbClient.RetrieveAsync(
    request, xMsQuerySourceAuthorization: userToken);

var text = (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text;
Console.WriteLine(text);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

In the Python SDK, pass the token as the x_ms_query_source_authorization parameter on retrieve:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseMessage, KnowledgeBaseMessageTextContent,
    KnowledgeBaseRetrievalRequest,
)

# Service credential: Authenticates to the search service
service_credential = DefaultAzureCredential()

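# User identity token: Represents the end user for document-level permissions filtering.
# For simplicity, this sample gets the token from the same credential (your own identity).
# In a production app, pass the signed-in end user's token instead.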
user_token_provider = get_bearer_token_provider(
    service_credential, "https://search.azure.com/.default")
user_token = user_token_provider()

# Create the retrieval client with the service credential
kb_client = KnowledgeBaseRetrievalClient(
    endpoint="<YOUR SEARCH SERVICE URL>",
    knowledge_base_name="<YOUR KNOWLEDGE BASE NAME>",
    credential=service_credential,
)

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(
                text="What companies are in the financial sector?")],
        )
    ]
)

# Pass the user identity token for permissions filtering
result = kb_client.retrieve(
    retrieval_request=request, x_ms_query_source_authorization=user_token)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

In the REST API, include the x-ms-query-source-authorization header with the user's access token:

@search-url = <YOUR SEARCH SERVICE URL>
@knowledge-base-name = <YOUR KNOWLEDGE BASE NAME>
// Service credential
@accessToken = <YOUR ACCESS TOKEN>
// User identity token
@userAccessToken = <USER ACCESS TOKEN>

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json
x-ms-query-source-authorization: {{userAccessToken}}

{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What companies are in the financial sector?"
                }
            ]
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Review the response

Successful retrieval returns a 200 OK status code. If the knowledge base fails to retrieve from one or more knowledge sources, the service returns a 206 Partial Content status code. In that case, the response includes results only from the sources that succeeded, and the activity array reports the failures as errors.
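The following minimal sketch shows one way a client might branch on these status codes. The helper name is hypothetical, and it assumes that activity entries for failed knowledge sources carry an `error` property; the exact shape of those entries isn't shown here, so verify it against a real 206 response before you rely on it.

```python
from typing import Any


def classify_retrieve_response(
    status_code: int, body: dict[str, Any]
) -> tuple[str, list[dict[str, Any]]]:
    """Classify a retrieve response as complete or partial.

    Hypothetical helper: assumes that activity entries for failed knowledge
    sources carry an "error" property. Verify against a real 206 response.
    """
    if status_code == 200:
        return "complete", []
    if status_code == 206:
        # Partial Content: results come only from the sources that succeeded.
        failures = [entry for entry in body.get("activity", []) if "error" in entry]
        return "partial", failures
    raise ValueError(f"Unexpected status code from retrieve: {status_code}")


# Hypothetical 206 body: the second knowledge source failed.
status, failures = classify_retrieve_response(
    206,
    {"activity": [
        {"type": "searchIndex", "id": 0},
        {"type": "searchIndex", "id": 1, "error": {"message": "Source unavailable"}},
    ]},
)
print(status, [entry["id"] for entry in failures])
```

Treating a partial response as usable but flagging it lets your application decide whether to show the answer with a warning or retry.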

The retrieve action returns three main components:

Extracted response, or a synthesized answer if you set outputMode to answerSynthesis (2026-05-01-preview)
Activity array
References array

Extracted response

The extracted response is a single, unified string that you typically pass to an LLM. The LLM consumes the string as grounding data and uses it to formulate a response. Your API call to the LLM includes the unified string and instructions for the model, such as whether to use the grounding exclusively or as a supplement.
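To illustrate how the extracted response feeds an LLM call, here's a minimal sketch that assembles chat messages in the common role/content format. The function name, the system prompt wording, and the `grounding_only` switch are hypothetical; pass the returned list to whichever chat completion client you use.

```python
def build_grounded_messages(
    grounding: str, question: str, grounding_only: bool = True
) -> list[dict[str, str]]:
    """Assemble chat messages that pass retrieve output to an LLM as grounding data."""
    if grounding_only:
        rule = ("Answer using only the sources below. "
                "If they don't contain the answer, say that you don't know.")
    else:
        rule = ("Use the sources below as your primary evidence, "
                "and supplement them with general knowledge when needed.")
    # The unified grounding string goes into the system prompt with the instructions.
    system_prompt = f"{rule} Cite sources by ref_id.\n\nSources:\n{grounding}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_grounded_messages(
    '[{"ref_id":"0","title":"Urban Structure","content":"<content chunk redacted>"}]',
    "How is Phoenix laid out?",
)
```

The `grounding_only` flag maps to the choice described above: use the grounding exclusively, or as a supplement to the model's own knowledge.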

The response body uses the chat message format, and the text content is serialized JSON.

"response": [
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "[{\"ref_id\":\"0\",\"title\":\"Urban Structure\",\"terms\":\"Location of Phoenix, Grid of City Blocks, Phoenix Metropolitan Area at Night\",\"content\":\"<content chunk redacted>\"}]"
            }
        ]
    }
]

Key points:

content.type has one valid value: text.
content.text is a JSON-encoded string containing the most relevant documents (or chunks) found in the search index, given the query and chat history inputs. This string is your grounding data that an LLM uses to formulate a response to the user's question.
- This portion of the response contains up to 200 chunks and excludes any results with a reranker score below the minimum threshold of 2.5.
- Each entry starts with the chunk's reference ID (ref_id, used for citations), followed by the fields specified in the semantic configuration of the target index. This example assumes a semantic configuration with title, terms, and content fields.
The maxOutputSizeInTokens property (maxOutputSize in 2026-05-01-preview) on the retrieve request limits the length of the string.
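Because content.text is a JSON-encoded string rather than a nested object, decode it before you work with individual chunks, for example to build citations. Here's a minimal sketch that uses only the Python standard library. The function name is hypothetical, and the fields after ref_id depend on your index's semantic configuration.

```python
import json
from typing import Any


def parse_grounding(response: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Decode the JSON-encoded text of each text content item into chunk dictionaries."""
    chunks: list[dict[str, Any]] = []
    for message in response:
        for item in message.get("content", []):
            if item.get("type") == "text":
                # content.text is a JSON array serialized as a string, so decode it.
                chunks.extend(json.loads(item["text"]))
    return chunks


response = [{
    "role": "assistant",
    "content": [{
        "type": "text",
        "text": '[{"ref_id":"0","title":"Urban Structure","content":"<content chunk redacted>"}]',
    }],
}]

for chunk in parse_grounding(response):
    print(chunk["ref_id"], chunk["title"])  # prints: 0 Urban Structure
```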

Important

A document that exceeds the maxOutputSizeInTokens output budget can be omitted from the response. The activity array includes a warning when the most relevant document exceeds the maximum output size. To retain more content, increase maxOutputSizeInTokens. For more information, see Troubleshoot empty responses.

Activity array

The activity array outputs the query plan, which provides operational transparency for tracking operations, billing implications, and resource invocations. It also includes subqueries sent to the retrieval pipeline and errors for any retrieval failures, such as inaccessible knowledge sources.

The output includes the following components, which differ by API version.

In 2026-05-01-preview:

Section	Description
`modelQueryPlanning`	For knowledge bases that use an LLM for query planning, this section reports the input token count and the output token count of the generated subqueries. Includes a `modelName` field with the public model name (not the deployment name) of the model that ran the activity.
Source-specific activity	For each knowledge source included in the query, this section reports the elapsed time and the arguments used in the query, including semantic ranker settings. Knowledge source types include `searchIndex`, `azureBlob`, and other supported knowledge sources.
`agenticReasoning`	This section reports token consumption for agentic reasoning during retrieval, which depends on the specified retrieval reasoning effort.
`modelAnswerSynthesis`	For knowledge bases that use answer synthesis, this section reports the input token count used to formulate the answer and the output token count of the answer. Includes a `modelName` field with the public model name (not the deployment name) of the model that ran the activity.
`modelWebSummarization`	For knowledge bases that use web summarization, this section reports token consumption for summarizing web results. Includes a `modelName` field with the public model name (not the deployment name) of the model that ran the activity.
`imageServing`	For knowledge sources that have image serving enabled, this section reports `imagesRetrieved`, `imagesSentToModel`, `totalImageSizeBytes`, and whether indexing-time verbalization (`verbalizationUsed`) was on. To find the number of dropped images, subtract `imagesSentToModel` from `imagesRetrieved`.

In 2026-04-01:

Section	Description
Source-specific activity	For each knowledge source included in the query, this section reports the elapsed time and the arguments used in the query, including semantic ranker settings. Knowledge source types include `searchIndex`, `azureBlob`, and other supported knowledge sources.
`agenticReasoning`	This section reports token consumption for agentic reasoning during retrieval.

Here's an example of the activity array. The first example comes from 2026-05-01-preview, and the second shows the equivalent output in 2026-04-01.

  "activity": [
    {
      "type": "modelQueryPlanning",
      "id": 0,
      "inputTokens": 2302,
      "outputTokens": 109,
      "elapsedMs": 2396
    },
    {
      "type": "searchIndex",
      "id": 1,
      "knowledgeSourceName": "demo-financials-ks",
      "queryTime": "2025-11-04T19:25:23.683Z",
      "count": 26,
      "elapsedMs": 1137,
      "searchIndexArguments": {
        "search": "List of companies in the financial sector according to SEC GICS classification",
        "filter": null,
        "sourceDataFields": [ ],
        "searchFields": [ ],
        "semanticConfigurationName": "en-semantic-config"
      }
    },
    {
      "type": "searchIndex",
      "id": 2,
      "knowledgeSourceName": "demo-healthcare-ks",
      "queryTime": "2025-11-04T19:25:24.186Z",
      "count": 17,
      "elapsedMs": 494,
      "searchIndexArguments": {
        "search": "List of companies in the financial sector according to SEC GICS classification",
        "filter": null,
        "sourceDataFields": [ ],
        "searchFields": [ ],
        "semanticConfigurationName": "en-semantic-config"
      }
    },
    {
      "type": "agenticReasoning",
      "id": 3,
      "retrievalReasoningEffort": {
        "kind": "low"
      },
      "reasoningTokens": 103368
    },
    {
      "type": "modelAnswerSynthesis",
      "id": 4,
      "inputTokens": 5821,
      "outputTokens": 344,
      "elapsedMs": 3837
    }
  ]

  "activity": [
    {
      "type": "searchIndex",
      "id": 0,
      "knowledgeSourceName": "demo-financials-ks",
      "queryTime": "2025-11-04T19:25:23.683Z",
      "count": 26,
      "elapsedMs": 1137,
      "searchIndexArguments": {
        "search": "List of companies in the financial sector according to SEC GICS classification",
        "filter": null,
        "sourceDataFields": [ ],
        "searchFields": [ ],
        "semanticConfigurationName": "en-semantic-config"
      }
    },
    {
      "type": "searchIndex",
      "id": 1,
      "knowledgeSourceName": "demo-healthcare-ks",
      "queryTime": "2025-11-04T19:25:24.186Z",
      "count": 17,
      "elapsedMs": 494,
      "searchIndexArguments": {
        "search": "List of companies in the financial sector according to SEC GICS classification",
        "filter": null,
        "sourceDataFields": [ ],
        "searchFields": [ ],
        "semanticConfigurationName": "en-semantic-config"
      }
    },
    {
      "type": "agenticReasoning",
      "id": 2,
      "reasoningTokens": 103368
    }
  ]
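Because the activity array reports token consumption per step, you can roll it up for cost tracking. The following minimal sketch totals inputTokens, outputTokens, and reasoningTokens by activity type, using values adapted from the 2026-05-01-preview example above. The function name is hypothetical, and the result is a token count, not a billing estimate.

```python
from collections import Counter
from typing import Any

TOKEN_FIELDS = ("inputTokens", "outputTokens", "reasoningTokens")


def summarize_tokens(activity: list[dict[str, Any]]) -> dict[str, Counter]:
    """Total the token counts reported in the activity array, grouped by activity type."""
    totals: dict[str, Counter] = {}
    for entry in activity:
        counts = Counter({field: entry[field] for field in TOKEN_FIELDS if field in entry})
        if counts:  # Skip entries, such as searchIndex, that report no tokens.
            totals.setdefault(entry["type"], Counter()).update(counts)
    return totals


# Entries adapted from the 2026-05-01-preview example above.
activity = [
    {"type": "modelQueryPlanning", "id": 0, "inputTokens": 2302, "outputTokens": 109},
    {"type": "searchIndex", "id": 1, "knowledgeSourceName": "demo-financials-ks"},
    {"type": "agenticReasoning", "id": 3, "reasoningTokens": 103368},
    {"type": "modelAnswerSynthesis", "id": 4, "inputTokens": 5821, "outputTokens": 344},
]

for activity_type, counts in summarize_tokens(activity).items():
    print(activity_type, dict(counts))
```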

References array

The references array comes directly from the underlying grounding data. It includes the sourceData used to generate the response and lists every document that the agentic retrieval engine finds and semantically ranks. Fields in sourceData include an id and the semantic configuration fields, which are title, terms, and content in this example.

The id is a reference ID for an item within a specific response. It's not the document key in the search index, which appears in docKey. Use the id to provide citations. The activitySource field holds the id of the activity entry that produced the reference, which is useful for linking citations back to a knowledge source.

Here's an example of the references array:

  "references": [
    {
      "type": "searchIndex",
      "id": "0",
      "activitySource": 2,
      "docKey": "earth_at_night_508_page_104_verbalized",
      "sourceData": null
    },
    {
      "type": "searchIndex",
      "id": "1",
      "activitySource": 2,
      "docKey": "earth_at_night_508_page_105_verbalized",
      "sourceData": null
    }
  ]
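The id and activitySource fields let you resolve a citation back to its document and knowledge source. The following minimal sketch joins the references array to the activity array. The function name and output shape are hypothetical, and the sample data mirrors the 2026-05-01-preview examples in this section.

```python
from typing import Any, Optional


def build_citations(
    activity: list[dict[str, Any]], references: list[dict[str, Any]]
) -> dict[str, dict[str, Optional[str]]]:
    """Map each reference id to its document key and the knowledge source that produced it."""
    # activitySource holds the id of the activity entry that produced a reference.
    source_by_activity_id = {
        entry["id"]: entry.get("knowledgeSourceName") for entry in activity
    }
    return {
        ref["id"]: {
            "docKey": ref.get("docKey"),
            "knowledgeSource": source_by_activity_id.get(ref.get("activitySource")),
        }
        for ref in references
    }


activity = [
    {"type": "searchIndex", "id": 1, "knowledgeSourceName": "demo-financials-ks"},
    {"type": "searchIndex", "id": 2, "knowledgeSourceName": "demo-healthcare-ks"},
]
references = [
    {"type": "searchIndex", "id": "0", "activitySource": 2,
     "docKey": "earth_at_night_508_page_104_verbalized", "sourceData": None},
]
print(build_citations(activity, references))
```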

Inspect sensitivity label metadata in the response (preview)

When you query a knowledge base that ingests Microsoft Purview sensitivity labels, the retrieve response includes label metadata at two levels:

Location	Field	Description
Per reference	`sensitivityLabelInfo`	The sensitivity label applied to each document returned in the `references` array.
Response	`metadata.responseSensitivityLabelInfo`	An aggregate label that represents the highest-priority sensitivity label across all referenced documents in the response. Useful for client-side display banners and policy enforcement.

Microsoft Graph computes the response-level label from the per-reference labels using the Microsoft Purview label inheritance rules. Typically, the most restrictive label wins.

The following example shows a retrieve response with two referenced documents (one Confidential, one Internal) and the resulting response-level label.

{
  "response": [
    {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "[ ... grounding data ... ]" }
      ]
    }
  ],
  "references": [
    {
      "type": "azureBlob",
      "id": "0",
      "activitySource": 1,
      "docKey": "contract-2026.pdf",
      "sensitivityLabelInfo": {
        "labelId": "<label-guid>",
        "labelName": "Confidential",
        "color": "#FF0000",
        "tooltip": "Confidential — Recipients can read but not forward.",
        "isEncrypted": true,
        "priority": 3
      },
      "sourceData": null
    },
    {
      "type": "azureBlob",
      "id": "1",
      "activitySource": 1,
      "docKey": "policy-overview.pdf",
      "sensitivityLabelInfo": {
        "labelId": "<label-guid>",
        "labelName": "Internal",
        "color": "#FFA500",
        "tooltip": "For internal use only.",
        "isEncrypted": false,
        "priority": 1
      },
      "sourceData": null
    }
  ],
  "metadata": {
    "responseSensitivityLabelInfo": {
      "labelId": "<label-guid>",
      "labelName": "Confidential",
      "color": "#FF0000",
      "tooltip": "Confidential — Recipients can read but not forward.",
      "isEncrypted": true,
      "priority": 3
    }
  }
}
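The service computes metadata.responseSensitivityLabelInfo for you, so you don't need to derive it. If you want to sanity-check it on the client, for example in tests, the following minimal sketch picks the per-reference label with the highest priority value. It's a simplification: it assumes that a higher priority number means a more restrictive label, as in the example above, and it ignores any other Microsoft Purview inheritance rules. The function name is hypothetical.

```python
from typing import Any, Optional


def highest_priority_label(references: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the per-reference label with the largest priority value, or None if none are labeled."""
    labels = [ref["sensitivityLabelInfo"] for ref in references if ref.get("sensitivityLabelInfo")]
    return max(labels, key=lambda label: label["priority"], default=None)


references = [
    {"id": "0", "docKey": "contract-2026.pdf",
     "sensitivityLabelInfo": {"labelName": "Confidential", "priority": 3}},
    {"id": "1", "docKey": "policy-overview.pdf",
     "sensitivityLabelInfo": {"labelName": "Internal", "priority": 1}},
]
print(highest_priority_label(references)["labelName"])  # prints: Confidential
```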

Reference types that surface sensitivity labels

The field name and availability of label metadata depend on the knowledge source type that produced each reference.

Reference `type`	Label field	Available when...
`azureBlob`	`sensitivityLabelInfo`	The blob knowledge source includes `sensitivityLabel` in `ingestionPermissionOptions`.
`indexedOneLake`	`sensitivityLabelInfo`	The OneLake knowledge source includes `sensitivityLabel` in `ingestionPermissionOptions`.
`indexedSharePoint`	`sensitivityLabelInfo`	The SharePoint-indexed knowledge source includes `sensitivityLabel` in `ingestionPermissionOptions`.
`searchIndex`	`sensitivityLabelInfo`	The underlying index has `purviewEnabled` set to `true` and a field marked with `sensitivityLabel: true`.

Display and audit recommendations

Use sensitivityLabelInfo.labelId to look up the full label definition through the Microsoft Graph sensitivity label APIs when you need additional properties, such as policy controls or permissions.
Use metadata.responseSensitivityLabelInfo to render a response-level sensitivity banner or apply policy controls, such as disabling copy and share, across the answer.
If your knowledge source points to a chunked index, such as one populated through integrated vectorization or a custom Text Split skill, make sure the skillset projects the sensitivity label to each chunk row. Without this mapping, chunk-level references aren't filtered correctly at query time.
For auditable administrative access to labeled content, see Elevated read for administrative investigations.
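Here's a minimal sketch of the second recommendation: it turns metadata.responseSensitivityLabelInfo into settings for a response-level banner. The function name, the output shape, and the rule that disables copy and share for encrypted labels are hypothetical; apply whatever policy your organization requires.

```python
from typing import Any


def response_banner(body: dict[str, Any]) -> dict[str, Any]:
    """Derive response-level banner settings from metadata.responseSensitivityLabelInfo."""
    label = (body.get("metadata") or {}).get("responseSensitivityLabelInfo")
    if not label:
        # No label on the response: show no banner and apply no extra restrictions.
        return {"text": None, "color": None, "allow_copy_and_share": True}
    return {
        "text": f"{label['labelName']}: {label.get('tooltip', '')}",
        "color": label.get("color"),
        # Hypothetical policy: disable copy and share when the label applies encryption.
        "allow_copy_and_share": not label.get("isEncrypted", False),
    }


body = {"metadata": {"responseSensitivityLabelInfo": {
    "labelName": "Confidential", "color": "#FF0000",
    "tooltip": "Recipients can read but not forward.", "isEncrypted": True,
}}}
print(response_banner(body))
```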

MCP server behavior

The MCP endpoint exposed by each knowledge base surfaces the same sensitivity label fields as the REST API. When an MCP-compatible client invokes the knowledge_base_retrieve tool, the tool result contains the same per-reference sensitivityLabelInfo and response-level metadata.responseSensitivityLabelInfo fields described earlier in this section. The MCP client is responsible for enforcing label-aware display and policy controls based on these fields.
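MCP-compatible clients normally build this call for you, but seeing the request on the wire can help when you debug label handling. The following minimal sketch serializes a JSON-RPC 2.0 tools/call request, as defined by the Model Context Protocol specification, for the knowledge_base_retrieve tool. The tool's input schema isn't shown in this section, so the arguments are left to the caller; the query key in the usage line is a hypothetical placeholder. Retrieve the actual schema with a tools/list request.

```python
import itertools
import json
from typing import Any

_request_ids = itertools.count(1)  # JSON-RPC request IDs must be unique within a session.


def build_tools_call(arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for the knowledge_base_retrieve tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "knowledge_base_retrieve", "arguments": arguments},
    })


# "query" is a hypothetical argument name; use the tool's inputSchema from tools/list.
print(build_tools_call({"query": "Which HR policy applies?"}))
```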

Retrieve action examples (preview)

The following examples illustrate different ways to call the retrieve action using the 2026-05-01-preview API version, which supports the full feature set, including answer synthesis and a configurable reasoning effort. For 2026-04-01 usage, see the previous sections.

Inspect model names in activity logs
Require a knowledge source to succeed
Tune candidate documents per knowledge source
Limit final grounding documents
Override default reasoning effort and set request limits
Set references for each knowledge source
Use minimal reasoning effort

Inspect model names in activity logs

Model-backed activity records include a modelName field when includeActivity is enabled. Use this field to confirm which configured model handled query planning, answer synthesis, or web summarization during a retrieve request.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("Which policy applies to returns?")
        }
    ) { Role = "user" }
);
retrievalRequest.IncludeActivity = true;

var result = await kbClient.RetrieveAsync(retrievalRequest);
foreach (var entry in result.Value.Activity)
{
    Console.WriteLine($"{entry.Type} modelName={entry.ModelName}");
}

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="Which policy applies to returns?")],
        )
    ],
    include_activity=True,
)

result = kb_client.retrieve(request)
for entry in result.activity:
    print(entry.type, "modelName=", getattr(entry, "model_name", None))

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "Which policy applies to returns?" }
            ]
        }
    ],
    "includeActivity": true
}

Reference: Knowledge Retrieval - Retrieve

The following response excerpt shows activity records with modelName.

{
  "activity": [
    {
      "type": "modelQueryPlanning",
      "id": 0,
      "modelName": "gpt-5-mini",
      "inputTokens": 1842,
      "outputTokens": 87,
      "elapsedMs": 1923
    },
    {
      "type": "searchIndex",
      "id": 1,
      "knowledgeSourceName": "operations-ks",
      "count": 12,
      "elapsedMs": 234
    },
    {
      "type": "modelAnswerSynthesis",
      "id": 2,
      "modelName": "gpt-5-mini",
      "inputTokens": 2418,
      "outputTokens": 179,
      "elapsedMs": 931
    }
  ]
}
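To check the reported models programmatically, for example in a deployment test that confirms the expected model handled each step, filter the activity array for entries that carry modelName. This minimal sketch works on the parsed REST response shown above; the function name is hypothetical.

```python
from typing import Any


def models_by_activity(activity: list[dict[str, Any]]) -> dict[str, str]:
    """Map each model-backed activity type to the modelName it reports."""
    # If a type appears more than once, the last entry wins.
    return {entry["type"]: entry["modelName"] for entry in activity if "modelName" in entry}


activity = [
    {"type": "modelQueryPlanning", "id": 0, "modelName": "gpt-5-mini"},
    {"type": "searchIndex", "id": 1, "knowledgeSourceName": "operations-ks"},
    {"type": "modelAnswerSynthesis", "id": 2, "modelName": "gpt-5-mini"},
]
print(models_by_activity(activity))
```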

Require a knowledge source to succeed

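Set failOnError in knowledgeSourceParams to mark a knowledge source as required. Use this parameter when a partial answer would be misleading or noncompliant if that source is unavailable. The following examples also set alwaysQuerySource on the required source so that it's queried on every request.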

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("Which HR policy applies?")
        }
    ) { Role = "user" }
);
retrievalRequest.KnowledgeSourceParams.Add(
    new SearchIndexKnowledgeSourceParams("hr-policy-ks")
    {
        FailOnError = true,
        AlwaysQuerySource = true
    }
);
retrievalRequest.KnowledgeSourceParams.Add(
    new SearchIndexKnowledgeSourceParams("hr-faq-ks")
);

var result = await kbClient.RetrieveAsync(retrievalRequest);

Reference: SearchIndexKnowledgeSourceParams

from azure.search.documents.knowledgebases.models import SearchIndexKnowledgeSourceParams

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="Which HR policy applies?")],
        )
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="hr-policy-ks",
            fail_on_error=True,
            always_query_source=True,
        ),
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="hr-faq-ks",
        ),
    ],
)

result = kb_client.retrieve(request)

Reference: SearchIndexKnowledgeSourceParams

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "Which HR policy applies?" }
            ]
        }
    ],
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "hr-policy-ks",
            "kind": "searchIndex",
            "failOnError": true,
            "alwaysQuerySource": true
        },
        {
            "knowledgeSourceName": "hr-faq-ks",
            "kind": "searchIndex"
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve
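If your application decides at run time which sources are required, you can generate the knowledgeSourceParams array instead of hardcoding it. This minimal sketch reproduces the REST body above from two lists of source names. The function name is hypothetical, and pairing alwaysQuerySource with failOnError mirrors this example rather than being a requirement.

```python
from typing import Any


def search_index_source_params(required: list[str], optional: list[str]) -> list[dict[str, Any]]:
    """Build REST knowledgeSourceParams entries for search index knowledge sources."""
    params: list[dict[str, Any]] = []
    for name in required:
        # failOnError marks the source as required; alwaysQuerySource mirrors the example above.
        params.append({"knowledgeSourceName": name, "kind": "searchIndex",
                       "failOnError": True, "alwaysQuerySource": True})
    for name in optional:
        params.append({"knowledgeSourceName": name, "kind": "searchIndex"})
    return params


body = {
    "messages": [{"role": "user", "content": [{"type": "text", "text": "Which HR policy applies?"}]}],
    "knowledgeSourceParams": search_index_source_params(["hr-policy-ks"], ["hr-faq-ks"]),
}
```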

Tune candidate documents per knowledge source

Set maxOutputDocuments in knowledgeSourceParams to cap how many candidate documents a specific knowledge source contributes before final result selection. Use this parameter when you want to bound one source's input to the pipeline without affecting others.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("What safety procedures apply?")
        }
    ) { Role = "user" }
);
retrievalRequest.KnowledgeSourceParams.Add(
    new SearchIndexKnowledgeSourceParams("operations-ks")
    {
        MaxOutputDocuments = 50
    }
);

var result = await kbClient.RetrieveAsync(retrievalRequest);

Reference: SearchIndexKnowledgeSourceParams

from azure.search.documents.knowledgebases.models import SearchIndexKnowledgeSourceParams

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="What safety procedures apply?")],
        )
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="operations-ks",
            max_output_documents=50,
        ),
    ],
)

result = kb_client.retrieve(request)

Reference: SearchIndexKnowledgeSourceParams

POST {{search-url}}/knowledgebases/operations-kb/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What safety procedures apply?" }
            ]
        }
    ],
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "operations-ks",
            "kind": "searchIndex",
            "maxOutputDocuments": 50
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Limit final grounding documents

The top-level maxOutputDocuments parameter caps how many grounding documents are returned in the final retrieve response. Use this parameter when your application needs a predictable citation or reference count.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("What is the return policy?")
        }
    ) { Role = "user" }
);
retrievalRequest.OutputMode = "extractedData";
retrievalRequest.MaxOutputDocuments = 3;
retrievalRequest.MaxOutputSize = 6000;

var result = await kbClient.RetrieveAsync(retrievalRequest);

Reference: KnowledgeBaseRetrievalRequest

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="What is the return policy?")],
        )
    ],
    output_mode="extractedData",
    max_output_documents=3,
    max_output_size=6000,
)

result = kb_client.retrieve(request)

Reference: KnowledgeBaseRetrievalRequest

POST {{search-url}}/knowledgebases/{{knowledge-base-name}}/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What is the return policy?" }
            ]
        }
    ],
    "outputMode": "extractedData",
    "maxOutputDocuments": 3,
    "maxOutputSize": 6000
}

Reference: Knowledge Retrieval - Retrieve

The following table shows how maxOutputDocuments and maxOutputSize interact across all four combinations.

`maxOutputDocuments`	`maxOutputSize`	Behavior
Unspecified	Unspecified	Uses the default `maxOutputSize` response limit behavior.
Unspecified	Specified	Discards documents once the payload-size limit is reached.
Specified	Unspecified	Returns up to the specified number of grounding documents and doesn't apply a `maxOutputSize` limit.
Specified	Specified	Returns up to `maxOutputDocuments` documents or however many documents fit under `maxOutputSize`, whichever limit applies first.
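To reason about the table, here's a simplified Python model of how the two limits combine. It's a sketch under stated assumptions, not the service's algorithm: it treats documents as arriving in rank order with known sizes in the same unit as maxOutputSize, and it stops at the first document that doesn't fit. It doesn't model the first row, where the service applies its default maxOutputSize. Under this model, a top-ranked document that alone exceeds the budget yields an empty result, which matches the behavior described in Troubleshoot empty responses. The function name is hypothetical.

```python
from typing import Optional


def select_grounding(
    doc_sizes: list[int], max_docs: Optional[int], max_size: Optional[int]
) -> list[int]:
    """Return the indexes of ranked documents kept under a document cap and a size budget.

    Simplified model of the table above, not the service's implementation.
    doc_sizes lists each document's size in rank order, most relevant first.
    """
    kept: list[int] = []
    used = 0
    for index, size in enumerate(doc_sizes):
        if max_docs is not None and len(kept) >= max_docs:
            break  # maxOutputDocuments reached.
        if max_size is not None and used + size > max_size:
            break  # maxOutputSize reached: this document and later ones are discarded.
        kept.append(index)
        used += size
    return kept


sizes = [2500, 2000, 1800, 900]
print(select_grounding(sizes, max_docs=3, max_size=None))     # [0, 1, 2]
print(select_grounding(sizes, max_docs=None, max_size=6000))  # [0, 1]
print(select_grounding(sizes, max_docs=1, max_size=6000))     # [0]
```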

Override default reasoning effort and set request limits

This example specifies answer synthesis, so the retrieval reasoning effort must be low or medium. It also sets maxRuntimeInSeconds to cap total request latency and maxOutputSize to bound the response payload.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("What companies are in the financial sector?")
        }
    ) { Role = "user" }
);
retrievalRequest.RetrievalReasoningEffort = new KnowledgeRetrievalLowReasoningEffort();
retrievalRequest.OutputMode = "answerSynthesis";
retrievalRequest.MaxRuntimeInSeconds = 30;
retrievalRequest.MaxOutputSize = 6000;

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.search.documents.knowledgebases.models import KnowledgeRetrievalLowReasoningEffort

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="What companies are in the financial sector?")],
        )
    ],
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort(),
    output_mode="answerSynthesis",
    max_runtime_in_seconds=30,
    max_output_size=6000,
)

result = kb_client.retrieve(request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

POST {{search-url}}/knowledgebases/kb-override/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What companies are in the financial sector?" }
            ]
        }
    ],
    "retrievalReasoningEffort": { "kind": "low" },
    "outputMode": "answerSynthesis",
    "maxRuntimeInSeconds": 30,
    "maxOutputSize": 6000
}

Reference: Knowledge Retrieval - Retrieve
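Because answer synthesis requires low or medium retrieval reasoning effort, you might validate request bodies before sending them to catch the mismatch early. This minimal client-side sketch is hypothetical, not an SDK feature, and it checks only the rule stated in this section.

```python
from typing import Any

SYNTHESIS_EFFORTS = {"low", "medium"}


def check_reasoning_effort(body: dict[str, Any]) -> None:
    """Raise ValueError if a retrieve body pairs answer synthesis with an unsupported effort."""
    if body.get("outputMode") != "answerSynthesis":
        return
    effort = body.get("retrievalReasoningEffort")
    # With no explicit effort, the knowledge base default applies and can't be checked here.
    if effort is not None and effort.get("kind") not in SYNTHESIS_EFFORTS:
        raise ValueError(
            f"answerSynthesis requires low or medium reasoning effort, got {effort.get('kind')!r}"
        )


# The request body from the REST example above passes silently.
check_reasoning_effort({
    "retrievalReasoningEffort": {"kind": "low"},
    "outputMode": "answerSynthesis",
    "maxRuntimeInSeconds": 30,
    "maxOutputSize": 6000,
})
```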

Set references for each knowledge source

Use includeReferences and includeReferenceSourceData in knowledgeSourceParams to control which sources appear in the references array and how much source data each entry includes. This example uses the knowledge base's default reasoning effort.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Messages.Add(
    new KnowledgeBaseMessage(
        content: new[] {
            new KnowledgeBaseMessageTextContent("What companies are in the financial sector?")
        }
    ) { Role = "user" }
);
retrievalRequest.IncludeActivity = true;
retrievalRequest.KnowledgeSourceParams.Add(
    new SearchIndexKnowledgeSourceParams("demo-financials-ks")
    {
        IncludeReferences = true,
        IncludeReferenceSourceData = true
    }
);

retrievalRequest.KnowledgeSourceParams.Add(
    new SearchIndexKnowledgeSourceParams("demo-communicationservices-ks")
    {
        IncludeReferences = false,
        IncludeReferenceSourceData = false
    }
);

retrievalRequest.KnowledgeSourceParams.Add(
    new SearchIndexKnowledgeSourceParams("demo-healthcare-ks")
    {
        IncludeReferences = true,
        IncludeReferenceSourceData = false,
        AlwaysQuerySource = true
    }
);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.search.documents.knowledgebases.models import SearchIndexKnowledgeSourceParams

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="What companies are in the financial sector?")],
        )
    ],
    include_activity=True,
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="demo-financials-ks",
            include_references=True,
            include_reference_source_data=True,
        ),
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="demo-communicationservices-ks",
            include_references=False,
            include_reference_source_data=False,
        ),
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="demo-healthcare-ks",
            include_references=True,
            include_reference_source_data=False,
            always_query_source=True,
        ),
    ],
)

result = kb_client.retrieve(request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, SearchIndexKnowledgeSourceParams

POST {{search-url}}/knowledgebases/kb-medium-example/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "What companies are in the financial sector?" }
            ]
        }
    ],
    "includeActivity": true,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "demo-financials-ks",
            "kind": "searchIndex",
            "includeReferences": true,
            "includeReferenceSourceData": true
        },
        {
            "knowledgeSourceName": "demo-communicationservices-ks",
            "kind": "searchIndex",
            "includeReferences": false,
            "includeReferenceSourceData": false
        },
        {
            "knowledgeSourceName": "demo-healthcare-ks",
            "kind": "searchIndex",
            "includeReferences": true,
            "includeReferenceSourceData": false,
            "alwaysQuerySource": true
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Use minimal reasoning effort

In this example, no LLM is used for query planning or answer synthesis. Instead of a conversation in messages, the request passes a semantic intent, and its search string goes directly to the agentic retrieval engine for keyword or hybrid search.

var retrievalRequest = new KnowledgeBaseRetrievalRequest();
retrievalRequest.Intents.Add(
    new KnowledgeRetrievalSemanticIntent("what is a brokerage")
);

var result = await kbClient.RetrieveAsync(retrievalRequest);
Console.WriteLine(
    (result.Value.Response[0].Content[0] as KnowledgeBaseMessageTextContent)!.Text
);

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeRetrievalSemanticIntent,
)

request = KnowledgeBaseRetrievalRequest(
    intents=[
        KnowledgeRetrievalSemanticIntent(
            search="what is a brokerage",
        )
    ]
)

result = kb_client.retrieve(request)
print(result.response[0].content[0].text)

Reference: KnowledgeBaseRetrievalClient, KnowledgeBaseRetrievalRequest

POST {{search-url}}/knowledgebases/kb-minimal/retrieve?api-version=2026-05-01-preview
Authorization: Bearer {{accessToken}}
Content-Type: application/json

{
    "intents": [
        {
            "type": "semantic",
            "search": "what is a brokerage"
        }
    ]
}

Reference: Knowledge Retrieval - Retrieve

Troubleshoot empty responses

A document can be found during the search step but still be omitted from the final response if its grounded content exceeds the maxOutputSizeInTokens (maxOutputSize in 2026-05-01-preview) output budget. When this happens, the activity array shows that matches were found, and the activity record includes a warning that the most relevant document exceeded the maximum output size. The references array and grounded response content are empty for that document. To retain more content, increase maxOutputSizeInTokens.

To avoid this behavior, index large source documents as smaller chunks with stable identifiers and source metadata. This applies especially to long manuals, policies, or knowledge base articles.
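The chunking advice can be sketched as follows. This minimal example splits text by character count with a fixed overlap and gives each chunk a deterministic ID derived from the parent document key. The function name, the field names (id, parent_id, title, content), and the sizes are hypothetical; in production you'd more likely use integrated vectorization or the Text Split skill, and size chunks in tokens.

```python
from typing import Any


def chunk_document(doc_id: str, title: str, text: str,
                   max_chars: int = 2000, overlap: int = 200) -> list[dict[str, Any]]:
    """Split a long document into overlapping chunks with stable IDs and source metadata."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    chunks: list[dict[str, Any]] = []
    start = 0
    while start < len(text):
        end = start + max_chars
        chunks.append({
            # Deterministic ID: the same document and position always produce the same key.
            "id": f"{doc_id}_chunk_{len(chunks)}",
            "parent_id": doc_id,  # Source metadata for tracing a chunk back to its document.
            "title": title,
            "content": text[start:end],
        })
        if end >= len(text):
            break
        start = end - overlap  # Step back so neighboring chunks share some context.
    return chunks


manual = "".join(str(i % 10) for i in range(4500))
chunks = chunk_document("safety-manual", "Safety Manual", manual)
print([(c["id"], len(c["content"])) for c in chunks])
```

Stable IDs mean that reindexing the same document overwrites its existing chunks instead of creating duplicates.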

Last updated on 2026-06-02
