How to add the source document name as a filter on a vector index

Sunil Nagireddy 105 Reputation points
2025-06-09T14:59:56.8+00:00

We have implemented a RAG framework in Azure AI Foundry and built a vector index on top of 250 source PDF files. We are using prompt flow in Azure AI Foundry to build some custom logic, for which we want to search the vector index with a filter on the source document name.

That is, the search should run only against the source file that is passed as a filter along with the user query.

We can see the index schema under the search service's Indexes page, but it does not let us mark the existing source file field as filterable; it only lets us add a new field as filterable. Because the schema is created automatically as part of index creation in Azure AI Foundry (Data + Indexes), we cannot select the source file name as a filter while creating the vector index.

Can you please provide detailed steps for creating a vector index with a filter on the source file name in Azure AI Foundry?

Azure AI Search

Accepted answer
  1. Suresh Chikkam 2,135 Reputation points Microsoft External Staff Moderator
    2025-06-10T18:01:15.97+00:00

    Sunil Nagireddy, when you create a vector index in Azure AI Foundry, the schema is generated automatically. Metadata fields such as metadata_storage_name are not marked filterable, and the Foundry UI doesn't let you change that. To restrict searches to a particular PDF, you need to edit the underlying Azure AI Search index schema so that it has a filterable file-name field, re-index your data, and then supply an OData filter with your vector query.

    First, open your Azure AI Search resource in the portal, go to Indexes, find the index that Foundry created for your PDFs, and export its JSON definition. In that JSON, add a new field called sourceFileName (or edit the existing metadata_storage_name field) so that it reads:

    {
      "name": "sourceFileName",
      "type": "Edm.String",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": false
    }
    

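    If you prefer to reuse the built-in metadata field, set "filterable": true on it instead. Keep in mind that Azure AI Search does not allow changing the filterable attribute of an existing field in place, so that route requires deleting the index and recreating it under the same name; adding a brand-new field does not. You can recreate the index from the edited definition using the portal or the Azure CLI.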

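    # Export the current index definition to a file.
    # Note: if your Azure CLI version has no `az search index` commands,
    # export and update the definition through the portal or the Search REST API.
    az search index show \
      --name your-index-name \
      --service-name your-search-service \
      --resource-group your-rg > index.json
    # Edit index.json as above. If you changed an existing field, delete the
    # old index first, then create the index from the edited definition.
    az search index create \
      --name your-index-name \
      --service-name your-search-service \
      --resource-group your-rg \
      --body @index.json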
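
If you would rather script the "edit index.json" step than do it by hand, a minimal Python sketch could look like the following. It assumes the exported definition has a top-level "fields" array, as index definitions normally do; the helper names make_field_filterable and patch_file are hypothetical and not part of any Azure SDK.

```python
import copy
import json


def make_field_filterable(index_def: dict, field_name: str = "sourceFileName") -> dict:
    """Return a copy of index_def in which field_name is a filterable field.

    If the field already exists, only its "filterable" flag is switched on.
    Otherwise a new Edm.String field matching the definition above is appended.
    """
    patched = copy.deepcopy(index_def)  # leave the caller's dict untouched
    fields = patched.setdefault("fields", [])
    for field in fields:
        if field.get("name") == field_name:
            field["filterable"] = True
            return patched
    fields.append({
        "name": field_name,
        "type": "Edm.String",
        "searchable": False,
        "filterable": True,
        "retrievable": True,
        "sortable": False,
        "facetable": False,
    })
    return patched


def patch_file(path_in: str, path_out: str, field_name: str = "sourceFileName") -> None:
    """Read an exported index definition, patch it, and write it back out."""
    with open(path_in, encoding="utf-8") as fh:
        index_def = json.load(fh)
    with open(path_out, "w", encoding="utf-8") as fh:
        json.dump(make_field_filterable(index_def, field_name), fh, indent=2)
```

For example, patch_file("index.json", "index.json") would update the exported file in place, and passing "metadata_storage_name" as field_name would flip the flag on the existing field instead of adding a new one.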
    

    Once the new schema is in place, rerun your indexer or re-execute the Foundry ingestion job so that every document chunk carries the sourceFileName value. You can confirm in Search Explorer that each document now shows the correct file name.
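
As an alternative to checking by eye in Search Explorer, the same check can be sketched in Python. The snippet below only builds the JSON body you would POST to the index's docs/search REST endpoint and inspects a response you have already received; it makes no network call. The function names are hypothetical, and it assumes the field is called sourceFileName and that the response lists documents under "value", as the Search Documents REST API does.

```python
def build_check_request(file_name: str, field: str = "sourceFileName", top: int = 5) -> dict:
    """Build a search request body that returns a few chunks from one file."""
    escaped = file_name.replace("'", "''")  # OData escapes ' by doubling it
    return {
        "search": "*",                      # match everything, rely on the filter
        "filter": f"{field} eq '{escaped}'",
        "select": field,                    # only return the file-name field
        "top": top,
        "count": True,                      # ask the service for the total match count
    }


def all_from_file(response: dict, file_name: str, field: str = "sourceFileName") -> bool:
    """True if the response contains documents and all of them carry file_name."""
    docs = response.get("value", [])
    return bool(docs) and all(doc.get(field) == file_name for doc in docs)
```

If the request fails with an error saying the field is not filterable, the schema change did not take effect; if all_from_file returns False because no documents came back, the field is most likely not being populated during ingestion.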

    Finally, in your Prompt Flow’s vector-search (Index Lookup) step, paste an OData filter like:

    sourceFileName eq 'MyDocument.pdf'
    

    (or metadata_storage_name eq 'MyDocument.pdf' if you updated that field). This makes sure the search only matches embeddings from that specific PDF.
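
Since the file name usually arrives as a flow input rather than a hard-coded string, you may want to build the filter in a small Python node before the lookup step. The sketch below is one way to do that; the function names are hypothetical. It doubles any single quotes in the name, which is how OData string literals escape them, and the multi-file variant uses the search.in function with '|' as the delimiter on the assumption that your file names never contain that character.

```python
from typing import Iterable


def build_source_filter(file_name: str, field: str = "sourceFileName") -> str:
    """Return an OData filter restricting results to a single source file."""
    escaped = file_name.replace("'", "''")  # a literal ' is written as ''
    return f"{field} eq '{escaped}'"


def build_multi_source_filter(file_names: Iterable[str],
                              field: str = "sourceFileName",
                              delimiter: str = "|") -> str:
    """Return an OData filter matching any of several source files."""
    names = list(file_names)
    if not names:
        raise ValueError("at least one file name is required")
    for name in names:
        if delimiter in name:
            raise ValueError(f"file name {name!r} contains the delimiter {delimiter!r}")
    joined = delimiter.join(name.replace("'", "''") for name in names)
    return f"search.in({field}, '{joined}', '{delimiter}')"
```

Passing the returned string as the filter value keeps a name such as O'Brien.pdf from breaking the filter expression.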

    Hope it helps!


