In Azure AI Search, a multi-page PDF stored in SharePoint is indexed as a single search document. If a match is found in any searchable field or subfield of that document, the entire document is returned as the result. Azure AI Search does not return only the matching elements of a complex collection, so a query cannot bring back just the page that matched.
To work around this, you can parse the search response in your application and display only the relevant subfields to the user. You can also restructure what you index. One option is to index each page of the PDF as a separate search document, so that every hit maps directly to a page. Another is to keep one search document per PDF and model its pages as a collection of complex objects, for example:
{
  "document": [
    { "pagenumber": 1, "content": "..." },
    { "pagenumber": 2, "content": "hello" }
  ]
}
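If you keep the structure above, the application-side parsing could look roughly like the following. This is only a sketch: the helper name `matching_pages` and the sample hit are made up for illustration, and the plain substring check is an assumption that will not reproduce the tokenization or analyzers Azure AI Search applies on the server.

```python
from typing import Any


def matching_pages(document: dict[str, Any], term: str) -> list[dict[str, Any]]:
    """Return only the pages whose content contains `term` (case-insensitive).

    Hypothetical helper: a simple substring test, not the service's own matching.
    """
    needle = term.casefold()
    return [
        page
        for page in document.get("document", [])
        if needle in str(page.get("content", "")).casefold()
    ]


# A returned search document shaped like the example above (sample data).
hit = {
    "document": [
        {"pagenumber": 1, "content": "..."},
        {"pagenumber": 2, "content": "hello"},
    ]
}

# Show the user only the page numbers that contain the search term.
pages = matching_pages(hit, "Hello")
print([page["pagenumber"] for page in pages])
```

The search call itself is left out on purpose; the point is only that the whole document comes back and your application decides which pages to show.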
Be aware that Azure AI Search limits the number of elements across all complex collections in a single document to 3,000.
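If a PDF could go past that limit, or you simply prefer page-level hits, you could flatten it into one search document per page before uploading. The sketch below assumes the JSON shape shown earlier; the field names `id` and `sourceId`, the key format, and both function names are hypothetical and would need to match your own index schema.

```python
from typing import Any

MAX_COMPLEX_ELEMENTS = 3000  # per-document limit mentioned above


def exceeds_limit(source: dict[str, Any]) -> bool:
    """True if the page collection alone is over the per-document limit."""
    return len(source.get("document", [])) > MAX_COMPLEX_ELEMENTS


def split_into_page_documents(source_id: str, source: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one flat document per page, keyed by source id and page number."""
    return [
        {
            "id": f"{source_id}-p{page['pagenumber']}",  # hypothetical key format
            "sourceId": source_id,  # lets you group pages back into one PDF
            "pagenumber": page["pagenumber"],
            "content": page["content"],
        }
        for page in source.get("document", [])
    ]


pdf = {
    "document": [
        {"pagenumber": 1, "content": "..."},
        {"pagenumber": 2, "content": "hello"},
    ]
}

page_docs = split_into_page_documents("report", pdf)
print(exceeds_limit(pdf), [doc["id"] for doc in page_docs])
```

With this layout each hit is a single page, and the `sourceId` field (again, an assumed name) can be used to filter or group results by the original file.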