How to get the page number of the uploaded document that an Azure Cognitive Search result comes from

Nita Anil 21 Reputation points
2021-04-30T15:56:53.657+00:00

I have used AI enrichment in Azure Cognitive Search to get search results from documents uploaded to a blob container. When I search for text, I would like to get the page number along with each search result.

For the uploaded documents, I get all the matches highlighted, but I am not able to get the page number each result is taken from. The uploaded documents are PDFs and images. Could you please provide a solution for getting the page number along with the search result? I have used Search explorer and Postman to check the search results.

Azure AI Search

Accepted answer
  1. ajkuma 25,701 Reputation points Microsoft Employee
    2021-05-04T06:51:57.313+00:00

    @Nita Anil, following up on this from my comments:

    Azure Cognitive Search can’t return page numbers by default, but there are a few ways you could get this information.

    1. Return hit highlights of the search results and compare the text in the hit highlights with the document to identify which page(s) in the document the match is on. This is probably the simplest approach.
    2. For PDFs, you have the option to set the imageAction to "generateNormalizedImagePerPage" in the indexer. This will create one image per page which can then be sent to the OCR skill. You could map the output of the OCR skill to a Collection(Edm.String) field where each item is the text from a single page. You could then use the location of the match in the collection to ascertain the page number. Note that this will only work for PDFs.
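
As a rough sketch of the lookup both options end with: once you have the text of each page as a list (extracted yourself for option 1, or read from the Collection(Edm.String) field for option 2), you can strip the highlight tags and check which pages contain the fragment. The function name `pages_for_highlight` and the sample data are made up for illustration, and the code assumes the default `<em>` highlight tags; it is not part of any Azure SDK.

```python
import re
from typing import List

# Assumption: highlights use the default <em>...</em> tags. Adjust the pattern
# if you set highlightPreTag / highlightPostTag in the query.
_HIGHLIGHT_TAG = re.compile(r"</?em>")


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return " ".join(text.lower().split())


def pages_for_highlight(highlight: str, page_texts: List[str]) -> List[int]:
    """Return the 1-based page numbers whose text contains the highlight fragment."""
    needle = _normalize(_HIGHLIGHT_TAG.sub("", highlight))
    if not needle:
        return []
    return [
        index + 1
        for index, page in enumerate(page_texts)
        if needle in _normalize(page)
    ]


# Hypothetical per-page text: one list item per page, in page order.
page_texts = [
    "Introduction to the annual report.",
    "Revenue grew by 12 percent\nin the last quarter.",
    "Appendix and references.",
]
highlight = "<em>Revenue</em> grew by 12 percent in the last quarter."

print(pages_for_highlight(highlight, page_texts))  # [2]
```

A plain substring check like this will miss a fragment that spans a page break or that OCR transcribed slightly differently, so you may need fuzzier matching on real documents.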

    Hope this helps!

