Azure AI Search for PDF indexer

Reddy
2024-02-28T17:27:38.9533333+00:00

I am trying to create a PDF indexer using the Azure AI Search service. I want to index PDF documents that are uploaded from my web application (built with .NET Core) and stored in Azure Blob Storage. My final goal is to search the indexed PDF documents and return only the PDF documents that match the search text. Is there a way to return the matched PDFs themselves instead of plain text? Thanks

Tags: Azure AI Search, Azure Blob Storage

Accepted answer
  Grmacjon-MSFT
    2024-03-01T07:41:00.7066667+00:00

    Hello @Reddy, what is the size of your PDF documents? Azure AI Search returns the text and other fields stored in the index for each matching document; there is no built-in functionality to return the original PDF file itself. However, you can search for and retrieve the original PDFs based on search text by combining Azure AI Search with Azure Blob Storage.

    Here's one approach:

    1. Indexing PDFs and Extracting Text:

    • Use Azure Blob Storage: Store your uploaded PDFs in Azure Blob Storage.
    • Create an Azure AI Search index: Define your search index with appropriate fields, including a retrievable field for each document's blob storage URL.
    • Use the Azure Blob indexer: This built-in indexer extracts the text content from the PDFs in your container and populates the search index each time it runs (on demand or on a schedule). It also exposes each blob's URL as the metadata_storage_path property, which you can map into your URL field.
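    As an illustration of step 1, here is a minimal Python sketch that builds the request bodies for the data source, index, and indexer. It assumes you use the REST API (so the JSON shapes are visible) and would PUT each body to /datasources/{name}, /indexes/{name}, and /indexers/{name} on your search service with your chosen api-version. The resource names and the blob_url field are hypothetical; the azureblob type, the content and metadata_storage_path/metadata_storage_name properties, and the base64Encode mapping function come from the blob indexer. Document keys may only contain letters, digits, dashes, underscores, and equals signs, so the blob URL is base64-encoded for the key and copied unencoded into blob_url.

```python
import json

# Hypothetical resource names for illustration; substitute your own.
DATA_SOURCE_NAME = "pdf-blob-datasource"
INDEX_NAME = "pdf-index"
INDEXER_NAME = "pdf-indexer"
CONTAINER_NAME = "pdfs"


def build_data_source(connection_string: str) -> dict:
    """Data source that points the indexer at the blob container holding the PDFs."""
    return {
        "name": DATA_SOURCE_NAME,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": CONTAINER_NAME},
    }


def build_index() -> dict:
    """Index holding the extracted text plus the blob URL to return in results."""
    return {
        "name": INDEX_NAME,
        "fields": [
            # Document key: the blob URL, base64-encoded by the indexer's field mapping.
            {"name": "id", "type": "Edm.String", "key": True, "searchable": False, "retrievable": True},
            # Text extracted from each PDF; searchable but not returned, since we want the files.
            {"name": "content", "type": "Edm.String", "searchable": True, "retrievable": False},
            # Hypothetical field holding the unencoded blob URL.
            {"name": "blob_url", "type": "Edm.String", "searchable": False, "retrievable": True},
            # File name, filled by name from the indexer's metadata.
            {"name": "metadata_storage_name", "type": "Edm.String", "searchable": True, "retrievable": True},
        ],
    }


def build_indexer() -> dict:
    """Indexer that extracts PDF text and copies the blob URL into the index."""
    return {
        "name": INDEXER_NAME,
        "dataSourceName": DATA_SOURCE_NAME,
        "targetIndexName": INDEX_NAME,
        "schedule": {"interval": "PT2H"},  # re-run every 2 hours to pick up new uploads
        "parameters": {
            "configuration": {
                "parsingMode": "default",
                "dataToExtract": "contentAndMetadata",
                "indexedFileNameExtensions": ".pdf",
            }
        },
        "fieldMappings": [
            {
                "sourceFieldName": "metadata_storage_path",
                "targetFieldName": "id",
                "mappingFunction": {"name": "base64Encode"},
            },
            {"sourceFieldName": "metadata_storage_path", "targetFieldName": "blob_url"},
        ],
    }


if __name__ == "__main__":
    # Print the bodies; in practice, PUT each one to the search service REST API.
    for body in (build_data_source("<storage-connection-string>"), build_index(), build_indexer()):
        print(json.dumps(body, indent=2))
```

    The same definitions can also be created from .NET with the Azure.Search.Documents package (SearchIndexClient for the index, SearchIndexerClient for the data source and indexer).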

    2. Searching and Retrieving PDFs:

    • Perform search through Azure AI Search: Use your search queries to search the indexed text content of your PDFs.
    • Retrieve matching PDFs: Each search result includes the blob storage URL of a matching PDF, as long as the URL field is retrievable (and included in select if you restrict the returned fields).
    • Access and download PDFs: Use the retrieved blob storage URLs from your .NET Core application to access and download the original PDF documents using the Azure Blob Storage SDK for .NET.
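    For step 2, here is a minimal Python sketch of the search request body for POST /indexes/{index}/docs/search and of turning the response into container and blob names that a Blob Storage client can download. It assumes a hypothetical retrievable field named blob_url that holds each blob's URL (for example, mapped from metadata_storage_path), and it assumes the stored URL is percent-encoded.

```python
from urllib.parse import unquote, urlparse


def build_search_request(search_text: str, top: int = 10) -> dict:
    """Body for POST /indexes/{index}/docs/search: match on the text, return only URL and name."""
    return {
        "search": search_text,
        "select": "blob_url,metadata_storage_name",  # blob_url is the hypothetical URL field
        "top": top,
    }


def matched_pdfs(search_response: dict) -> list[tuple[str, str]]:
    """Turn a search response into (container, blob_name) pairs for a Blob Storage client."""
    pdfs = []
    for doc in search_response.get("value", []):
        # URL shape: https://<account>.blob.core.windows.net/<container>/<blob name>
        path = urlparse(doc["blob_url"]).path.lstrip("/")
        container, _, blob_name = path.partition("/")
        # Assumes the stored URL is percent-encoded; blob names may include '/' (virtual folders).
        pdfs.append((container, unquote(blob_name)))
    return pdfs


if __name__ == "__main__":
    # Made-up sample in the shape the search REST API returns (a "value" array of documents).
    sample = {
        "value": [
            {
                "@search.score": 1.3,
                "blob_url": "https://myaccount.blob.core.windows.net/pdfs/reports/q1%20summary.pdf",
                "metadata_storage_name": "q1 summary.pdf",
            }
        ]
    }
    print(build_search_request("quarterly revenue"))
    print(matched_pdfs(sample))  # [('pdfs', 'reports/q1 summary.pdf')]
```

    In .NET you may not need to split the URL at all: a BlobClient from Azure.Storage.Blobs can be constructed directly from the blob Uri plus a credential and then downloaded with DownloadToAsync or OpenReadAsync.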

    Hope that helps.

    -Grace
