Azure AI Search returning full PDF instead of relevant answer

Sushant Shelake 0 Reputation points
2024-03-25T06:00:51.7333333+00:00

I have connected my Azure AI Search service to a Blob storage container that holds a PDF document that needs to be crawled. However, when I ask a question in the search, it returns the entire PDF instead of the relevant answers. I need help figuring out how to properly crawl the PDF document.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

1 answer

  1. Grmacjon-MSFT 16,011 Reputation points
    2024-04-09T03:56:19.5566667+00:00

    Hi @Sushant Shelake, apologies for the delayed response.

    Here's how to properly crawl your PDF document and retrieve relevant answers:

    1. Enable Text Extraction with Blob Indexer:

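    • Since you've already imported your data, the next thing to do is confirm that text extraction is enabled on the blob indexer. In the indexer definition this is controlled by the `dataToExtract` configuration parameter. Its default value, `contentAndMetadata`, tells the indexer to extract the PDF's text (typically into a field named `content`) along with the blob metadata. This extraction is done by the indexer itself and does not depend on Text Analytics.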
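
As a rough illustration of the setting described in this step, the sketch below builds a blob indexer definition with `dataToExtract` set explicitly. The indexer, data source, and index names are hypothetical placeholders, and the resulting JSON would still have to be sent to your own service (for example with a PUT to its `/indexers/<name>` endpoint and an API version you have chosen), which is not shown here.

```python
import json


def build_blob_indexer(name: str, data_source: str, index: str) -> dict:
    """Return a blob indexer definition that extracts text and metadata."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "parameters": {
            "configuration": {
                # "contentAndMetadata" is the default; it is spelled out here
                # so the intent is visible when reviewing the indexer.
                "dataToExtract": "contentAndMetadata",
                # "default" parsing mode treats each blob as one document.
                "parsingMode": "default",
            }
        },
    }


# Hypothetical names, used only for illustration.
indexer = build_blob_indexer("pdf-indexer", "pdf-blob-datasource", "pdf-index")
print(json.dumps(indexer, indent=2))
```

Note that with the default parsing mode each PDF becomes a single search document, so a match on any word returns that document with its full `content` field unless the query limits what comes back.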

    2. Analyze Text Extraction Settings:

    • In your blob indexer configuration, review the text extraction settings. You can also attach a skillset, made up of built-in or custom skills, to handle specific file formats such as PDF.
    • By default, Azure AI Search extracts text from PDFs as part of indexing, with no skillset required. If your PDFs need advanced processing (e.g., complex layouts or tables), consider adding a custom skill that calls a service suited to that content for more granular control. For layouts and tables that is usually Azure AI Document Intelligence rather than Text Analytics.
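
To make the custom skill idea more concrete, here is a minimal sketch of the request and response shape that a custom Web API skill exchanges with the indexer: a `values` array in which each record carries a `recordId` and a `data` object, and the response echoes each `recordId`. The input name `text`, the output name `cleanedText`, and the whitespace cleanup are hypothetical stand-ins for whatever processing your PDFs actually need.

```python
def run_custom_skill(request_body: dict) -> dict:
    """Process one batch of records in the custom Web API skill format."""
    results = []
    for record in request_body.get("values", []):
        # "text" is a hypothetical input name defined in the skill's inputs.
        text = record.get("data", {}).get("text", "")
        # Placeholder processing: collapse runs of whitespace and newlines.
        cleaned = " ".join(text.split())
        results.append(
            {
                # The recordId must be echoed back so results can be matched.
                "recordId": record["recordId"],
                # "cleanedText" is a hypothetical output name.
                "data": {"cleanedText": cleaned},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}


sample_request = {
    "values": [
        {"recordId": "1", "data": {"text": "Total   due:\n  42 USD"}},
    ]
}
response = run_custom_skill(sample_request)
print(response)
```

In a real deployment this function would sit behind an HTTPS endpoint (an Azure Function is a common choice) that the skillset references by URL.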

    3. Search with Relevant Fields:

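    • When formulating your search query, target specific fields extracted from the PDF document. These fields might include the extracted text content, metadata, or custom properties defined during indexing.
    • For example, instead of searching the entire document, search for keywords within the extracted text content field: content:"your search term". This fielded syntax requires the full Lucene query parser (queryType=full). You can also limit what each result returns, for instance by selecting only the fields you need or by requesting hit highlights rather than the whole content field.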
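
The fielded query from this step can be sketched as a search request body. This is only an illustration under assumptions: it presumes the index has a searchable `content` field and a `metadata_storage_name` field, which depends on how your index was defined, and it leaves out the actual POST to your service's `/indexes/<index>/docs/search` endpoint.

```python
import json


def build_query(term: str, top: int = 5) -> dict:
    """Build a search request body that targets the extracted text field."""
    # Escape backslashes and quotes so the term stays a single quoted phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return {
        # Fielded search: only match inside the "content" field.
        "search": f'content:"{escaped}"',
        # Fielded syntax needs the full Lucene query parser.
        "queryType": "full",
        # Return only the file name instead of the whole PDF text.
        "select": "metadata_storage_name",
        # Ask for matching snippets from "content" in @search.highlights.
        "highlight": "content",
        "top": top,
    }


query = build_query("refund policy")
print(json.dumps(query, indent=2))
```

With `select` and `highlight` set this way, each hit comes back with its file name and short matching snippets rather than the full extracted text.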

    Hope that helps.

    -Grace
