Issue with OCR Skill Not Processing Images Within PDF Files in Azure Cognitive Search

SuMyat Hlaing 0 Reputation points
2024-07-10T04:29:05.5+00:00

I'm encountering an issue where the OCR skill in Azure Cognitive Search does not process images contained within PDF files. The OCR skill works for standalone image files (e.g., PNG, JPG) but does not extract text from images embedded in PDF documents.
I have attached a file (code.txt) containing the function code, skillset, indexer, and index definitions.

Azure AI Search

2 answers

  1. SuMyat Hlaing 0 Reputation points
    2024-07-11T02:18:01.56+00:00

    @SnehaAgrawal-MSFT Thank you for the detailed explanation.

    I manually ran the indexer in Azure, and it processed the PDFs with embedded images correctly. However, I'm having issues with my Azure Function trigger code. It works for other document types (e.g., standalone images, DOCX, CSV) but not for images within PDFs. Here’s an overview of my situation:

    The function receives the event, extracts the blob URL and other metadata, and attempts to index the blob.

    The function identifies the blob type (PDF, image, etc.) and processes it accordingly.

    While it correctly handles standalone images and other document types, it fails to process images embedded in PDFs.

    I have attached the main part of my trigger function code that handles the indexing (functionapp trigger code.txt); I cannot post the code inline because of the Code of Conduct.
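The flow described in this answer (receive the event, extract the blob URL and metadata, identify the blob type) could be sketched roughly as follows. This is only an illustration and not the attached code: it assumes the trigger delivers an Event Grid `Microsoft.Storage.BlobCreated` payload with the blob URL in `data.url`, and the names `parse_blob_event`, `classify_blob`, and `BLOB_KINDS` are hypothetical.

```python
import os
from urllib.parse import unquote, urlparse

# Hypothetical mapping from file extension to a processing category.
BLOB_KINDS = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".docx": "document",
    ".csv": "document",
}


def classify_blob(blob_url: str) -> str:
    """Return a category for the blob based on its file extension."""
    # Use only the URL path so a SAS token or query string is ignored.
    path = unquote(urlparse(blob_url).path)
    ext = os.path.splitext(path)[1].lower()
    return BLOB_KINDS.get(ext, "unknown")


def parse_blob_event(event: dict) -> dict:
    """Extract the blob URL, content type, and category from an event.

    Assumes an Event Grid blob-created payload shape (data.url,
    data.contentType); adjust if the trigger uses a different schema.
    """
    data = event.get("data", {})
    url = data["url"]
    return {
        "url": url,
        "content_type": data.get("contentType"),
        "kind": classify_blob(url),
    }


if __name__ == "__main__":
    sample = {
        "data": {
            "url": "https://example.blob.core.windows.net/docs/scan.pdf",
            "contentType": "application/pdf",
        }
    }
    print(parse_blob_event(sample))
```

Note that classifying a blob as `pdf` does not by itself make its embedded images available to OCR; the images still have to be extracted from the PDF by whichever component performs the indexing.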


  2. SnehaAgrawal-MSFT 20,856 Reputation points
    2024-07-15T05:36:40.2433333+00:00

    @SuMyat Hlaing Thanks for the reply! It appears that you are not using the built-in indexers but running your own indexing code, so we are not able to address this question: it relates to OCR expectations in custom code rather than to AI Search functionality.

    I suggest using the built-in indexers with OCR for this requirement instead of your own code:

    the Azure blob indexer ("Azure blob indexer - Azure AI Search" on Microsoft Learn) together with the OCR skill ("OCR skill - Azure AI Search" on Microsoft Learn).

    Let us know.
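For the built-in route suggested here, images embedded in PDFs reach the OCR skill only when the indexer is configured to extract them. Below is a minimal sketch of the relevant definitions, written as Python dictionaries that would be serialized to JSON for the REST API. The names `pdf-ocr-indexer`, `pdf-ocr-skillset`, `blob-datasource`, and `docs-index` are placeholders, the helper functions are hypothetical, and none of this is taken from the poster's attached configuration.

```python
import json

# Indexer definition: imageAction tells the blob indexer to extract
# embedded images and expose them under /document/normalized_images.
indexer = {
    "name": "pdf-ocr-indexer",          # placeholder
    "dataSourceName": "blob-datasource",  # placeholder
    "targetIndexName": "docs-index",      # placeholder
    "skillsetName": "pdf-ocr-skillset",   # placeholder
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
        }
    },
}

# OCR skill: runs once per extracted image.
ocr_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",
    "defaultLanguageCode": "en",
    "detectOrientation": True,
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [{"name": "text", "targetName": "text"}],
}


def extracts_embedded_images(indexer_def: dict) -> bool:
    """Return True if the indexer definition enables image extraction."""
    config = indexer_def.get("parameters", {}).get("configuration", {})
    return config.get("imageAction", "none") != "none"


def run_indexer_url(service: str, name: str,
                    api_version: str = "2023-11-01") -> str:
    """Build the Run Indexer REST endpoint (invoked with POST and an api-key header)."""
    return (f"https://{service}.search.windows.net/indexers/{name}/run"
            f"?api-version={api_version}")


if __name__ == "__main__":
    print(json.dumps(indexer, indent=2))
    print(extracts_embedded_images(indexer))
    print(run_indexer_url("my-search-service", indexer["name"]))
```

If custom code needs to react to new blobs, one option under these assumptions is to have it call the Run Indexer endpoint and let the indexer and skillset handle the PDF image extraction and OCR.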
