Issue with OCR Search Functionality in PDF Files using Microsoft Syntex

Prashanti Prakash Sao 0 Reputation points
2025-06-11T10:56:28.1466667+00:00

Hi,

We are looking to implement functionality that allows us to search the content of PDF files, including those that are scanned or contain images.

To achieve this, I have configured Microsoft Syntex OCR for a SharePoint site. When I upload an image file, the text is successfully extracted into a column, and I can search for it as expected. However, when I upload a PDF file—whether it contains images or mixed content—the search does not return any results.

According to the Microsoft documentation, PDFs are supported, and it is mentioned that “When you apply OCR to a PDF or TIFF file, the extracted text is indexed in search but not available in the metadata column.”

Has anyone else encountered a similar issue?

Additionally, are there any alternative approaches to make the content of PDFs (with images) searchable using only Microsoft 365 tools (no third-party solutions)?

Thanks!


1 answer

  1. Steven-N 1,995 Reputation points Microsoft External Staff Moderator
    2025-06-11T14:58:50.73+00:00

    Hi Prashanti Prakash Sao

    Thank you for reaching out to the Microsoft Q&A Forum, and I apologize for the inconvenience you're experiencing.  

    Based on your description, it appears the Microsoft Syntex OCR feature is not functioning as expected with PDF files. As a forum moderator, I don't have access to your internal system configuration or data, so I'm unable to provide a precise answer for your specific situation.

    However, you can try the following workarounds to see if they resolve the issue.

    **1. Check Content Quality (OCR Readability)**

    Link Article: Overview of optical character recognition in SharePoint - Microsoft Syntex | Microsoft Learn 

    Verify files are under 50 MB, within pixel limits (50x50 to 16,000x16,000) and were uploaded after the Syntex model was applied to the library. 

    Use high-quality scans of at least 300 DPI and check for issues like page skew, shadows, or digital "noise" that can block OCR. 

    As a test, upload a fresh, high-quality "clean" file after confirming the settings to ensure it gets processed correctly. 
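The size and pixel limits quoted above can be checked before uploading a test file. The following is only a minimal sketch, not part of Syntex or any Microsoft tooling: the function name `ocr_precheck` and the constants are hypothetical, "50 MB" is assumed to mean 50 x 1024 x 1024 bytes, and the caller is assumed to already know the image's pixel dimensions.

```python
# Hypothetical pre-upload check against the limits quoted in this answer.
# Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 50 * 1024 * 1024
MIN_SIDE_PX = 50       # lower pixel limit per side (50x50)
MAX_SIDE_PX = 16_000   # upper pixel limit per side (16,000x16,000)


def ocr_precheck(size_bytes: int, width_px: int, height_px: int) -> list[str]:
    """Return a list of problems; an empty list means the quoted limits are met."""
    problems = []
    if size_bytes >= MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; must be under {MAX_FILE_BYTES}")
    for name, side in (("width", width_px), ("height", height_px)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(
                f"{name} is {side} px; must be between {MIN_SIDE_PX} and {MAX_SIDE_PX}"
            )
    return problems


# Example: a 3 MB scan at roughly A4 size and 300 DPI (2480 x 3508 px).
print(ocr_precheck(3 * 1024 * 1024, 2480, 3508))
```

An empty list only means the file meets these numeric limits. It says nothing about scan quality (skew, shadows, noise) or about whether the file was uploaded after OCR was enabled on the library.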

    **2. Set Up OCR in SharePoint Again**

    You could try setting up the OCR feature in SharePoint again to see if that resolves the issue. In some cases, the feature might encounter an error and need to be reset. You can follow the instructions in the link below:  

    Link instruction: Set up and manage optical character recognition in SharePoint - Microsoft Syntex | Microsoft Learn 

    If the issue persists after trying the above steps, please let us know.

    I hope you can resolve the problem soon.



