SharePoint 2019 Indexing PDF files

reuvygroovy 781 Reputation points
2021-12-30T09:59:53.683+00:00

We are running SP2019 and have some PDF files which, for some reason, SharePoint doesn't index. Other PDF files are indexed fine. I can't figure out why. How can I go about troubleshooting this?

Microsoft 365 and Office SharePoint Server For business

6 answers

  1. Allen Xu_MSFT 13,861 Reputation points
    2021-12-31T05:51:38.817+00:00

    Hi @reuvygroovy ,

    Go to the Search service application > Crawl Log > URL View and search by the URLs of those PDF files. Could you verify whether they have been crawled? If not, run a full crawl and check again once it completes. Also, are those PDF files included in the content source that is being crawled?
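
Besides the crawl log, it can help to check whether a phrase you know is inside the PDF is actually returned by search for that one document. The sketch below only builds a query URL for the SharePoint Search REST endpoint (`/_api/search/query`), combining a quoted phrase with a `path:` restriction. The host name, document URL, phrase, and the function name `build_search_url` are made up for illustration; open the resulting URL in a browser while signed in to the farm.

```python
from urllib.parse import quote


def build_search_url(site_url: str, document_url: str, phrase: str) -> str:
    """Build a Search REST URL that looks for `phrase` inside one document.

    All argument values are placeholders supplied by the caller; nothing is
    sent over the network here.
    """
    # KQL: a quoted phrase plus a path: property restriction on the document URL.
    kql = f'"{phrase}" path:"{document_url}"'
    # The REST endpoint expects the query text wrapped in single quotes.
    query_text = quote(f"'{kql}'", safe="")
    base = site_url.rstrip("/")
    return f"{base}/_api/search/query?querytext={query_text}&selectproperties='Title,Path'"


if __name__ == "__main__":
    # Hypothetical server and file, for illustration only.
    print(build_search_url(
        "http://sp2019.example",
        "http://sp2019.example/docs/report.pdf",
        "quarterly revenue",
    ))
```

If the document comes back for a `path:` query on its own but not when the phrase is added, the file was crawled but its body text probably did not make it into the index.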



  2. reuvygroovy 781 Reputation points
    2022-01-05T06:01:33.797+00:00

    No. It indexes the files but I still can't search for text within the file.


  3. Enis Abazovic 1 Reputation point
    2022-03-07T12:24:07.697+00:00

    I have the exact same issue. The content (scanned) PDF is crawled, but it can't be found using search in SharePoint 2019.
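
A scanned PDF is often just page images with no text layer, in which case there is nothing for the crawler to extract unless the file has been through OCR. As a rough first check, the sketch below looks at the raw bytes of a file for image objects and font resources. It is only a heuristic, assuming an uncompressed PDF structure: newer PDFs can keep these dictionaries inside compressed object streams, so treat the result as a hint and confirm by trying to select text in a PDF viewer. The function name `looks_image_only` is made up for illustration.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude heuristic: True if the raw PDF shows image objects but no fonts.

    A page that carries real text normally references at least one font.
    This does not parse the PDF and can be wrong for files that store
    their dictionaries in compressed object streams.
    """
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    has_image = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return has_image and not has_font


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            verdict = "image-only?" if looks_image_only(handle.read()) else "has fonts or no images"
        print(f"{path}: {verdict}")
```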


  4. reuvygroovy 781 Reputation points
    2022-03-07T13:28:06.233+00:00

    To my understanding, a PDF file can be created in different formats, such as PDF/A-1 (https://en.wikipedia.org/wiki/PDF/A).

    It seems to us that SharePoint has problems with certain formats when trying to index them.


  5. Enis Abazovic 1 Reputation point
    2022-03-07T13:34:38.527+00:00

    I found an article that describes this behavior. Basically, a PDF created by a scanner or another automated solution will have some text in its PDF Title field that doesn't correspond to the document name or the file name of the PDF. As it turns out, the crawler picks up that title and overwrites the one it gets from SharePoint.
    https://www.cloudappie.nl/confusing-titles-and-pdf-files-in-sharepoint-search/
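
To see whether a given file is affected, you can compare the Title stored inside the PDF with its file name. The sketch below is a minimal, stdlib-only check and not a real PDF parser: it assumes the `/Title` entry is stored as an uncompressed literal string in parentheses, so it will miss hex-encoded or UTF-16 titles and anything inside compressed streams. The function names are made up for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def embedded_title(pdf_bytes: bytes) -> Optional[str]:
    """Return the first literal-string /Title found in the raw bytes, or None.

    Escape sequences such as \\( are returned as written, not decoded.
    """
    match = re.search(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("latin-1")


def title_differs_from_name(pdf_bytes: bytes, file_name: str) -> bool:
    """True if the PDF carries a non-empty Title that is not the file name stem."""
    title = embedded_title(pdf_bytes)
    if not title:
        return False  # No title found, so there is nothing for search to pick up.
    stem = PurePosixPath(file_name).stem
    return title.strip().lower() != stem.strip().lower()


if __name__ == "__main__":
    # Synthetic example bytes, not a real scanner's output.
    sample = b"%PDF-1.4\n1 0 obj\n<< /Title (scan_0001) >>\nendobj\n"
    print(embedded_title(sample), title_differs_from_name(sample, "Contract 2021.pdf"))
```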

