SharePoint 2019 Indexing PDF files

Question

reuvygroovy 781

We are running SP2019 and have some PDF files that, for some reason, SharePoint doesn't index. Other PDF files are indexed fine, and I can't figure out what the difference is. How can I go about troubleshooting this?

Vienneau, Patrick (SNB) 1 Reputation point

2022-10-11T17:45:09.977+00:00

Hello! I am also currently experiencing this issue in SharePoint Server 2019. Just wondering if there was a solution found?

Answer 1

Allen Xu_MSFT 13,861

Hi @reuvygroovy ,

Go to Search service application > Crawl Log > URL View and search by the URLs of those PDF files. Could you verify whether they have been crawled? If not, run a full crawl and check again once it has completed. Also, are those PDF files included in a content source that is being crawled?

reuvygroovy 781 Reputation points

2022-01-02T06:11:06.457+00:00

I see it was indexed ("Crawled") with a green success icon.

Allen Xu_MSFT 13,861 Reputation points

2022-01-05T02:01:05.95+00:00

So you can find those PDF files in the crawl log? Have you solved this issue?

Answer 2

reuvygroovy 781

No. It indexes the files but I still can't search for text within the file.

Allen Xu_MSFT 13,861 Reputation points

2022-01-06T06:15:18.927+00:00

Where did you perform the search? In a library or a site?
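
One way to take the library-versus-site question out of the picture is to query the search index directly through the SharePoint Search REST endpoint (`/_api/search/query`). The sketch below only builds the request URL; it does not send it, and authentication is left out. The site URL, the phrase, and the function name `build_search_url` are hypothetical placeholders, not values from this thread.

```python
from urllib.parse import quote


def build_search_url(site_url: str, phrase: str, library_path: str = "") -> str:
    """Build a Search REST URL that looks for an exact phrase in PDF files.

    site_url and library_path are placeholders for your own farm.
    """
    # KQL: exact phrase, restricted to PDF documents.
    kql = f'"{phrase}" filetype:pdf'
    if library_path:
        # Optionally restrict results to one library or folder.
        kql += f' path:"{library_path}"'
    # The query text is wrapped in single quotes, so embedded single
    # quotes are doubled before URL-encoding.
    escaped = kql.replace("'", "''")
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(escaped, safe='')}'"
        "&selectproperties='Title,Path'"
    )


if __name__ == "__main__":
    print(build_search_url("https://sharepoint.example.com/sites/docs", "invoice 42"))
```

Opening the resulting URL in a browser session that is already signed in to the farm should show whether the phrase is in the index at all, independent of which search box was used.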

Answer 3

Enis Abazovic 1

I have the exact same issue. The content (a scanned PDF) is crawled, but it can't be found using search in SharePoint 2019.
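
A possible explanation, not confirmed in this thread, is that a scanned PDF contains only page images and no text layer, so the crawler has nothing to extract. The sketch below is a rough heuristic for checking that. It looks for font resources (`/Font`) in the raw bytes and in any Flate-compressed streams. The function name is hypothetical, and the check can be wrong in both directions (for example with encrypted files), so treat the result as a hint only.

```python
import re
import zlib

# Matches the body of every "stream ... endstream" section in the file.
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def has_font_resources(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes reference any font resources."""
    if b"/Font" in data:
        return True
    # Newer PDFs may keep their dictionaries inside compressed object
    # streams, so try to inflate each stream and look again.
    for match in _STREAM.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image)
        if b"/Font" in inflated:
            return True
    return False


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        found = has_font_resources(handle.read())
    print("fonts found" if found else "no fonts found, possibly image-only")
```

If a file reports no fonts, running it through OCR before uploading would be the usual next thing to try.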

Answer 4

reuvygroovy 781

To my understanding, a PDF file can be created in several different formats, such as PDF/A-1 (https://en.wikipedia.org/wiki/PDF/A).

It seems to us that SharePoint has problems indexing some of these formats.
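
To test this theory, it may help to compare the format of the files that are searchable with the ones that are not. The sketch below reads the PDF version from the file header and looks for the PDF/A identification properties (`pdfaid:part` and `pdfaid:conformance`) in the XMP metadata. It assumes the metadata is stored uncompressed, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

_VERSION = re.compile(rb"%PDF-(\d\.\d)")
# XMP may store the value as an element or as an attribute.
_PART = re.compile(rb"""pdfaid:part(?:>|=["'])\s*(\d)""")
_CONFORMANCE = re.compile(rb"""pdfaid:conformance(?:>|=["'])\s*([A-Za-z])""")


def pdf_flavour(data: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (pdf_version, pdfa_level), using None for anything not found."""
    header = _VERSION.search(data[:1024])
    version = header.group(1).decode("ascii") if header else None

    part = _PART.search(data)
    pdfa = None
    if part:
        level = _CONFORMANCE.search(data)
        suffix = level.group(1).decode("ascii").lower() if level else ""
        pdfa = "PDF/A-" + part.group(1).decode("ascii") + suffix
    return version, pdfa


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(name, pdf_flavour(handle.read()))
```

Running it over a few indexed and non-indexed files would show whether the failures really line up with a particular format.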

Answer 5

Enis Abazovic 1

I found an article that describes this behavior. Basically, a PDF created by a scanner or another automated tool can end up with text in its PDF Title field that doesn't correspond to the document name or the file name. As it turns out, the crawler picks up that title and uses it in place of the title stored in SharePoint.
https://www.cloudappie.nl/confusing-titles-and-pdf-files-in-sharepoint-search/
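
To see which title a given file carries, the Title entry can be read straight from the PDF. The sketch below is a simplified stdlib-only approach, not a full PDF parser. It assumes the document information dictionary is stored uncompressed, it handles only simple literal and hex strings, and it returns the first `/Title` it finds, which in some files could be a bookmark title instead. The function name is hypothetical.

```python
import re
from typing import Optional

# /Title (literal string), allowing backslash-escaped characters.
_LITERAL = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# /Title <hex string>
_HEX = re.compile(rb"/Title\s*<([0-9A-Fa-f\s]*)>")


def _decode(raw: bytes) -> str:
    # A leading FE FF byte-order mark means UTF-16BE; otherwise fall back
    # to Latin-1 as an approximation of PDFDocEncoding.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def pdf_info_title(data: bytes) -> Optional[str]:
    """Return the first /Title string found in raw PDF bytes, or None."""
    match = _LITERAL.search(data)
    if match:
        # Undo the escapes for parentheses and backslashes only.
        return _decode(re.sub(rb"\\([()\\])", rb"\1", match.group(1)))
    match = _HEX.search(data)
    if match:
        digits = re.sub(rb"\s", b"", match.group(1))
        if len(digits) % 2:
            digits += b"0"  # an odd final digit is padded with zero
        return _decode(bytes.fromhex(digits.decode("ascii")))
    return None


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(pdf_info_title(handle.read()))
```

If the printed title is something like a scanner job name, that would match the behavior described in the article.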
