How can I search OneDrive 365 for keywords in PDFs that contain text (vs. a rasterized image of text) without coding? I can only find coding solutions, which are not built-in functions of Office 365 or Windows 11 and, IMHO, incur security and maintenance risk.

Sunny PA Wong 1 Reputation point
2024-10-17T14:46:51.6833333+00:00

I need help listing the full pathnames of PDF files that contain a given word.

Get-ChildItem -Path C:\Users\Me -Include *.pdf -File -Recurse -ErrorAction SilentlyContinue | Select-String -Pattern "Tax" | Group-Object Path | Select-Object Name | Out-File -FilePath .\FindPDFcontainingTax.txt -NoClobber

The output lists PDFs that do not contain 'Tax'.

Thanks in advance!

Microsoft 365 and Office | OneDrive | For business | Windows
Windows for business | Windows Server | User experience | PowerShell

4 answers

  1. MotoX80 36,901 Reputation points
    2024-10-19T13:12:41.5433333+00:00

    Your main problem is that PDF files do not store contents in plain text. Your script should work for .TXT files.

    For PDF files you will need to use something that can "read" the PDF contents.

    https://superuser.com/questions/1278479/search-pdf-contents-with-powershell-and-output-a-file-list
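
To illustrate the point above: many PDF writers compress page content before storing it, so the bytes on disk do not contain the readable word and Select-String has nothing reliable to match against. The sketch below is an illustration only, not a PDF parser. It compresses a made-up sentence with .NET's DeflateStream, which is roughly comparable to the Flate compression commonly used inside PDFs, and then runs the same kind of match on both forms. The sample sentence and variable names are assumptions for the demo.

```powershell
# Illustration only: compare a keyword match on plain text vs. the same text after Deflate compression.
$text  = 'Income Tax return 2023. Tax due: 0. Tax paid: 0.'   # made-up sample content
$bytes = [System.Text.Encoding]::ASCII.GetBytes($text)

# Compress the bytes into an in-memory buffer.
$buffer  = New-Object System.IO.MemoryStream
$deflate = New-Object System.IO.Compression.DeflateStream($buffer, [System.IO.Compression.CompressionMode]::Compress)
$deflate.Write($bytes, 0, $bytes.Length)
$deflate.Dispose()   # flushes the remaining compressed data into $buffer

# View the compressed bytes as text, one character per byte, as a plain-text search would see them.
$stored = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($buffer.ToArray())

$plainHit  = $text   -match 'Tax'   # the readable text contains the keyword
$storedHit = $stored -match 'Tax'   # the compressed form normally does not
"Plain text match: $plainHit; compressed match: $storedHit"
```

A real PDF adds further layers (font encodings, text split across drawing operators), which is why a tool that actually parses the PDF, or an index built by such a tool, is needed.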


  2. Sunny PA Wong 1 Reputation point
    2024-10-23T13:27:49.4733333+00:00

    $searchstring = "Tax"

    $directory = Get-ChildItem -Path C:\Users\Sunny\ -Recurse -ErrorAction SilentlyContinue

    foreach ($obj in $directory)

    {Get-Content $obj.fullname | Where-Object {$_.Contains($s

    OUTPUTS AS FOLLOWS

    Get-Content : Access to the path denied


  3. MotoX80 36,901 Reputation points
    2024-10-28T12:42:48.7366667+00:00

    "Why isn't there a non-coding solution?"

    Just use Windows search then.

    Verify that you are indexing file contents.

    User's image


    https://learn.microsoft.com/en-us/windows/win32/lwef/-search-2x-wds-aqsreference
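
Without any code, an AQS query along the lines of  ext:.pdf content:Tax  typed into the File Explorer search box should restrict results to PDFs whose indexed contents contain the word, assuming the folder is in an indexed location and file contents are being indexed for .pdf.

For the original goal of writing the full pathnames to a file, the same index can be queried from PowerShell using only components that ship with Windows. The following is a minimal sketch under those assumptions, not a tested production script. The function names New-PdfContentQuery and Find-IndexedPdf are hypothetical, and the CONTAINS clause matches any indexed full-text property, not only the document body.

```powershell
# Hypothetical helper: builds a Windows Search SQL query for indexed PDFs matching a keyword.
function New-PdfContentQuery {
    param(
        [Parameter(Mandatory)] [string] $Keyword,
        [Parameter(Mandatory)] [string] $Folder
    )
    # Escape single quotes and drop double quotes so the keyword cannot break the query text.
    $safeKeyword = $Keyword.Replace("'", "''").Replace('"', '')
    $scope = 'file:' + $Folder.Replace('\', '/')
    "SELECT System.ItemPathDisplay FROM SYSTEMINDEX " +
    "WHERE CONTAINS('`"$safeKeyword`"') " +
    "AND System.FileExtension = '.pdf' AND SCOPE = '$scope'"
}

# Hypothetical helper: runs the query against the Windows Search index (Windows only).
function Find-IndexedPdf {
    param(
        [Parameter(Mandatory)] [string] $Keyword,
        [Parameter(Mandatory)] [string] $Folder
    )
    $sql        = New-PdfContentQuery -Keyword $Keyword -Folder $Folder
    $connection = New-Object -ComObject ADODB.Connection
    $recordset  = New-Object -ComObject ADODB.Recordset
    $connection.Open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
    $recordset.Open($sql, $connection)
    # Emit one full path per matching file.
    while (-not $recordset.EOF) {
        $recordset.Fields.Item('System.ItemPathDisplay').Value
        $recordset.MoveNext()
    }
    $recordset.Close()
    $connection.Close()
}

# Example usage (folder and output file taken from the question):
# Find-IndexedPdf -Keyword 'Tax' -Folder 'C:\Users\Me' | Out-File .\FindPDFcontainingTax.txt -NoClobber
```

Because this reads the index rather than the files, it only returns PDFs that Windows Search has already indexed by content. Files outside indexed locations, or files whose contents are not stored locally, would not appear.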


  4. Sunny PA Wong 1 Reputation point
    2024-10-29T05:03:36.2966667+00:00

    User's image

    Thanks, but why don't all the PDFs get highlighted, as shown in the screenshot? The PDFs in my case contain actual text, not a rasterized image of text: each one can be searched directly when opened individually, without a text scan or OCR, so they are not image-type PDFs.

