How to find the PDF page count property in SharePoint Online using PnP PowerShell?

H Risbud 251 Reputation points
2023-06-13T00:59:12.19+00:00

Could you help me extract the following file properties from a SharePoint Online document library using PnP PowerShell? The file types could be Word, Excel, PowerPoint, or PDF. I'm not able to extract the number of pages of a PDF document. Can PDF properties be extracted using -ComObject?

  • File Name
  • File URL
  • Number of pages <--
  • Created By
  • Created (Date)

3 answers

  1. Rich Matheisen 47,901 Reputation points
    2023-06-13T02:36:38.4333333+00:00

    It's probably easier to download the Xpdf command line tools (listed under "Download the Xpdf command line tools:") from this link: https://www.xpdfreader.com/download.html

    Use the "pdfinfo.exe" file. Something like this:

    cd "C:\Program Files\Glyph & Cog"	# not necessary if you place the tools in a PATH value
    
    $file = "Full Path Name Goes Here"
    $pages = (.\pdfinfo.exe $file |
                Select-String -Pattern '(?<=Pages:\s*)\d+').Matches.Value
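
    Since pdfinfo.exe needs a local file, a document stored in SharePoint Online has to be downloaded first. The following is only a minimal sketch of how that might look, assuming an existing PnP connection and that pdfinfo.exe is reachable on the PATH. The function names Get-PageCountFromPdfInfo and Get-SPOPdfPageCount, and their parameters, are hypothetical and not part of PnP PowerShell or Xpdf.

    ```powershell
    # Hypothetical helper: pull the number out of the "Pages:" line of pdfinfo output.
    function Get-PageCountFromPdfInfo {
        param([string[]]$InfoLines)
        foreach ($line in $InfoLines) {
            if ($line -match '^Pages:\s+(\d+)') { return [int]$Matches[1] }
        }
        return $null   # no "Pages:" line found
    }

    # Hypothetical helper: download one PDF to a temp folder, count its pages, clean up.
    # Assumes Connect-PnPOnline has already been run in this session.
    function Get-SPOPdfPageCount {
        param(
            [Parameter(Mandatory)][string]$ServerRelativeUrl,
            [string]$PdfInfoPath = 'pdfinfo.exe',   # assumption: pdfinfo.exe is on the PATH
            [string]$TempFolder = $env:TEMP
        )
        $name = [System.IO.Path]::GetFileName($ServerRelativeUrl)
        Get-PnPFile -Url $ServerRelativeUrl -Path $TempFolder -FileName $name -AsFile -Force
        $local = Join-Path $TempFolder $name
        try {
            Get-PageCountFromPdfInfo -InfoLines (& $PdfInfoPath $local)
        }
        finally {
            Remove-Item -LiteralPath $local -ErrorAction SilentlyContinue
        }
    }
    ```

    Usage would be something like `Get-SPOPdfPageCount -ServerRelativeUrl $item.FieldValues["FileRef"]` for each list item whose file name ends in .pdf.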
    
    

  2. Anonymous
    2023-06-13T07:14:44.2933333+00:00

    Hi @H Risbud,

    To extract file properties from a SharePoint Online document library using PnP PowerShell, you can use the "Get-PnPListItem" cmdlet.

    Here's an example of how to extract the file attributes you mentioned:

    # Connect to SharePoint Online
    Connect-PnPOnline -Url "https://your-domain.sharepoint.com/sites/your-site"
    
    # Get all files from the document library
    $files = Get-PnPListItem -List "Documents"
    
    # Iterate through each item and retrieve the properties
    foreach ($file in $files) {
        # Skip folders; Get-PnPListItem returns them alongside files
        if ($file.FileSystemObjectType -ne "File") { continue }
    
        $fileName = $file.FieldValues["FileLeafRef"]
        $fileUrl = $file.FieldValues["FileRef"]
        # "Author" is a user field value; LookupValue holds the display name
        $createdBy = $file.FieldValues["Author"].LookupValue
        $createdDate = $file.FieldValues["Created"]
    
        # Output the file properties
        Write-Output "File Name: $fileName"
        Write-Output "File URL: $fileUrl"
        Write-Output "Created By: $createdBy"
        Write-Output "Created Date: $createdDate"
    }
    
    # Disconnect from SharePoint Online
    Disconnect-PnPOnline
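
    If a report file is more useful than console output, one option is to build one object per file and pipe the results to Export-Csv. This is only a sketch: ConvertTo-FileReportRow is a hypothetical helper name, and the page count is left for the caller to supply because SharePoint list fields do not include it here.

    ```powershell
    # Hypothetical helper: turn one item's FieldValues into a flat report row.
    function ConvertTo-FileReportRow {
        param(
            [Parameter(Mandatory)]$FieldValues,   # e.g. $item.FieldValues from Get-PnPListItem
            $PageCount = $null                    # supplied by the caller, if known
        )
        [pscustomobject]@{
            FileName  = $FieldValues['FileLeafRef']
            FileUrl   = $FieldValues['FileRef']
            Pages     = $PageCount
            CreatedBy = $FieldValues['Author'].LookupValue   # display name of the user field
            Created   = $FieldValues['Created']
        }
    }

    # Example use inside a connected session (not executed here):
    # $files | ForEach-Object { ConvertTo-FileReportRow -FieldValues $_.FieldValues } |
    #     Export-Csv -Path .\files.csv -NoTypeInformation
    ```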
    
    


    Note:

    Getting the page count of a PDF relies on functionality provided by a third party (Adobe Acrobat) rather than by SharePoint or PnP PowerShell.

    Sorry, we cannot provide test functions or services for that here; thank you for understanding.

    That said, I can offer some pointers. To process PDF files through COM objects in Windows, an appropriate PDF processing library must be installed and registered on the computer.

    A commonly used PDF processing library is Adobe Acrobat.

    1. Visit the official Adobe website, then download and install the latest version of Adobe Acrobat.

    2. During installation, make sure related components such as "Adobe PDF iFilter" and "Adobe PDF Library Files" are selected.

    3. After the installation is complete, open Command Prompt (CMD) as administrator.

    4. Navigate to the program folder of your Adobe Acrobat installation. With the default location, the path might be: "C:\Program Files (x86)\Adobe\Acrobat DC\Acrobat".

    5. Run the following command to register the COM component:

    regsvr32.exe "C:\Program Files (x86)\Adobe\Acrobat DC\Acrobat\Acrobat.tlb"
    

    Adjust the path in the command to match your actual installation path. Note that regsvr32 is designed to register DLL and OCX files rather than .tlb type libraries, and the Acrobat installer normally registers its COM automation classes itself, so this step may fail or turn out to be unnecessary.

    After completing the above steps, you should be able to use COM objects in PowerShell scripts to work with PDF files, including reading the page count.
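
    As a rough illustration of the COM approach, the sketch below uses the AcroExch.PDDoc automation object. It assumes the full Adobe Acrobat product is installed (as far as I know, the free Reader does not expose this object) and that the PDF has already been downloaded to a local path. Get-PdfPageCountViaAcrobat is a hypothetical function name; it has not been tested against any particular Acrobat version.

    ```powershell
    # Hypothetical helper: read the page count of a local PDF through Acrobat's COM automation.
    # Assumes the AcroExch.PDDoc COM class is registered on this machine.
    function Get-PdfPageCountViaAcrobat {
        param([Parameter(Mandatory)][string]$Path)

        $doc = New-Object -ComObject AcroExch.PDDoc
        try {
            # Open returns false when the file cannot be opened
            if (-not $doc.Open($Path)) { throw "Acrobat could not open '$Path'." }
            $doc.GetNumPages()
        }
        finally {
            [void]$doc.Close()
            [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($doc)
        }
    }
    ```

    A call would look like `Get-PdfPageCountViaAcrobat -Path "C:\Temp\example.pdf"`, where the path is a placeholder for a file downloaded from the library.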



    Best Regards

    Cheng Feng


  3. Paul de Jong 806 Reputation points
    2023-06-14T07:42:12.79+00:00

    I do not have experience with using COM objects.

    There are apps that offer this capability (e.g. extracting properties from SharePoint documents such as pdf, docx, xlsx, ...). For example, see here.

    These tools are typically designed for handling large volumes of documents (hundreds of thousands or more). If you only need to process a few hundred documents, these tools are overkill.

