PDF actions

10/31/2023

PDF actions enable you to extract images, text, and tables from PDF files, and arrange pages to create new documents.

To extract text from a PDF file, use the Extract text from PDF action. The following example extracts text from a specific range of pages of a password-protected file. The password is specified in the Advanced settings.

To extract text arranged in tabular form, enable Optimize for structured data to improve the format and accuracy of the results.

Screenshot of the Extract text from PDF action.

To extract tables from a PDF file, deploy the Extract tables from PDF action, select the file, and specify the pages to extract from.

The action produces the ExtractedPDFTables variable that contains a list of PDF table info. To find information about this type of list, go to Advanced data types.

Note

The Extract tables from PDF action doesn't use Optical Character Recognition (OCR), so you can't extract non-copyable text from scanned PDFs.
The library behind the action occasionally extracts PDF content that isn't a table. This behavior minimizes the risk of accidentally omitting a real table.

Screenshot of the Extract tables from PDF action.

Apart from extracting information from PDF files, you can create a new PDF document from an existing file using the Extract PDF file pages to new PDF file action.

The following example selects a combination of specific pages and a range of pages.

Screenshot of the Extract PDF file pages to new PDF file action.

Extract text from PDF

You can extract text from a PDF file by using the Extract text from PDF action. In the action properties, define the source PDF file and the pages to extract text from. In the advanced properties, you can provide a password if the PDF file is protected and specify whether the engine should optimize for structured data.

Input parameters

Argument	Optional	Accepts	Default Value	Description
PDF file	No	File		The PDF file to extract text from. Enter a file path, a variable containing a file or a text path
Page(s) to extract	N/A	All, Single, Range	All	Specifies which pages to extract text from: all pages, a single page, or a range of pages
Single page number	No	Numeric value		The number of the single page to extract text from
From page number	No	Numeric value		The first page number from the range of pages to extract text from
To page number	No	Numeric value		The last page number from the range of pages to extract text from
Password	Yes	Direct encrypted input or Text value		The password of the PDF file. Leave this blank if the PDF isn't password protected
Optimize for structured data	N/A	Boolean value	False	Specify whether to detect formatted layout in the document and extract text accordingly

Variables produced

Argument	Type	Description
ExtractedPDFText	Text value	The extracted text

Exceptions

Exception	Description
PDF file doesn't exist	File doesn't exist on the given path
Invalid password	The given password is invalid
Failed to extract text	Error while trying to extract text

Extract tables from PDF

You can extract tables from a PDF file by using the Extract tables from PDF action. In the action properties, define the PDF file and the range of pages to extract tables from. In the advanced properties, you can provide a password if the PDF file is protected, specify whether the tables have headers, and choose whether tables that cross page margins should be merged.

Input parameters

Argument	Optional	Accepts	Default Value	Description
PDF file	No	File		The PDF file to extract tables from. Enter a file path, a variable containing a file or a text path
Page(s) to extract	N/A	All, Single, Range	All	Specifies which pages to extract tables from: all pages, a single page, or a range of pages
Single page number	No	Numeric value		The number of the single page to extract tables from
From page number	No	Numeric value		The first page number from the range of pages to extract tables from
To page number	No	Numeric value		The last page number from the range of pages to extract tables from
Password	Yes	Direct encrypted input or Text value		The password of the PDF file. Leave this blank if the PDF isn't password protected
Merge tables that cross page margins	N/A	Boolean value	True	Specifies whether to merge tables that cross page margins in the specified page range
First line contains column names	N/A	Boolean value	True	Specifies whether the first line of the table contains column names

Variables produced

Argument	Type	Description
ExtractedPDFTables	List of PDF table info	The extracted tables with their info as a list

Exceptions

Exception	Description
PDF file doesn't exist	File doesn't exist on the given path
Invalid password	The given password is invalid
Failed to extract tables	Error while trying to extract tables

Extract images from PDF

To extract images from a PDF file, use the Extract images from PDF action. In the action parameters, define the PDF file, the pages to extract images from, the naming convention of the extracted images, and the folder to save the images in. In the advanced settings, you can also provide a password if the PDF file is protected.

Input parameters

Argument	Optional	Accepts	Default Value	Description
PDF file	No	File		The PDF file to extract images from. Enter a file path, a variable containing a file or a text path
Password	Yes	Direct encrypted input or Text value		The password of the PDF file. Leave this blank if the PDF isn't password protected
Page(s) to extract	N/A	All, Single, Range	All	Specifies which pages to extract images from: all pages, a single page, or a range of pages
Single page number	No	Numeric value		The number of the single page to extract images from
From page number	No	Numeric value		The first page number from the range of pages to extract images from
To page number	No	Numeric value		The last page number from the range of pages to extract images from
Image(s) name	No	Text value		The prefix of the image file names. Example names of extracted images: GivenName_1, GivenName_2
Save image(s) to	No	Folder		The folder in which to save the extracted images as PNG files
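
The naming convention described in the table can be pictured with a short Python sketch. This is only an illustration, not the action's implementation: the function name `image_paths` is hypothetical, and it assumes that numbering starts at 1, increases by one per image, and that files receive a `.png` extension, as the example names and the folder description suggest.

```python
from pathlib import Path


def image_paths(folder: Path, base_name: str, image_count: int) -> list[Path]:
    """Build the expected output paths for a given number of extracted images.

    Assumption: names follow the pattern <base_name>_<n>.png with n starting at 1.
    """
    return [folder / f"{base_name}_{n}.png" for n in range(1, image_count + 1)]


# With "Image(s) name" set to "GivenName" and two images found in the PDF:
for path in image_paths(Path("output"), "GivenName", 2):
    print(path.name)
# prints:
# GivenName_1.png
# GivenName_2.png
```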

Variables produced

This action doesn't produce any variables.

Exceptions

Exception	Description
Invalid password	The given password is invalid
Failed to extract images	Indicates that an error occurred while extracting images from the given pages of the PDF
Folder doesn't exist	Indicates that the folder doesn't exist
PDF file doesn't exist	File doesn't exist on the given path

Extract PDF file pages to new PDF file

You can create a new PDF file from the pages of an existing PDF file by using the Extract PDF file pages to new PDF file action. In the action parameters, define the PDF file to extract pages from, the pages to extract, the location of the new PDF file, and what should happen if a file with the same name and extension already exists. In the advanced properties, you can provide a password if the source PDF file is protected.

Input parameters

Argument	Optional	Accepts	Default Value	Description
PDF file	No	File		The PDF file to extract pages from. Enter a file path, a variable containing a file or a text path
Password	Yes	Direct encrypted input or Text value		The password of the PDF file. Leave this blank if the PDF isn't password protected
Page selection	No	Text value		The index numbers of the pages to keep (for example, 1,3,17-24)
Extracted PDF path	No	File		The path to store the extracted PDF file
If file exists	N/A	Overwrite, Don't overwrite, Add sequential suffix	Add sequential suffix	Specifies what to do in case the output PDF file already exists
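
The three If file exists options can be thought of as a rule for choosing the final output path. The Python sketch below illustrates that idea under stated assumptions and is not the action's actual logic: the function name `resolve_output_path`, the option strings, and the `_1`, `_2` suffix format are all hypothetical, and returning `None` for Don't overwrite is just one possible way to model leaving the existing file untouched.

```python
from itertools import count
from pathlib import Path
from typing import Optional


def resolve_output_path(target: Path, if_exists: str = "add_suffix") -> Optional[Path]:
    """Decide where the new PDF would be written.

    Returns the path to write to, or None when nothing should be written.
    """
    if not target.exists():
        return target  # No conflict, so every option behaves the same.
    if if_exists == "overwrite":
        return target  # Replace the existing file.
    if if_exists == "dont_overwrite":
        return None  # Assumed behavior: keep the existing file as it is.
    if if_exists == "add_suffix":
        # Assumed suffix format: report.pdf -> report_1.pdf, report_2.pdf, ...
        for n in count(1):
            candidate = target.with_name(f"{target.stem}_{n}{target.suffix}")
            if not candidate.exists():
                return candidate
    raise ValueError(f"Unknown option: {if_exists!r}")
```

The default mirrors the table, where Add sequential suffix is the default value, so an existing file is never replaced unless you ask for it.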

Variables produced

Argument	Type	Description
ExtractedPDF	File	The new PDF file

Exceptions

Exception	Description
Invalid password	The given password is invalid
PDF file doesn't exist	File doesn't exist on the given path
Page out of bounds	Indicates that one or more pages are out of bounds of the PDF file
Invalid page selection	Indicates that the given pages aren't valid for the PDF file
Failed to extract new PDF	Indicates that an error occurred while trying to extract new PDF
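
To make the Page selection format (for example, 1,3,17-24) and the two page-related exceptions more concrete, here is a minimal Python sketch of how such a string could be expanded into page numbers. It is an illustration only: `parse_page_selection` is a hypothetical helper, the pages are assumed to be numbered from 1, and mapping malformed input to `ValueError` and out-of-range pages to `IndexError` is a choice made for this sketch, not the action's documented behavior.

```python
def parse_page_selection(selection: str, page_count: int) -> list[int]:
    """Expand a selection such as "1,3,17-24" into a list of page numbers."""
    pages: list[int] = []
    for part in selection.split(","):
        part = part.strip()
        first, dash, last = part.partition("-")
        try:
            start = int(first)
            end = int(last) if dash else start
        except ValueError:
            # Covers empty entries, letters, and open ranges such as "3-".
            raise ValueError(f"Invalid page selection: {part!r}") from None
        if start > end:
            raise ValueError(f"Invalid page selection: {part!r}")
        if start < 1 or end > page_count:
            raise IndexError(f"Page out of bounds: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_selection("1,3,17-24", page_count=30))
# prints [1, 3, 17, 18, 19, 20, 21, 22, 23, 24]
```

Running the same selection against a 20-page document would raise the out-of-bounds error, because the range 17-24 extends past the last page.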

Merge PDF files

Merges multiple PDF files into a new one.

You can use the Merge PDF files action to take two or more PDF files and merge them into a single file. The files to be merged can be provided either in the form of a list, or enclosed in double quotes and separated by a delimiter. You can also provide passwords for the PDF files, in case they are password-protected.

Input parameters

Argument	Optional	Accepts	Default Value	Description
PDF files	No	List of Files		The files to merge. Enclose multiple files in double quotes (") and separate them by a delimiter, or use a list of files
Merged PDF path	No	File		The path to store the merged PDF
If file exists	N/A	Overwrite, Don't overwrite, Add sequential suffix	Add sequential suffix	Specifies what to do in case the destination file already exists
Passwords	Yes	Direct encrypted input or Text value		The delimited passwords. The order should be the same as the order of the input PDFs. Leave this blank if the PDFs aren't password protected
Delimiter	No	Text value	,	A custom password delimiter. This delimiter shouldn't be part of any of the passwords
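
The relationship between the Passwords and Delimiter parameters can be sketched in Python as follows. This is a hedged illustration rather than the action's implementation: `pair_files_with_passwords` is a hypothetical helper, and it assumes one password entry per input file, with an empty entry meaning that the file isn't protected.

```python
from typing import Optional


def pair_files_with_passwords(
    files: list[str], passwords: str = "", delimiter: str = ","
) -> list[tuple[str, Optional[str]]]:
    """Match each PDF with its password, in the same order as the input files."""
    if not passwords:
        # A blank Passwords field: none of the PDFs are password protected.
        return [(name, None) for name in files]
    parts = passwords.split(delimiter)
    if len(parts) != len(files):
        raise ValueError(
            f"Expected {len(files)} passwords but found {len(parts)}; "
            "check that the delimiter doesn't appear inside a password"
        )
    return [(name, part or None) for name, part in zip(files, parts)]


print(pair_files_with_passwords(["a.pdf", "b.pdf"], "s3cret;hunter2", delimiter=";"))
# prints [('a.pdf', 's3cret'), ('b.pdf', 'hunter2')]
```

The count check shows why the delimiter shouldn't be part of any password: a password such as `pa,ss` combined with the default `,` delimiter would be split into two entries and no longer line up with the files.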

Variables produced

Argument	Type	Description
MergedPDF	File	The merged PDF file

Exceptions

Exception	Description
PDF file doesn't exist	File doesn't exist on the given path
Invalid password	The given password is invalid
Failed to merge PDF files	Indicates that an error occurred while merging the files
