Item | Description |
---|---|
Release State | General Availability |
Products | Excel, Power BI (Semantic models), Power BI (Dataflows), Fabric (Dataflow Gen2), Power Apps (Dataflows), Dynamics 365 Customer Insights |
Authentication Types Supported | Anonymous (online), Basic (online), Organizational account (online), Windows (online) |
Function Reference Documentation | Pdf.Tables |
Note
Some capabilities may be present in one product but not others due to deployment schedules and host-specific capabilities.
Note
PDF isn't supported in Power BI Premium.
None.
- Import
To make the connection from Power Query Desktop:
1. Select the PDF option in the connector selection.
2. Browse for and select the PDF file you want to load. Then select Open. If the PDF file is online, use the Web connector to connect to the file.
3. In Navigator, select the file information you want, then either select Load to load the data or Transform Data to continue transforming the data in Power Query Editor.
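
The steps above correspond roughly to a query like the following in the advanced editor. This is a minimal sketch, not the exact query the connector generates: the file path, the URL, and the `Table001` Id are hypothetical placeholders, and the Ids available in your file depend on what Navigator shows.

```powerquery-m
let
    // Local file: hypothetical path, replace with the file you selected.
    LocalPdf = Pdf.Tables(File.Contents("C:\Reports\sample.pdf")),

    // Online file: hypothetical URL, read through Web.Contents instead of File.Contents.
    // Not referenced below, so it isn't evaluated; swap it in for LocalPdf if needed.
    OnlinePdf = Pdf.Tables(Web.Contents("https://example.com/sample.pdf")),

    // Equivalent of picking one item in Navigator.
    // "Table001" is an assumed Id; check the Id column of LocalPdf for real values.
    SelectedTable = LocalPdf{[Id = "Table001"]}[Data]
in
    SelectedTable
```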
To make the connection from Power Query Online:
1. Select the PDF option in the connector selection.
2. In the PDF dialog box that appears, provide either the file path or the URL of the PDF file. If you're loading a local file, you can also select Upload file (Preview) to browse to the local file or drag and drop the file.
3. If necessary, select an on-premises data gateway to access the PDF file.
4. If this is the first time you've accessed this PDF file, select the authentication kind and sign in to your account (if needed).
5. In Navigator, select the file information you want, and then select Transform Data to continue transforming the data in Power Query Editor.
You can use the following strategies to improve performance and reduce timeouts when you access large PDF files. These strategies require that you edit your usage of the Pdf.Tables function in either the formula bar or advanced editor.
- Try selecting pages one at a time or one small range at a time using the StartPage or EndPage options, iterating over the entire document as needed.
- If the PDF document is one single, huge table, the MultiPageTables option might be collecting very large intermediate values, so disabling it might help.
A full list of available options can be found in Pdf.Tables.
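
As an illustration of these strategies, the sketch below reads a document in small page ranges and turns off MultiPageTables. The file path, the page count of 100, and the chunk size of 10 are assumptions made for the example; adjust them to your document.

```powerquery-m
let
    // Hypothetical path; replace with your own file.
    PdfFile = File.Contents("C:\Reports\large-report.pdf"),

    // Assumed values: set PageCount to the real number of pages in the document.
    PageCount = 100,
    ChunkSize = 10,

    // First page of each range: 1, 11, 21, ...
    ChunkStarts = List.Numbers(1, Number.RoundUp(PageCount / ChunkSize), ChunkSize),

    // Read one small page range at a time, with MultiPageTables disabled.
    Chunks = List.Transform(
        ChunkStarts,
        (startPage) =>
            Pdf.Tables(
                PdfFile,
                [
                    StartPage = startPage,
                    EndPage = List.Min({startPage + ChunkSize - 1, PageCount}),
                    MultiPageTables = false
                ]
            )
    ),

    // Stack the per-range results into a single table.
    Combined = Table.Combine(Chunks)
in
    Combined
```

With MultiPageTables disabled, a table that spans several pages comes back as separate per-page tables, so you might need to combine those yourself afterward.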
In cases where multi-line rows aren't properly identified, you might be able to clean up the data using UI operations or custom M code. For example, you could copy misaligned data to adjacent rows using Table.FillDown, or group and combine adjacent rows using Table.Group.
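
The following sketch shows one way this cleanup could look. The sample table is made up for illustration and stands in for data extracted from a PDF where a description wrapped onto a second row; the column names Invoice and Description are hypothetical.

```powerquery-m
let
    // Made-up sample: the second row is a continuation of the first.
    Source = Table.FromRecords({
        [Invoice = "A-100", Description = "Widget, large,"],
        [Invoice = null,    Description = "blue"],
        [Invoice = "A-101", Description = "Gadget"]
    }),

    // Copy the key value down into the continuation row.
    Filled = Table.FillDown(Source, {"Invoice"}),

    // Group rows that share a key and join their text fragments with a space.
    Grouped = Table.Group(
        Filled,
        {"Invoice"},
        {{"Description", each Text.Combine([Description], " "), type text}}
    )
in
    Grouped
```

Here the two rows for A-100 collapse into one row whose Description is "Widget, large, blue". This approach assumes that continuation rows have a null key column; if they contain empty strings instead, replace those with null before filling down.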
The PDF connector doesn't connect properly on dataflows in a Premium capacity. To enable the PDF connector to work on such a dataflow, configure the dataflow to use a gateway, and confirm that the connection to the dataflow goes through the gateway.