Extracting data from a pdf file - Power Query/{Power BI

Question

Extracting data from a pdf file - Power Query/{Power BI

Joel Ngige 1

I have a pdf file relating to data i want to extract into excel using Power Query and perform some analysis. I have tried to use Power Query from native excel and Power BI and am getting the same Error captioned image attached. Kindly note the files are downloaded and saved in my desktop despite the Error caption attached stating the source does not exist.

The file path is located in the website link as text 👉 (https://www.iebc.or.ke/election/?Building_Bridges_Initiative)

Building_Bridges_Initiative_File_1
Building_Bridges_Initiative_File_2

If there is anyone with a solution as to why am getting errors can share a work around on how to extract the data from the PDF. Thanks

Ehren (MSFT) 1,781 Reputation points Microsoft Employee

2021-01-27T23:44:42.033+00:00

We'll take a look at this. In the meantime, can you try creating a blank query and entering the following?

= Pdf.Tables(File.Contents("c:\the file name.pdf"))

1 answer

Your answer

Ehren (MSFT) 1,781 Reputation points Microsoft Employee

2021-01-27T23:44:42.033+00:00

We'll take a look at this. In the meantime, can you try creating a blank query and entering the following?

= Pdf.Tables(File.Contents("c:\the file name.pdf"))

Answer 1

Wow, these PDF files are huge (around 15k pages)! The error is unrelated to PDF per se, but is likely a side effect of the PDF query taking up a lot of memory, due to the large size. Here are a couple suggestions:

Are you using the x86 version of PBIDesktop? If so, please try using the x64 version.
You can try using the following M to process the pages one at a time, instead of loading the file as a single unit. Just replace the path with the actual location of the file on your machine.

= Table.Combine(List.Skip(List.Generate(() => [Page=1, NavTable=null], each (if [NavTable] = null then true else Table.RowCount([NavTable]) > 0), each [Page=[Page]+1, NavTable=Pdf.Tables(File.Contents("C:\your_path_here\Building_Bridges_Initiative_File_1.pdf"), [StartPage=[Page], EndPage=[Page]])], each if [NavTable] is null then null else [NavTable]{[Id="Table001"]}[Data]), 1))

Share via

Extracting data from a pdf file - Power Query/{Power BI

1 answer

Your answer