Extracting data from a pdf file - Power Query/{Power BI

Joel Ngige 1 Reputation point
2021-01-24T19:57:31.677+00:00

I have a pdf file relating to data i want to extract into excel using Power Query and perform some analysis. I have tried to use Power Query from native excel and Power BI and am getting the same Error captioned image attached. Kindly note the files are downloaded and saved in my desktop despite the Error caption attached stating the source does not exist.

The file path is located in the website link as text 👉 (https://www.iebc.or.ke/election/?Building_Bridges_Initiative)

Building_Bridges_Initiative_File_1
Building_Bridges_Initiative_File_2

If there is anyone with a solution as to why am getting errors can share a work around on how to extract the data from the PDF. Thanks
59940-data-connection-error.jpg

Not Monitored
Not Monitored
Tag not monitored by Microsoft.
36,388 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Ehren (MSFT) 1,781 Reputation points Microsoft Employee
    2021-01-28T23:46:29.91+00:00

    Wow, these PDF files are huge (around 15k pages)! The error is unrelated to PDF per se, but is likely a side effect of the PDF query taking up a lot of memory, due to the large size. Here are a couple suggestions:

    • Are you using the x86 version of PBIDesktop? If so, please try using the x64 version.
    • You can try using the following M to process the pages one at a time, instead of loading the file as a single unit. Just replace the path with the actual location of the file on your machine.

    = Table.Combine(List.Skip(List.Generate(() => [Page=1, NavTable=null], each (if [NavTable] = null then true else Table.RowCount([NavTable]) > 0), each [Page=[Page]+1, NavTable=Pdf.Tables(File.Contents("C:\your_path_here\Building_Bridges_Initiative_File_1.pdf"), [StartPage=[Page], EndPage=[Page]])], each if [NavTable] is null then null else [NavTable]{[Id="Table001"]}[Data]), 1))

    0 comments No comments