Extracting data from a PDF file - Power Query/Power BI

Joel Ngige 1 Reputation point

I have a PDF file containing data that I want to extract into Excel using Power Query so that I can perform some analysis. I have tried Power Query in both native Excel and Power BI, and I get the same error in each (see the attached image). Note that the files are downloaded and saved on my desktop, even though the error message says the source does not exist.

The files can be downloaded from this website link 👉 (https://www.iebc.or.ke/election/?Building_Bridges_Initiative)


If anyone knows why I am getting these errors, please share a workaround for extracting the data from the PDF. Thanks.


1 answer

  1. Ehren (MSFT) 1,781 Reputation points Microsoft Employee

    Wow, these PDF files are huge (around 15k pages)! The error is unrelated to PDF per se, but is likely a side effect of the PDF query taking up a lot of memory because of the file size. Here are a couple of suggestions:

    • Are you using the x86 version of PBIDesktop? If so, please try using the x64 version.
    • You can try using the following M to process the pages one at a time, instead of loading the file as a single unit. Just replace the path with the actual location of the file on your machine.

    = Table.Combine(
        List.Skip(
            List.Generate(
                () => [Page = 1, NavTable = null],
                each (if [NavTable] = null then true else Table.RowCount([NavTable]) > 0),
                each [
                    Page = [Page] + 1,
                    NavTable = Pdf.Tables(
                        File.Contents("C:\your_path_here\Building_Bridges_Initiative_File_1.pdf"),
                        [StartPage = [Page], EndPage = [Page]]
                    )
                ],
                each if [NavTable] is null then null else [NavTable]{[Id = "Table001"]}[Data]
            ),
            1
        )
    )
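
    Before running the query above against all of the roughly 15k pages, it may help to try the same per-page call on a handful of pages first. The following is a minimal sketch and not part of the original answer: the page range 1 to 5 is an arbitrary assumption, the path is the same placeholder as above, and it assumes every sampled page exposes a table whose Id is "Table001" (a page without one would raise an error).

```powerquery
let
    // Assumption: same placeholder path as in the query above; replace it with the real location.
    Source = File.Contents("C:\your_path_here\Building_Bridges_Initiative_File_1.pdf"),

    // Assumption: pages 1 to 5 are an arbitrary sample range chosen for illustration.
    SamplePages = {1..5},

    // Call Pdf.Tables once per page, so only a single page is parsed per call,
    // and take the data of Table001 from each page.
    PageTables = List.Transform(
        SamplePages,
        (page) => Pdf.Tables(Source, [StartPage = page, EndPage = page]){[Id = "Table001"]}[Data]
    ),

    // Stack the per-page tables into one table.
    Combined = Table.Combine(PageTables)
in
    Combined
```

    This can be pasted into a blank query through the Advanced Editor. If the sample loads and the columns look right, the List.Generate version above can then be used for the whole file.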
