29309600 asked JackJJun-MSFT commented

Read data from searchable pdf

Hello, I have a searchable PDF and want to read certain data from it, e.g. (Name: Verona, City: Amsterdam).
How can I do this in C# without having to use an expensive library?
Thanks for the help!

dotnet-csharp

@29309600, is there any update? I think the following answers are very good. Do you still have questions about it?


1 Answer

Castorix31 answered

If you want to extract text from a PDF, you can use iText 7 (the itext7 NuGet package).

A basic sample:

             using System;
             using iText.Kernel.Pdf;
             using iText.Kernel.Pdf.Canvas.Parser;
             using iText.Kernel.Pdf.Canvas.Parser.Listener;

             // Disposing the PdfDocument also closes the underlying PdfReader.
             using (PdfDocument pdfDoc = new PdfDocument(new PdfReader("e:\\test.pdf")))
             {
                 // Page numbers in iText are 1-based.
                 for (int nPage = 1; nPage <= pdfDoc.GetNumberOfPages(); nPage++)
                 {
                     // Create a fresh strategy per page so text does not accumulate across pages.
                     ITextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
                     string sPageText = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(nPage), extractionStrategy);
                     Console.WriteLine("Page : {0}", nPage);
                     Console.WriteLine(sPageText);
                 }
             }
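
Once the page text is available as a string, specific values such as the question's "Name: Verona, City: Amsterdam" could be picked out with a regular expression. The following is only a sketch and not part of the original answer: `LabeledFieldParser` and `Parse` are hypothetical names, and it assumes each value appears as `Label: value` and ends at the next comma or line break. Real PDFs may lay text out differently, so the pattern would likely need adjusting.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of iText): pulls "Label: value" pairs out of plain text.
public static class LabeledFieldParser
{
    public static Dictionary<string, string> Parse(string pageText, params string[] labels)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string label in labels)
        {
            // Assumption: the value runs from the colon up to the next comma or line break.
            string pattern = @"\b" + Regex.Escape(label) + @"\s*:\s*([^,\r\n]+)";
            Match m = Regex.Match(pageText, pattern, RegexOptions.IgnoreCase);
            if (m.Success)
            {
                result[label] = m.Groups[1].Value.Trim();
            }
        }
        return result;
    }
}
```

Inside the page loop, this could be called as `LabeledFieldParser.Parse(sPageText, "Name", "City")`. Labels that do not occur on the page are simply absent from the returned dictionary.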

