Read data from a searchable PDF

Wojciech Kusch 1 Reputation point
2021-10-29T20:58:38.253+00:00

Hello, I have a searchable PDF. I want to read certain fields from it, e.g. (Name: Verona, City: Amsterdam).
How can I do this in C# without using an expensive library?
Thanks for the help!

C#

1 answer

  1. Castorix31 81,636 Reputation points
    2021-10-29T23:58:54.177+00:00

    If you want to extract text from a PDF, you can use iText 7 (the itext7 NuGet package).

    A basic sample:

                // Needs these using directives at the top of the file:
                //   using System;
                //   using iText.Kernel.Pdf;
                //   using iText.Kernel.Pdf.Canvas.Parser;
                //   using iText.Kernel.Pdf.Canvas.Parser.Listener;
                PdfReader pdfReader = new PdfReader("e:\\test.pdf");
                PdfDocument pdfDoc = new PdfDocument(pdfReader);
                // iText page numbers start at 1.
                for (int nPage = 1; nPage <= pdfDoc.GetNumberOfPages(); nPage++)
                {
                    // Create a new strategy for each page so that text from earlier pages is not carried over.
                    ITextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
                    string sPageText = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(nPage), extractionStrategy);
                    Console.WriteLine("Page : {0}", nPage);
                    Console.WriteLine(sPageText);
                }
                pdfDoc.Close();
                pdfReader.Close();
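Once the page text is available as a string, values such as Name and City can be picked out with a regular expression. The following is only a rough sketch: `FieldExtractor` and `ExtractFields` are hypothetical names (not part of iText), and it assumes each field sits on its own line in the form `Label: value`, which may not match how your PDF lays out its text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of iText): collects "Label: value" pairs from extracted text.
public static class FieldExtractor
{
    // Assumption: one field per line, e.g. "Name: Verona".
    // The label is letters and spaces; the value is the rest of the line.
    private static readonly Regex FieldPattern = new Regex(
        @"^[ \t]*(?<label>[A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(?<value>[^\r\n]+?)[ \t]*\r?$",
        RegexOptions.Multiline);

    public static Dictionary<string, string> ExtractFields(string pageText)
    {
        // Labels are compared case-insensitively, so "name" and "Name" are the same key.
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match match in FieldPattern.Matches(pageText))
        {
            string label = match.Groups["label"].Value;
            // Keep only the first occurrence of each label.
            if (!fields.ContainsKey(label))
            {
                fields[label] = match.Groups["value"].Value;
            }
        }
        return fields;
    }
}
```

Inside the page loop of the sample above, you could call `FieldExtractor.ExtractFields(sPageText)` and then look up the entries with `TryGetValue("Name", out var name)`. If the fields share one line or span several lines in the extracted text, the pattern would need to be adapted.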
    