Is there a good NuGet package for extracting text from PDF files in .NET Core?

chandra dev 1 Reputation point

Hi All,

I am currently using the pdfclown NuGet package to extract text from PDF files in a .NET Core project.

My requirement is to read the text from a PDF and dump it into an Excel file. pdfclown does almost everything, but it does not read the blank spaces in the PDF file.

Could you please suggest an alternative NuGet package that fulfills this requirement?


2 answers

  1. Bruce 57,891 Reputation points

    PDF is built on PostScript, a programming language that draws text and images. The language is a simple stack machine. To help with parsing PDF, tag support was added to describe the document's structure. In PostScript, % is the comment character and %% marks a structure comment.

    Sample hello world in PostScript:

    /Palatino-Roman 20 selectfont
    300 400 moveto
    (Hello, World!) show

    How well a PDF file can be parsed depends on how well the program that generated it was written, and in particular whether it followed the tag conventions the parser expects. Most likely in your sample the table is a text array and has only 2 rows of data.

    Note: PostScript supports arrays of arrays, so a text table should follow that structure. The data and the code that draws the borders are separate.
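
Because the text is drawn at positions rather than stored as a grid, an empty table cell usually has nothing in the file at all. Here is a rough sketch of how the blank cells could be put back before writing to Excel, assuming your PDF library can give you each piece of text together with its coordinates. `TextFragment`, `TableSketch`, the column start positions and the row tolerance are all hypothetical names and values made up for illustration; they are not part of pdfclown or any other package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one positioned piece of text; a real PDF library
// would supply its own type carrying similar information.
public sealed class TextFragment
{
    public TextFragment(double x, double y, string text) { X = x; Y = y; Text = text; }
    public double X { get; }
    public double Y { get; }
    public string Text { get; }
}

public static class TableSketch
{
    // Groups fragments into rows by Y (within a tolerance), then puts each
    // fragment into the last column whose left edge is at or before its X.
    // Cells that receive no fragment stay as empty strings instead of vanishing.
    public static List<string[]> ToRows(
        IEnumerable<TextFragment> fragments, double[] columnStarts, double rowTolerance = 2.0)
    {
        var rows = new List<string[]>();
        double currentY = 0;

        // Assumption: Y grows upward on the page, so the top row has the largest Y.
        foreach (var f in fragments.OrderByDescending(p => p.Y).ThenBy(p => p.X))
        {
            if (rows.Count == 0 || Math.Abs(currentY - f.Y) > rowTolerance)
            {
                rows.Add(Enumerable.Repeat("", columnStarts.Length).ToArray());
                currentY = f.Y;
            }

            var row = rows[rows.Count - 1];
            int col = Array.FindLastIndex(columnStarts, start => start <= f.X);
            if (col < 0) col = 0; // text left of the first column goes into column 0
            row[col] = row[col].Length == 0 ? f.Text : row[col] + " " + f.Text;
        }
        return rows;
    }

    // Builds CSV text that Excel can open; every cell is quoted and
    // embedded quotes are doubled.
    public static string ToCsv(IEnumerable<string[]> rows) =>
        string.Join("\n", rows.Select(r =>
            string.Join(",", r.Select(c => "\"" + c.Replace("\"", "\"\"") + "\""))));
}
```

For example, calling `TableSketch.ToRows` with fragments for "Ann" at X = 50 and "Oslo" at X = 250 on the same line, and column starts of 50, 150 and 250, gives a row whose middle cell is an empty string, so the columns stay aligned in the spreadsheet. The column positions have to come from somewhere (a header row or the drawn borders), which is the hard part in practice.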


  2. winironteam 6 Reputation points

    You can use IronPDF to extract text from a PDF.

    For example, to extract text from a PDF file page by page:

    using IronPdf;

    using PdfDocument PDF = PdfDocument.FromFile("result.pdf");
    for (var index = 0; index < PDF.PageCount; index++)
    {
        int PageNumber = index + 1; // 1-based page number, e.g. for display
        string Text = PDF.ExtractTextFromPage(index); // text of the current page
    }