Is there a good NuGet package for extracting text from PDF files in ASP.NET Core?

chandra dev 1 Reputation point
2022-02-02T13:28:40.573+00:00

Hi All,

I am currently using the pdfclown NuGet package to extract text from a PDF file in an ASP.NET Core project.
https://pdfclown.org/

My requirement is to read the text from the PDF and write it to an Excel file. pdfclown does almost everything, but it does not read the blank spaces in the PDF file.

170574-image.png

Could you please suggest an alternative NuGet package that meets this requirement?

ASP.NET Core
Blazor

2 answers

  1. Bruce (SqlWork.com) 74,531 Reputation points
    2022-02-15T18:29:25.627+00:00

    A PDF is essentially a program in a page-description language that draws text and images. The language, derived from PostScript, is a simple stack machine. To help with parsing, tag support was added so that a document can describe its own structure. In PostScript, % is the comment character and %% identifies a structure tag.

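    Sample hello world in PostScript:

    %!PS
    /Palatino-Roman 20 selectfont   % choose the font and its size
    300 400 moveto                  % move to position (300, 400)
    (Hello, World!) show            % draw the string at the current point
    showpage                        % output the finished page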
    

    How well a PDF file can be parsed depends on how well the program that produced it was written, and on whether it followed the tag conventions the parser expects. Most likely, in your sample the table is a text array that contains only 2 rows of data.
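
To illustrate why an extractor can lose blank space: the page only records where each piece of text is drawn, so the gaps between pieces have to be inferred from their coordinates. Below is a minimal sketch of that idea, written in Python for brevity (the same logic carries over to C#). The fragment tuples, the `rebuild_line` helper, and the `char_width` value are all hypothetical and are not part of pdfclown or any other library.

```python
from typing import List, Optional, Tuple

# Hypothetical fragment: (x position, width, text), all in the same unit (e.g. points).
Fragment = Tuple[float, float, str]


def rebuild_line(fragments: List[Fragment], char_width: float = 6.0) -> str:
    """Join positioned fragments into one string, padding gaps with spaces.

    char_width is an assumed average glyph width; a real extractor would
    derive it from the font metrics instead.
    """
    parts: List[str] = []
    cursor: Optional[float] = None  # right edge of the previous fragment
    for x, width, text in sorted(fragments, key=lambda f: f[0]):
        if cursor is not None:
            gap = x - cursor
            # One space per char_width of horizontal gap; overlaps give no space.
            parts.append(" " * max(0, round(gap / char_width)))
        parts.append(text)
        cursor = x + width
    return "".join(parts)


line = rebuild_line([(100.0, 30.0, "Hello"), (136.0, 30.0, "World"), (190.0, 18.0, "end")])
print(repr(line))
```

A wide gap between two fragments becomes several spaces, which is roughly what keeps table columns apart when the text is later written to a spreadsheet.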

    Note: PostScript supports arrays of arrays, so a text table should follow this structure. The data and the code that draws the borders are separate.
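
Once a table has been recovered as rows of cells (an array of arrays), getting it into Excel is the easy part. The sketch below uses Python's standard `csv` module to produce text that Excel can open. The `rows_to_csv` helper and the sample rows are hypothetical, and blank cells are kept as empty strings so that the columns stay aligned.

```python
import csv
import io
from typing import List


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize a table (a list of rows) as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    width = max((len(row) for row in rows), default=0)
    for row in rows:
        # Pad short rows so that trailing blank cells are preserved.
        writer.writerow(row + [""] * (width - len(row)))
    return buffer.getvalue()


# Hypothetical sample data: one blank cell in the middle, one missing at the end.
table = [["Name", "Qty", "Note"], ["Widget", "", "backorder"], ["Gadget", "3"]]
print(rows_to_csv(table))
```

Keeping the data as plain rows, separate from any border drawing, mirrors the structure described in the note above.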


