Word, Very slow reading of paragraphs on some pages

Central data 156 Reputation points
2021-11-13T07:22:52.2+00:00

I have this document: https://mega.nz/file/EgIXQCCS#V3SjNh9-H32MCjMqQvRIGjXfj6qxlrPmLdwdIQcAxWQ

I want to read every paragraph in the document, so I iterate over the paragraphs in a loop. When the loop reaches pages 2552-2620 and 2695-2954, it slows down dramatically, and reading all the paragraphs in the file could take days.

I have worked out how to retrieve the paragraphs of a specific page, so that I could skip that many paragraphs in the loop, but the paragraphs reported for the page do not match the paragraphs in the document's paragraph loop.

The content of those pages does not interest me because it consists of tables. If possible, I would like to skip those pages. Is there a solution?

using Microsoft.Office.Interop.Word;

Application application = new Application();
Document document = application.Documents.Open("C:\\word.doc");

foreach (Paragraph MyParagraph in document.Paragraphs)
{
    // Page number on which the current paragraph ends.
    int page = (int)MyParagraph.Range.Information[WdInformation.wdActiveEndPageNumber];

    if ((page >= 2552 && page <= 2620) || (page >= 2695 && page <= 2954))
    {
        // Table pages that I would like to skip; nothing is done here.
    }
}
application.Quit();

As soon as the foreach loop reaches paragraphs that are on those pages, it practically stops, and there are hundreds of such pages.

Microsoft 365 and Office | Word | For business | Windows

1 answer

  1. John Korchok 6,771 Reputation points Volunteer Moderator
    2021-11-13T16:32:52.627+00:00

    Pages are not the most reliable way to access Word text. They are not objects in Word's document model; Word computes them on the fly when it lays out the document for display.

    As an alternative, I would check whether MyParagraph.Range.Information has an option that reports whether the paragraph is inside a table, and skip the paragraph if it is.
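The suggestion above could be sketched roughly as follows. The WdInformation enumeration does include a wdWithInTable member, which reports whether a range lies inside a table, so the page-number test can be replaced by a table test. This is a minimal, unmeasured sketch rather than a confirmed fix: it assumes the slow pages consist only of tables, it reuses the file path from the question, and the loop still has to visit every table paragraph, so it may not remove the slowdown entirely.

```csharp
using Word = Microsoft.Office.Interop.Word;

class SkipTableParagraphs
{
    static void Main()
    {
        Word.Application application = new Word.Application();
        // Path taken from the question; opened read-only because nothing is modified.
        Word.Document document = application.Documents.Open("C:\\word.doc", ReadOnly: true);
        try
        {
            foreach (Word.Paragraph MyParagraph in document.Paragraphs)
            {
                Word.Range range = MyParagraph.Range;

                // wdWithInTable is true when the range is inside a table.
                bool inTable = (bool)range.Information[Word.WdInformation.wdWithInTable];
                if (inTable)
                {
                    continue; // Skip table content without asking for a page number.
                }

                string text = range.Text;
                // Process the paragraph text here.
            }
        }
        finally
        {
            // Quit Word even if the loop throws, so no hidden instance is left running.
            application.Quit();
        }
    }
}
```

Because the test no longer queries wdActiveEndPageNumber, Word is not asked to work out page numbers for each paragraph, which is in line with the point that pages are computed on the fly.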
