What's the fastest way to read a .docx file line by line in C# using Open XML?

Uncle Vince 41 Reputation points
2021-04-01T09:17:51.4+00:00

Hi all,

I need to read this Word (.DOC and .DOCX) file line by line using OpenXML.

[Attached image: 83529-immagine.png]

In this code I have set a regular expression to check whether a line starts with whitespace, a letter, a digit, the bullet character (•), the - character, or the ► character.

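using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

protected void Page_Load(object sender, EventArgs e)
{
    if (!this.IsPostBack)
    {
        string file = @"C:\Users\Downloads\qst.docx";

        // Open read-only (second argument false); the document is not modified.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, false))
        {
            Body body = wordDoc.MainDocumentPart.Document.Body;
            var contents = new StringBuilder();

            // Matches text that starts with whitespace, a letter, a digit, •, - or ►.
            var reg = new Regex(@"^[\s\p{L}\d•\-►]");

            foreach (Paragraph paragraph in
                        body.Descendants<Paragraph>().Where(p => reg.IsMatch(p.InnerText)))
            {
                // True only for list paragraphs (those with NumberingProperties);
                // ?. guards against a paragraph that has no ParagraphProperties.
                if (paragraph.ParagraphProperties?.NumberingProperties != null)
                {
                    contents.Append(paragraph.InnerText).Append("<br />");
                }
                else
                {
                    // Do other checking.
                }
            }

            Response.Write(contents.ToString());
        }
    }
}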

With this code the output in the browser is wrong, because the bulleted and numbered lists of the Word file are not displayed.

Section 1

  • Para 1.1
    Content 1.1
    test 2
    test 3

•Gaio Giulio Cesare
•Quinto Orazio Flacco
•Marco Porcio Catone

Section 2

  • Para 2.1
    Content 2.1
    test 4
    test 5
  • Gaio Giulio Cesare
  • Quinto Orazio Flacco
  • Marco Porcio Catone

► Marco Porcio Catone
► Quinto Orazio Flacco
► Gaio Giulio Cesare
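
A likely reason the markers are missing: Word stores automatic bullets and numbers as numbering properties on the paragraph, not as characters in the text, so InnerText never contains them. The following is a minimal sketch, not a definitive implementation, of detecting list paragraphs and writing a placeholder marker in front of them. It assumes the DocumentFormat.OpenXml package is referenced; the class name DocxListReader and the method ReadParagraphsAsHtml are hypothetical, and the fixed "&bull;" marker is an assumption standing in for the real glyph or number, which lives in the document's numbering definitions.

```csharp
using System.Net;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, not part of the Open XML SDK.
public static class DocxListReader
{
    // Returns each non-empty paragraph as one HTML line,
    // with a placeholder bullet in front of list paragraphs.
    public static string ReadParagraphsAsHtml(string path)
    {
        var html = new StringBuilder();

        // Open read-only: the second argument is isEditable.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            foreach (Paragraph p in body.Descendants<Paragraph>())
            {
                if (string.IsNullOrWhiteSpace(p.InnerText))
                {
                    continue; // skip empty paragraphs
                }

                // A paragraph is treated as a list item when it carries
                // NumberingProperties directly. Assumption: numbering that
                // is inherited from a paragraph style is not detected here.
                bool isListItem = p.ParagraphProperties?.NumberingProperties != null;

                if (isListItem)
                {
                    html.Append("&bull; "); // placeholder marker (assumption)
                }

                html.Append(WebUtility.HtmlEncode(p.InnerText)).Append("<br />");
            }
        }

        return html.ToString();
    }
}
```

In the page above this could be called as Response.Write(DocxListReader.ReadParagraphsAsHtml(file)). Showing the actual bullet glyph or the running number would additionally require reading the numbering definitions part of the document, which this sketch does not attempt.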

Microsoft 365 and Office | Word | For business | Windows
Developer technologies | C#

1 answer

  1. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.

