
What's the fastest way to read a .docx file line by line in C# using Open XML?

Hi all,
I need to read a Word file (.doc and .docx) line by line using Open XML.
In this code I have set up a regular expression to check whether a line starts with whitespace, a letter, a digit, a bullet character (• or ►), or the - character:
using System;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

protected void Page_Load(object sender, EventArgs e)
{
    if (!this.IsPostBack)
    {
        string file = @"C:\Users\Downloads\qst.docx";

        // The document is only read, so it is opened as non-editable.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, false))
        {
            Body body = wordDoc.MainDocumentPart.Document.Body;
            string contents = "";

            // First character must be whitespace, a letter, a digit, •, - or ►.
            var reg = new Regex(@"^[\s\p{L}\d\•\-\►]");

            foreach (Paragraph co in body.Descendants<Paragraph>()
                         .Where(p => reg.IsMatch(p.InnerText)))
            {
                // Note: with ||, the second test still dereferences
                // ParagraphProperties when it is null.
                if (co.ParagraphProperties != null || co.ParagraphProperties.NumberingProperties != null)
                {
                    contents += co.InnerText + "<br />";
                }
                else
                {
                    // Do other checking.
                }
            }

            Response.Write(contents);
        }
    }
}
With this code, the output in the browser is wrong: the bulleted and numbered lists from the Word file are not displayed...
Section 1
- Para 1.1
Content 1.1
test 2
test 3
•Gaio Giulio Cesare
•Quinto Orazio Flacco
•Marco Porcio Catone
Section 2
- Para 2.1
Content 2.1
test 4
test 5
- Gaio Giulio Cesare
- Quinto Orazio Flacco
- Marco Porcio Catone
► Marco Porcio Catone
► Quinto Orazio Flacco
► Gaio Giulio Cesare
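
To make the start-of-line check described above easier to reason about, here is a minimal, self-contained sketch that applies the same pattern to plain strings instead of Word paragraphs. The class name `LineFilterDemo`, the helper `PassesFilter`, and the sample strings are assumptions made for illustration only; they are not part of the original page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFilterDemo
{
    // Same pattern as in the question: the first character must be whitespace,
    // a letter, a digit, or one of the literal characters •, - or ►.
    private static readonly Regex LineStart = new Regex(@"^[\s\p{L}\d\•\-\►]");

    // Hypothetical helper: true when the text would pass the filter.
    public static bool PassesFilter(string text) => LineStart.IsMatch(text);

    public static void Main()
    {
        // Illustrative samples; the last two do not start with an accepted character.
        string[] samples =
        {
            "Section 1", "- Para 1.1", "•Gaio Giulio Cesare",
            "► Marco Porcio Catone", "", "(note)"
        };

        foreach (string s in samples)
        {
            Console.WriteLine($"\"{s}\" -> {PassesFilter(s)}");
        }
    }
}
```

The pattern needs at least one character, so an empty paragraph never passes. It is also worth keeping in mind that when Word applies automatic bullets or numbering, the marker is generally stored in the paragraph's numbering properties rather than in its text, so `InnerText` for such a paragraph may not begin with a bullet glyph at all.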