Issue with DocumentFormat.OpenXml reading docX file

Dhananjay Siwach 21 Reputation points

Hi,

I am using DocumentFormat.OpenXml to read content from a .docx file in ASP.NET C#.
I have an issue with paragraph.InnerText: it returns " TOC \o \"1-2\" \h \z \u 1.Introduction PAGEREF _Toc294041589 \h 4", but I need only the content, without the heading. How can I achieve this?

My Code

Package wordPackage = Package.Open(filePath, FileMode.Open, FileAccess.Read);
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(wordPackage))
{
    StringBuilder stringBuilder = new StringBuilder();

    IEnumerable<Paragraph> paragraphs = wordDocument.MainDocumentPart.Document.Body.Elements<Paragraph>();

    foreach (var paragraph in paragraphs)
    {
        Console.WriteLine(paragraph.InnerText);
        stringBuilder.Append(paragraph.InnerText + "\r\n");
    }
    string content = stringBuilder.ToString();
}

Yijing Sun-MSFT 7,076 Reputation points

2021-05-03T08:29:53.743+00:00

Hi @Dhananjay Siwach ,
What is your heading? Is it "Introduction PAGEREF _Toc294041589"? Does each paragraph have a heading? Could you give us more details?
Best regards,
Yijing Sun
Dhananjay Siwach 21 Reputation points

2021-05-03T10:00:51.523+00:00

Hi @Yijing Sun-MSFT ,

I have many .docx files that have content with headings. I have attached a screenshot of the heading.
I have tried many code variations, but I am not able to get only the heading text with its spaces or tabs.
Example: I am using the following code to get the heading content, but I do not get the heading content with spaces or tabs.
IEnumerable<Paragraph> paragraphs = wordDocument.MainDocumentPart.Document.Body.Elements<Paragraph>();
foreach (var paragraph in paragraphs)
{
    var paragraphText = paragraph.Descendants<DocumentFormat.OpenXml.Wordprocessing.Text>();
    foreach (var txt in paragraphText)
    {
        text += txt.Text;
        // Append a space when the element is marked xml:space="preserve"
        if (!string.IsNullOrEmpty(txt.Space) && txt.Space == SpaceProcessingModeValues.Preserve)
            text += " ";
    }
}
Yijing Sun-MSFT 7,076 Reputation points

2021-05-04T07:04:27.53+00:00
Hi @Dhananjay Siwach ,
As far as I can tell, you could wrap the text in a Run:

new Paragraph(new Run(new Text(para.InnerText)))

https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.paragraphproperties?view=openxml-2.8.1

Best regards,
Yijing Sun
Dhananjay Siwach 21 Reputation points

2021-05-04T09:29:11.597+00:00

Hi @Yijing Sun-MSFT ,

I am using your suggested code, but I am still getting the heading with the style code. Kindly look into it.

1 answer

Alberto Poblacion 1,556 Reputation points

2021-05-01T18:22:20.337+00:00

If you use InnerText, it concatenates the text of every XML element inside the paragraph, including field instructions such as TOC and PAGEREF. That's not what you want.
Instead, enumerate the Run elements of the Paragraph and, for each Run, its Text elements, and take the text from there.

To enumerate the Runs, you would use a loop similar to this:
foreach (var run in paragraph.Elements<Run>())

And a similar loop would enumerate run.Elements<Text>() to get all the texts.

For more information, explore the documentation, starting with the Run class.
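
A minimal sketch of the loop described above, under a few assumptions. The helper name GetVisibleText is hypothetical. It walks paragraph.Descendants() rather than paragraph.Elements<Run>(), because table-of-contents entries are commonly wrapped in a Hyperlink element, and runs inside it are not direct children of the paragraph. Field instructions such as TOC \o "1-2" are stored in FieldCode (w:instrText) elements rather than Text (w:t) elements, so they are simply never appended.

```csharp
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxText
{
    // Hypothetical helper: returns the visible text of a paragraph,
    // leaving out field instructions (w:instrText).
    public static string GetVisibleText(Paragraph paragraph)
    {
        var sb = new StringBuilder();

        // Descendants() also reaches runs nested inside hyperlinks.
        foreach (OpenXmlElement element in paragraph.Descendants())
        {
            switch (element)
            {
                case Text text:   // w:t, the visible text
                    sb.Append(text.Text);
                    break;
                case TabChar _:   // w:tab inside a run
                    sb.Append('\t');
                    break;
                case Break _:     // w:br
                    sb.Append('\n');
                    break;
                // FieldCode and every other element type are ignored.
            }
        }

        return sb.ToString();
    }
}
```

In the question's loop this would replace paragraph.InnerText with DocxText.GetVisibleText(paragraph). The page number that a PAGEREF field displays is stored as ordinary Text, so it still appears in the output; removing it would mean tracking the FieldChar begin/separate/end markers, which this sketch does not attempt.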
Dhananjay Siwach 21 Reputation points

2021-05-03T10:06:29.79+00:00

Hi @Alberto Poblacion ,

This code is not working for the heading content. I am not getting the .docx heading content.

Alberto Poblacion 1,556 Reputation points

2021-05-03T11:52:09.903+00:00

Use the debugger. Place a breakpoint just after the code obtains a paragraph, and then expand the properties in the debugger and dig into them until you find which is the property that contains the information that you are looking for. Then, use the value of such properties in your code.
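
As a complement to stepping through in the debugger, the structure of a paragraph can also be printed. The sketch below is only an illustration (the ParagraphInspector and Dump names are hypothetical): it lists each descendant element with its SDK type, its XML local name, and, for leaf elements, its own text, which makes it easier to see whether a given piece of text sits in a Text, a FieldCode, or inside a Hyperlink.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class ParagraphInspector
{
    // Hypothetical helper: prints one line per descendant element.
    public static void Dump(Paragraph paragraph)
    {
        foreach (OpenXmlElement element in paragraph.Descendants())
        {
            // Only leaf elements hold text of their own; containers would
            // just repeat the text of their children.
            string text = element.HasChildren ? "" : element.InnerText;
            Console.WriteLine($"{element.GetType().Name} <{element.LocalName}> \"{text}\"");
        }
    }
}
```

Printing paragraph.OuterXml is another option when the raw markup is easier to read.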

Dhananjay Siwach 21 Reputation points

2021-05-04T05:54:41.423+00:00

Hi @Alberto Poblacion ,

There is no property that holds the content of the heading.
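
For context on the point above: heading text is stored in runs like any other paragraph text, and what marks a paragraph as a heading is the style referenced in its paragraph properties. The sketch below reads that style ID. The helper names are hypothetical, and the "Heading" prefix is an assumption based on Word's default English style IDs (for example Heading1); localized or custom templates may use different IDs, and table-of-contents entries typically carry their own IDs such as TOC1.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

static class ParagraphStyles
{
    // Returns the paragraph's style ID, or null when no style is set.
    public static string GetStyleId(Paragraph paragraph)
    {
        return paragraph.ParagraphProperties?.ParagraphStyleId?.Val?.Value;
    }

    // Assumption: heading style IDs start with "Heading".
    public static bool LooksLikeHeading(Paragraph paragraph)
    {
        string styleId = GetStyleId(paragraph);
        return styleId != null
            && styleId.StartsWith("Heading", StringComparison.OrdinalIgnoreCase);
    }
}
```

Filtering the paragraph loop with LooksLikeHeading would be one way to keep or skip headings, provided the documents actually use those style IDs.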

Yijing Sun-MSFT 7,076 Reputation points

2021-05-05T08:34:54.543+00:00

Hi @Dhananjay Siwach ,
I suggest you step through the code line by line and check the values. Also, do you get any errors?
Best regards,
Yijing Sun