
Hi @UN, welcome to Microsoft Q&A.
I tested this by renaming the .docx file to .zip and then extracting it to get the word/document.xml file.
The code gets the <w:body> node by passing the //w:body path to the SelectSingleNode() method. It then uses the SelectNodes() method to select all <w:p> nodes under bodyNode, which represent paragraphs. For each paragraph node, it reads the text content from the InnerText property, splits that string on spaces, tabs and newlines, and adds the number of resulting words to the running total, wordCount.
// See https://aka.ms/new-console-template for more information
using System.Xml;

string xmlFilePath = @"C:\Users\Administrator\Desktop\wordCount\word\document.xml";
int wordCount = GetWordCountFromXml(xmlFilePath);
Console.WriteLine($"Word count: {wordCount}");

static int GetWordCountFromXml(string xmlFilePath)
{
    int wordCount = 0;
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(xmlFilePath);

    // Register the WordprocessingML namespace so the "w:" prefix works in XPath.
    XmlNamespaceManager namespaceManager = new XmlNamespaceManager(xmlDoc.NameTable);
    namespaceManager.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

    XmlNode bodyNode = xmlDoc.SelectSingleNode("//w:body", namespaceManager);
    if (bodyNode != null)
    {
        // Every <w:p> element under the body is one paragraph.
        XmlNodeList paragraphNodes = bodyNode.SelectNodes(".//w:p", namespaceManager);
        foreach (XmlNode paragraphNode in paragraphNodes)
        {
            // Split the paragraph text on whitespace and count the non-empty pieces.
            string paragraphText = paragraphNode.InnerText.Trim();
            string[] words = paragraphText.Split(new char[] { ' ', '\t', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            wordCount += words.Length;
        }
    }
    return wordCount;
}
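If you would rather not rename and unzip the file by hand, here is a minimal sketch that reads word/document.xml straight from the .docx with ZipFile.OpenRead. The class name DocxWordCounter and its method names are made up for illustration. It assumes .NET 6 or later and that the main document part sits at word/document.xml, which is the usual location but is not guaranteed for every file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

// Hypothetical helper for illustration; not part of any library.
public static class DocxWordCounter
{
    private const string WordNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // Opens the .docx as a zip archive and counts the words in word/document.xml.
    public static int CountWords(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main document part is at the conventional path.
            var entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                return 0;
            }

            using (Stream stream = entry.Open())
            {
                return CountWords(stream);
            }
        }
    }

    // Same counting approach as the code above, but reading the XML from a stream.
    public static int CountWords(Stream documentXml)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.Load(documentXml);

        var namespaceManager = new XmlNamespaceManager(xmlDoc.NameTable);
        namespaceManager.AddNamespace("w", WordNamespace);

        // All paragraphs anywhere under the body element.
        var paragraphNodes = xmlDoc.SelectNodes("//w:body//w:p", namespaceManager);
        if (paragraphNodes == null)
        {
            return 0;
        }

        int wordCount = 0;
        foreach (XmlNode paragraphNode in paragraphNodes)
        {
            wordCount += paragraphNode.InnerText
                .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                .Length;
        }
        return wordCount;
    }
}
```

You would call it as DocxWordCounter.CountWords(path) with the path to your own .docx file. As with the code above, a tab or line break inside a paragraph is stored as an empty element (<w:tab/>, <w:br/>) and does not appear in InnerText, so words separated only by one of those are counted as a single word, and the result may differ slightly from the count Word itself shows.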
Best Regards,
Jiale