Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
In this example, we write a query that retrieves the paragraph nodes from a WordprocessingML document. It also identifies the style of each paragraph.
This query builds on the query in the previous example, Find the default paragraph style, which retrieves the default style from the list of styles. This information is required so that the query can identify the style of paragraphs that don't have a style explicitly set. Paragraph styles are set through the w:pPr
element; if a paragraph doesn't contain this element, it's formatted with the default style.
This article explains the significance of some pieces of the query, then shows the query as part of a complete working example.
Example: Retrieve all the paragraphs in a document, and the paragraph styles
The following code is a query to retrieve all the paragraphs in a document and their styles:
xDoc.Root.Element(w + "body").Descendants(w + "p")
xDoc.Root.<w:body>...<w:p>
This expression is similar to the source of the query in the previous example, Find the default paragraph style. The main difference is that it uses the Descendants axis instead of the Elements axis. The query uses the Descendants axis because in documents that have sections, the paragraphs won't be the direct children of the body element; rather, the paragraphs will be two levels down in the hierarchy. By using the Descendants axis, the code will work whether or not the document uses sections.
Example: Use a let
clause to determine the element that contains the style node
The following query uses a let
clause to determine the element that contains the style node:
let styleNode = para.Elements(w + "pPr").Elements(w + "pStyle").FirstOrDefault()
Let styleNode As XElement = para.<w:pPr>.<w:pStyle>.FirstOrDefault()
The let
clause (Let
in the Visual Basic version) first uses the Elements axis to find all elements named pPr
, then uses the Elements extension method to find all child elements named pStyle
, and finally uses the FirstOrDefault standard query operator to convert the collection to a singleton. If the collection is empty, styleNode
is set to null
(Nothing
in the Visual Basic version). This is a useful idiom to look for the pStyle
descendant node. Note that if the pPr
child node doesn't exist, the code doesn't fail by throwing an exception; instead, styleNode
is set to null
(Nothing
), which is the desired behavior of this let
(Let
) clause.
If there is no element, then styleNode
is set to null
(Nothing
).
The query projects a collection of an anonymous type with two members, StyleName
and ParagraphNode
.
Example: Retrieve the paragraph nodes from a WordprocessingML document
This example processes a WordprocessingML document, retrieving the paragraph nodes. It also identifies the style of each paragraph. This example builds on the previous examples in this tutorial. The new query is called out in comments in the code below.
The instructions for creating the source document for this example are in Create the source Office Open XML document.
This example uses classes found in the WindowsBase assembly. It uses types in the System.IO.Packaging namespace.
const string fileName = "SampleDoc.docx";
const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
const string stylesRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = wordmlNamespace;
XDocument xDoc = null;
XDocument styleDoc = null;
using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
{
PackageRelationship docPackageRelationship = wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault();
if (docPackageRelationship != null)
{
Uri documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), docPackageRelationship.TargetUri);
PackagePart documentPart = wdPackage.GetPart(documentUri);
// Load the document XML in the part into an XDocument instance.
xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()));
// Find the styles part. There will only be one.
PackageRelationship styleRelation = documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault();
if (styleRelation != null)
{
Uri styleUri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri);
PackagePart stylePart = wdPackage.GetPart(styleUri);
// Load the style XML in the part into an XDocument instance.
styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()));
}
}
}
string defaultStyle =
(string)(
from style in styleDoc.Root.Elements(w + "style")
where (string)style.Attribute(w + "type") == "paragraph"&&
(string)style.Attribute(w + "default") == "1"
select style
).First().Attribute(w + "styleId");
// Following is the new query that finds all paragraphs in the
// document and their styles.
var paragraphs =
from para in xDoc
.Root
.Element(w + "body")
.Descendants(w + "p")
let styleNode = para
.Elements(w + "pPr")
.Elements(w + "pStyle")
.FirstOrDefault()
select new
{
ParagraphNode = para,
StyleName = styleNode != null ?
(string)styleNode.Attribute(w + "val") :
defaultStyle
};
foreach (var p in paragraphs)
Console.WriteLine("StyleName:{0}", p.StyleName);
Imports <xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
Module Module1
Private Function GetStyleOfParagraph(ByVal styleNode As XElement, ByVal defaultStyle As String) As String
If (styleNode Is Nothing) Then
Return defaultStyle
Else
Return styleNode.@w:val
End If
End Function
Sub Main()
Dim fileName = "SampleDoc.docx"
Dim documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
Dim stylesRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Dim wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
Dim xDoc As XDocument = Nothing
Dim styleDoc As XDocument = Nothing
Using wdPackage As Package = Package.Open(fileName, FileMode.Open, FileAccess.Read)
Dim docPackageRelationship As PackageRelationship = wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault()
If (docPackageRelationship IsNot Nothing) Then
Dim documentUri As Uri = PackUriHelper.ResolvePartUri(New Uri("/", UriKind.Relative), docPackageRelationship.TargetUri)
Dim documentPart As PackagePart = wdPackage.GetPart(documentUri)
' Load the document XML in the part into an XDocument instance.
xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()))
' Find the styles part. There will only be one.
Dim styleRelation As PackageRelationship = documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault()
If (styleRelation IsNot Nothing) Then
Dim styleUri As Uri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri)
Dim stylePart As PackagePart = wdPackage.GetPart(styleUri)
' Load the style XML in the part into an XDocument instance.
styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()))
End If
End If
End Using
Dim defaultStyle As String = _
( _
From style In styleDoc.Root.<w:style> _
Where style.@w:type = "paragraph" And _
style.@w:default = "1" _
Select style _
).First().@w:styleId
' Following is the new query that finds all paragraphs in the
' document and their styles.
Dim paragraphs = _
From para In xDoc.Root.<w:body>...<w:p> _
Let styleNode As XElement = para.<w:pPr>.<w:pStyle>.FirstOrDefault() _
Select New With { _
.ParagraphNode = para, _
.StyleName = GetStyleOfParagraph(styleNode, defaultStyle) _
}
For Each p In paragraphs
Console.WriteLine("StyleName:{0}", p.StyleName)
Next
End Sub
End Module
This example produces the following output:
StyleName:Heading1
StyleName:Normal
StyleName:Normal
StyleName:Normal
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Code
StyleName:Normal
StyleName:Normal
StyleName:Normal
StyleName:Code
The next article in this tutorial shows how to create a query to retrieve the text of paragraphs: