Retrieving the Default Style Name from the Styles Part - VB
There is a problem in the example presented in the previous topic: it sets the Style property of the anonymous type to Nothing (null) if the paragraph has no style. This is incorrect; we should instead use another query to find the default paragraph style in the styles part.
Here is the query to retrieve the default style:
Dim defaultStyle As String = _
    CStr(styleDoc.Root _
        .Elements(w + "style") _
        .Where(Function(style) _
            CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
            CStr(style.Attribute(w + "default")) = "1") _
        .First() _
        .Attribute(w + "styleId"))
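To see what this query matches, here is a minimal, self-contained sketch that runs it against a hand-written styles fragment. The fragment and the module name are illustrative assumptions; a real styles part contains many more elements and attributes.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module DefaultStyleSketch
    Sub Main()
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

        ' Hypothetical, heavily trimmed styles part (for illustration only).
        Dim styleDoc As XDocument = XDocument.Parse( _
            "<w:styles xmlns:w='" & w.NamespaceName & "'>" & _
            "<w:style w:type='paragraph' w:default='1' w:styleId='Normal'/>" & _
            "<w:style w:type='paragraph' w:styleId='Heading1'/>" & _
            "<w:style w:type='character' w:default='1' " & _
            "w:styleId='DefaultParagraphFont'/>" & _
            "</w:styles>")

        ' Same query as above: the paragraph style marked as the default.
        Dim defaultStyle As String = _
            CStr(styleDoc.Root _
                .Elements(w + "style") _
                .Where(Function(style) _
                    CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                    CStr(style.Attribute(w + "default")) = "1") _
                .First() _
                .Attribute(w + "styleId"))

        Console.WriteLine(defaultStyle)   ' Normal
    End Sub
End Module
```

The character style is also marked as a default, which is why the query has to test the type attribute as well as the default attribute.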
We can then pass this variable to GetParagraphStyle, so that if there is no style specified for the paragraph, the function returns the default style.
Public Function GetParagraphStyle(ByVal para As XElement, _
                                  ByVal defaultStyle As String) As String
    Dim w As XNamespace = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    Dim paraStyle = CStr(para.Elements(w + "pPr") _
                             .Elements(w + "pStyle") _
                             .Attributes(w + "val") _
                             .FirstOrDefault())
    If (paraStyle Is Nothing) Then
        Return defaultStyle
    Else
        Return paraStyle
    End If
End Function
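As a quick check of both branches, the following sketch calls the function on two hand-written paragraphs, one with a pStyle element and one without. The sample XML, the module name, and the "Normal" default passed in are assumptions made for illustration.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module GetParagraphStyleSketch
    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim paraStyle = CStr(para.Elements(w + "pPr") _
                                 .Elements(w + "pStyle") _
                                 .Attributes(w + "val") _
                                 .FirstOrDefault())
        If paraStyle Is Nothing Then
            Return defaultStyle
        End If
        Return paraStyle
    End Function

    Sub Main()
        Dim ns As String = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

        ' Hypothetical paragraphs: one styled, one with no pPr at all.
        Dim styled As XElement = XElement.Parse( _
            "<w:p xmlns:w='" & ns & "'>" & _
            "<w:pPr><w:pStyle w:val='Code'/></w:pPr></w:p>")
        Dim unstyled As XElement = XElement.Parse( _
            "<w:p xmlns:w='" & ns & "'/>")

        Console.WriteLine(GetParagraphStyle(styled, "Normal"))    ' Code
        Console.WriteLine(GetParagraphStyle(unstyled, "Normal"))  ' Normal
    End Sub
End Module
```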
We can now modify the query to pass the defaultStyle to GetParagraphStyle:
Dim paragraphs = _
    mainPartDoc.Root _
        .Element(w + "body") _
        .Descendants(w + "p") _
        .Select(Function(p) _
            New With { _
                .ParagraphNode = p, _
                .Style = GetParagraphStyle(p, defaultStyle) _
            } _
        )
We can write part of the query that retrieves the default style using a query expression. However, there is no way to express the First call in a query expression, so we must surround the query expression with parentheses, and then dot into the First method:
Dim defaultStyle As String = _
    CStr( _
        ( _
            From style In styleDoc.Root _
                .Elements(w + "style") _
            Where( _
                CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                CStr(style.Attribute(w + "default")) = "1") _
        ) _
        .First() _
        .Attribute(w + "styleId") _
    )
My personal preference is to use method syntax in this situation.
One more point about this assignment: because we used the First extension method, the source is iterated and the defaultStyle variable is assigned immediately. The query that finds the paragraphs, by contrast, does nothing until we iterate through it with a For Each statement.
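The difference between deferred and immediate execution can be observed with a small counter. This is only a sketch: the Tally helper, the module name, and the sample array are hypothetical and have nothing to do with Open XML.

```vb
Imports System
Imports System.Linq

Module DeferredExecutionSketch
    Private calls As Integer = 0

    ' Hypothetical helper that counts how often the projection runs.
    Private Function Tally(ByVal n As Integer) As Integer
        calls += 1
        Return n * 10
    End Function

    Sub Main()
        Dim numbers() As Integer = {1, 2, 3}

        ' Defining the query does not run the projection.
        Dim projected = numbers.Select(Function(n) Tally(n))
        Console.WriteLine(calls)   ' 0

        ' First executes immediately, pulling only the first element.
        Dim firstValue As Integer = projected.First()
        Console.WriteLine(calls)   ' 1

        ' Each For Each runs the query again from the start.
        For Each item In projected
            Console.WriteLine(item)
        Next
        Console.WriteLine(calls)   ' 4
    End Sub
End Module
```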
Now, when we run the program, we see:
Heading1     /document/body/p
Normal       /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Code         /document/body/p
Normal       /document/body/p
Code         /document/body/p
This is what we wanted from this transformation.
The entire listing follows. Note that we had to read the styles part into an XDocument.
Imports System.IO
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq
Imports DocumentFormat.OpenXml.Packaging

Module Module1
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GetPath(ByVal el As XElement) As String
        Return el _
            .AncestorsAndSelf _
            .InDocumentOrder _
            .Aggregate("", Function(seed, i) seed & "/" & i.Name.LocalName)
    End Function

    Public Function LoadXDocument(ByVal part As OpenXmlPart) _
        As XDocument
        Using streamReader As StreamReader = New StreamReader(part.GetStream())
            Using reader As XmlReader = XmlReader.Create(streamReader)
                Return XDocument.Load(reader)
            End Using
        End Using
    End Function

    Public Function GetParagraphStyle(ByVal para As XElement, _
                                      ByVal defaultStyle As String) As String
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim paraStyle = CStr(para.Elements(w + "pPr") _
                                 .Elements(w + "pStyle") _
                                 .Attributes(w + "val") _
                                 .FirstOrDefault())
        If (paraStyle Is Nothing) Then
            Return defaultStyle
        Else
            Return paraStyle
        End If
    End Function

    Sub Main()
        Dim w As XNamespace = _
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        Dim filename As String = "SampleDoc.docx"
        Using wordDoc As WordprocessingDocument = _
            WordprocessingDocument.Open(filename, True)
            Dim mainPart As MainDocumentPart = _
                wordDoc.MainDocumentPart
            Dim styleDefinitionPart As StyleDefinitionsPart = _
                mainPart.StyleDefinitionsPart
            Dim commentsPart As WordprocessingCommentsPart = _
                mainPart.WordprocessingCommentsPart
            Dim mainPartDoc As XDocument = LoadXDocument(mainPart)
            Dim styleDoc As XDocument = LoadXDocument(styleDefinitionPart)
            Dim commentsDoc As XDocument = LoadXDocument(commentsPart)
            Dim defaultStyle As String = _
                CStr( _
                    ( _
                        From style In styleDoc.Root _
                            .Elements(w + "style") _
                        Where( _
                            CStr(style.Attribute(w + "type")) = "paragraph" AndAlso _
                            CStr(style.Attribute(w + "default")) = "1") _
                    ) _
                    .First() _
                    .Attribute(w + "styleId") _
                )
            Dim paragraphs = _
                mainPartDoc.Root _
                    .Element(w + "body") _
                    .Descendants(w + "p") _
                    .Select(Function(p) _
                        New With { _
                            .ParagraphNode = p, _
                            .Style = GetParagraphStyle(p, defaultStyle) _
                        } _
                    )
            For Each p In paragraphs
                Console.WriteLine("{0} {1}", p.Style.PadRight(12), _
                    p.ParagraphNode.GetPath())
            Next
        End Using
    End Sub
End Module
Comments
- Anonymous
June 21, 2009
Hi, I need to collect word document para lines with style names. How can i do that? Regards Selva