The new LINQ to XML “Bridge Classes” to System.Xml

2007-03-04

[updated to try out a Live Writer plugin for code formatting ... let's see if that makes the code samples more readable!]

In a previous post we presented an overview of the XML Features in the "Orcas" Community Technology Preview. This post gives some more details and examples for what we call the "bridge classes" that let one use other System.Xml APIs over a LINQ to XML tree. For example, LINQ to XML users can now create a tree in memory with an XmlWriter application or as the output of an XSLT transformation, validate a loaded tree against an XSD schema, use XPath 1.0 to query and XSLT 1.0 to transform the tree, and so forth.

Why do this? Can't a developer already do anything with a LINQ query that can be done with XPath? Doesn't the combination of LINQ queries and functional construction do more or less what XSLT does? In principle, sure. In practice, there are some good reasons to build a bridge to the rest of System.Xml:

We know that people aren't going to rewrite all their applications that use System.Xml just because LINQ to XML is here, yet we do want to make it as easy as possible to migrate incrementally and reuse proven codebases. For example, some of our customers have invested heavily in XSLT as a means of publishing data, but may want to rebuild the front end data integration layer to exploit the capabilities of LINQ.
Some of these System.Xml libraries, especially the XSD validator, are tricky bits of code that we don't want to rewrite and maintain in parallel. We want to re-use our proven codebases!
Sometimes XPath and/or XSLT are the best tools for the job at hand, irrespective of whether the job could be done with LINQ to XML. XSLT is a great tool for XML data whose structure is only loosely defined or arbitrarily recursive. Likewise, many will prefer LINQ to XPath for queries embedded in code, but XPath's string representation makes it easier to generate queries dynamically, or to evaluate queries generated by other applications.
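To make the last point concrete, here is a minimal sketch that runs the same filter twice: once as a LINQ query fixed at compile time, and once as an XPath string assembled at run time. The BuildFilter helper and the trimmed-down data are illustrative assumptions, not part of the API, and the quoting is deliberately simplistic.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class DynamicQueryDemo
{
    // Hypothetical helper: builds an XPath string from a field name and a
    // value supplied at run time. A value containing an apostrophe would
    // need escaping, which this sketch does not attempt.
    public static string BuildFilter(string field, string value)
    {
        return "/Autos/Auto[" + field + "='" + value + "']";
    }

    public static string[] MakesViaXPath(XDocument doc, string field, string value)
    {
        return doc.XPathSelectElements(BuildFilter(field, value))
                  .Select(auto => (string)auto.Element("Make"))
                  .ToArray();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Autos>" +
            "<Auto><Make>Ford</Make><Color>white</Color></Auto>" +
            "<Auto><Make>Toyota</Make><Color>gold</Color></Auto>" +
            "</Autos>");

        // Query shape fixed at compile time, written as a LINQ expression.
        var viaLinq = from auto in doc.Root.Elements("Auto")
                      where (string)auto.Element("Color") == "white"
                      select (string)auto.Element("Make");

        Console.WriteLine(string.Join(",", viaLinq.ToArray()));
        Console.WriteLine(string.Join(",", MakesViaXPath(doc, "Color", "white")));
    }
}
```

Both queries select the same Auto element; only the XPath version can have its field name and value chosen by a user or another application at run time.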

Using the Bridge Classes

The examples below refer to a sample XML instance that describes the make, model, color, and 4-wheel drive capability of automobiles.

string autoData = @"
<Autos>
<Auto>
<Make>Ford</Make>
<Model>Ranger</Model>
<Color>white</Color>
<FourXFour>false</FourXFour>
</Auto>
<Auto>
<Make>Toyota</Make>
<Model>4-Runner</Model>
<Color>gold</Color>
<FourXFour>true</FourXFour>
</Auto>
<Auto>
<Make>Chevy</Make>
<Model>Cavalier</Model>
<Color>white</Color>
<FourXFour>false</FourXFour>
</Auto>
</Autos>";

These bridge classes exist outside of the core System.Xml.Linq namespace; extension methods supporting this additional functionality are brought into scope by referencing the appropriate namespace.

XPath

XPath is enabled by referencing the System.Xml.XPath namespace:

using System.Xml.XPath;

This brings into scope CreateNavigator overloads to create XPathNavigator objects, XPathEvaluate overloads to evaluate an XPath expression, and XPathSelectElement[s] overloads that work much like the SelectSingleNode and SelectNodes methods in the System.Xml DOM API.

For example, to display cars that have four wheel drive:

XDocument doc1 = XDocument.Parse(autoData);
foreach (var model in doc1
    .XPathSelectElements("//Auto[FourXFour='true']"))
    Console.WriteLine(model);
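The other two entry points, XPathEvaluate and CreateNavigator, can be sketched in the same way. The class and method names below are illustrative, and the data is a shortened copy of the sample above.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathBridgeDemo
{
    static readonly XDocument Doc = XDocument.Parse(
        "<Autos>" +
        "<Auto><Make>Ford</Make><Model>Ranger</Model><Color>white</Color></Auto>" +
        "<Auto><Make>Toyota</Make><Model>4-Runner</Model><Color>gold</Color></Auto>" +
        "<Auto><Make>Chevy</Make><Model>Cavalier</Model><Color>white</Color></Auto>" +
        "</Autos>");

    // XPathEvaluate returns object; a numeric expression such as count()
    // comes back boxed as a double.
    public static double CountWhiteAutos()
    {
        return (double)Doc.XPathEvaluate("count(/Autos/Auto[Color='white'])");
    }

    // CreateNavigator exposes the LINQ to XML tree through the familiar
    // XPathNavigator cursor API.
    public static string SecondModel()
    {
        XPathNavigator nav = Doc.CreateNavigator();
        XPathNavigator model = nav.SelectSingleNode("/Autos/Auto[2]/Model");
        return model.Value;
    }

    public static void Main()
    {
        Console.WriteLine(CountWhiteAutos());
        Console.WriteLine(SecondModel());
    }
}
```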

To use namespace-qualified XPath expressions, it is necessary to pass in an IXmlNamespaceResolver object (for example, an XmlNamespaceManager), just as with DOM.
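As a rough sketch of what that looks like, the snippet below queries a document whose elements are in a default namespace. The namespace URI urn:example:autos and the prefix "a" are made up for illustration; the prefix exists only in the resolver, not in the document.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

static class NamespaceXPathDemo
{
    public static string FirstMake()
    {
        // "urn:example:autos" is an invented namespace for this sketch.
        XDocument doc = XDocument.Parse(
            "<Autos xmlns='urn:example:autos'>" +
            "<Auto><Make>Ford</Make></Auto>" +
            "</Autos>");

        // XmlNamespaceManager implements IXmlNamespaceResolver. It maps the
        // prefix used in the XPath expression to the namespace name.
        XmlNamespaceManager ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("a", "urn:example:autos");

        XElement make = doc.XPathSelectElement("/a:Autos/a:Auto/a:Make", ns);
        return (string)make;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMake());
    }
}
```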

XSLT

XSLT is enabled by referencing the System.Xml.Xsl namespace:

using System.Xml.Xsl;

Let's consider an example where we want to transform the automobile data (which has been loaded into an XDocument object named doc1) with the following stylesheet:

 string xslMarkup = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
    <xsl:template match='//Autos'>
        <html>
            <body>
            <h1>Autos in Stock</h1>
            <table>
                <tr>
                    <th>Make</th>
                    <th>Model</th>
                    <th>Color</th>
                </tr>
                <xsl:apply-templates/>
            </table>
            </body>
        </html>
    </xsl:template>
    <xsl:template match='Auto'>
        <tr>
            <td><xsl:value-of select='Make'/></td>
            <td><xsl:value-of select='Model'/></td>
            <td><xsl:value-of select='Color'/></td>
        </tr>
    </xsl:template>
</xsl:stylesheet>";

You can perform the transformation by a) creating an XDocument object to hold the result; b) loading the stylesheet into an XslCompiledTransform object; c) creating a "bridge class" XmlReader over the input XDocument object and using that as input to the transformation.

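// Requires: using System.IO; using System.Xml; using System.Xml.Linq; using System.Xml.Xsl;
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    XslCompiledTransform xslTransformer = new XslCompiledTransform();
    xslTransformer.Load(XmlReader.Create(new StringReader(xslMarkup)));
    // The bridge reader feeds the XDocument to the XSLT engine;
    // the bridge writer builds the result tree in newTree.
    xslTransformer.Transform(doc1.CreateReader(), writer);
}
Console.WriteLine(newTree);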

A few things to note:

This example shows a LINQ to XML XDocument tree being transformed into another XDocument object, but the same pattern above would work if the input was an XmlDocument tree or an XPathDocument object. That's the whole point of the bridge classes – to mix-n-match LINQ to XML objects (which we hope are easier to use than XmlDocument objects, and can easily integrate with other LINQ providers) with existing code.
Note that in this example we did not load the XSLT stylesheet into an XDocument tree. That would be possible, e.g. you might want to programmatically generate the stylesheet using LINQ to XML. However, there is a bug in System.Xml's XSD and XSLT classes that makes this somewhat problematic in this release. (It will be fixed by the time Orcas releases, but it's not clear whether it will be fixed for the forthcoming Beta1 or not.) Here's the situation: the System.Xml reader exposes an optimization that returns more information about a piece of text than the XML Infoset specifies, i.e. whether it is ordinary text, a chunk that contains only whitespace, or a piece of "significant whitespace" that an xml:space attribute says to preserve. LINQ to XML was designed to expose the Infoset as its public object model, however, so the XmlReader's distinctions among text, whitespace, and significant whitespace are thrown away when building a LINQ to XML tree. The current XSLT and XSD implementations are tightly coupled to the XmlReader's model, and there are some cases where they throw exceptions when processing stylesheets or schemas expressed as LINQ to XML trees.
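To illustrate the first note, here is a sketch of the mixed case: the input is an existing XmlDocument, while the result is still built as an XDocument through CreateWriter(). The small stylesheet and the ListMakes helper are invented for this example.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

static class MixedTransformDemo
{
    // Illustrative stylesheet: copies each Make into a flat list.
    const string Xsl = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='/Autos'>
    <Makes>
      <xsl:for-each select='Auto'>
        <Make><xsl:value-of select='Make'/></Make>
      </xsl:for-each>
    </Makes>
  </xsl:template>
</xsl:stylesheet>";

    public static XDocument ListMakes(string inputXml)
    {
        // Existing code may already hold the data in a DOM tree.
        XmlDocument input = new XmlDocument();
        input.LoadXml(inputXml);

        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(Xsl)))
            xslt.Load(stylesheet);

        // Only the output side uses a bridge class here.
        XDocument result = new XDocument();
        using (XmlWriter writer = result.CreateWriter())
            xslt.Transform(input, writer);
        return result;
    }

    public static void Main()
    {
        XDocument makes = ListMakes(
            "<Autos><Auto><Make>Ford</Make></Auto><Auto><Make>Toyota</Make></Auto></Autos>");
        foreach (XElement make in makes.Root.Elements("Make"))
            Console.WriteLine((string)make);
    }
}
```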

XSD Validation

You can validate an XElement tree against an XML schema via extension methods in the System.Xml.Schema namespace. This is exactly the same functionality that was shipped in .NET 2.0, with only a "bridge" to expose the classes in that namespace to LINQ to XML.

To bring this functionality into scope, use:

using System.Xml.Schema;

You can now use the .NET 2.0 classes and methods to populate XmlSchemaObject / XmlSchemaSet objects. There will be Validate() methods available to check XElement, XAttribute, or XDocument objects against the schema and optionally populate a post-schema-validation infoset as annotations on the LINQ to XML tree.

For example, here is a schema for the sample Autos data (generated by the XML Editor in Visual Studio):

 string xsdMarkup =
@"<xs:schema attributeFormDefault='unqualified'
elementFormDefault='qualified' xmlns:xs='http://www.w3.org/2001/XMLSchema'>
<xs:element name='Autos'>
    <xs:complexType>
        <xs:sequence>
            <xs:element maxOccurs='unbounded' name='Auto'>
                 <xs:complexType>
                    <xs:sequence>
                        <xs:element name='Make' type='xs:string' />
                        <xs:element name='Model' type='xs:string' />
                        <xs:element name='Color' type='xs:string' />
                        <xs:element name='FourXFour' type='xs:boolean' />
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
</xs:schema>";

Load that schema into an XmlSchemaSet, then pass it to the Validate() extension method:

 XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add("", XmlReader.Create(new StringReader(xsdMarkup)));
bool errors = false;
doc1.Validate(schemas, (sender, e) =>
    {
    Console.WriteLine(e.Message);
    errors = true;
    }, true);
Console.WriteLine("doc1 {0}", errors ? "did not validate" : "validated");
DumpInvalidNodes(doc1.Root);

The code for the DumpInvalidNodes() utility method is:

 static void DumpInvalidNodes(XElement el)
{
    if (el.GetSchemaInfo().Validity != XmlSchemaValidity.Valid)
        Console.WriteLine("Invalid Element {0}",
            el.AncestorsAndSelf()
            .Aggregate("", (s, i) => s + i.Name.ToString() + "/"));
    foreach (XAttribute att in el.Attributes())
        if (att.GetSchemaInfo().Validity != XmlSchemaValidity.Valid)
            Console.WriteLine("Invalid Attribute {0}",
                att.Parent.AncestorsAndSelf()
                .Aggregate("",
                    (s, i) => s + i.Name.ToString() + "/") + att.Name.ToString()
                );
    foreach (XElement child in el.Elements())
        DumpInvalidNodes(child);
}
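The sample data validates cleanly, so neither the handler nor the invalid-node check ever fires. The sketch below uses a deliberately broken document to show both. The cut-down schema and the value "maybe" are made up for this example, and the exact error message text depends on the framework version.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class InvalidDocumentDemo
{
    // Cut-down, illustrative schema: one Auto with a Make and a boolean FourXFour.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Auto'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Make' type='xs:string' />
        <xs:element name='FourXFour' type='xs:boolean' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static XDocument ValidateSample(out int errorCount)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(Xsd)));

        // "maybe" is not a legal xs:boolean, so validation should report an error.
        XDocument doc = XDocument.Parse(
            "<Auto><Make>Ford</Make><FourXFour>maybe</FourXFour></Auto>");

        int errors = 0;
        doc.Validate(schemas, (sender, e) =>
        {
            Console.WriteLine(e.Message);
            errors++;
        }, true); // true: annotate the tree with schema info

        errorCount = errors;
        return doc;
    }

    public static void Main()
    {
        int errorCount;
        XDocument doc = ValidateSample(out errorCount);
        Console.WriteLine("errors reported: {0}", errorCount);

        // The annotations record validity per node.
        foreach (XElement child in doc.Root.Elements())
            Console.WriteLine("{0}: {1}", child.Name, child.GetSchemaInfo().Validity);
    }
}
```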

Caveats

It is important to remember that the bridge classes are intended to be just that - a bridge from LINQ to XML to the rest of System.Xml, not a seamless integration. There are some impedance mismatches that result from conscious design decisions to eliminate some of the aspects of System.Xml that annoy people. These mismatches include:

No over-arching document context. In DOM, element, attribute, etc. objects exist only in the context of a specific XmlDocument; LINQ to XML objects can be created and used on their own.
A different namespace model. In LINQ to XML, it is the namespace name (which looks like a URI, but has slightly different semantics) plus the localname that names an element or attribute, and the prefixes are just serialization details. In DOM, the namespace name, localname, and prefix are all carried around.
Using the Infoset model of whitespace and not the "significant whitespace" concept used in the rest of System.Xml.
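The first two mismatches can be seen in a few lines. This is only a sketch; the namespace urn:example:autos and the prefixes p and q are invented for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class NamespaceModelDemo
{
    // Two elements serialized with different prefixes have the same XName,
    // because only the namespace name and local name take part in the name.
    public static bool SameName()
    {
        XElement a = XElement.Parse("<p:Auto xmlns:p='urn:example:autos'/>");
        XElement b = XElement.Parse("<q:Auto xmlns:q='urn:example:autos'/>");
        return a.Name == b.Name;
    }

    // DOM carries the prefix around as part of the node.
    public static string DomPrefix()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml("<p:Auto xmlns:p='urn:example:autos'/>");
        return dom.DocumentElement.Prefix;
    }

    public static void Main()
    {
        // No document context: an XElement can be created on its own.
        XElement make = new XElement("Make", "Ford");
        Console.WriteLine(make.Document == null);

        // In DOM, a node is always created by, and owned by, a document.
        XmlDocument dom = new XmlDocument();
        XmlElement domMake = dom.CreateElement("Make");
        Console.WriteLine(ReferenceEquals(domMake.OwnerDocument, dom));

        // A namespace-qualified name is just namespace name + local name.
        XNamespace ns = "urn:example:autos";
        XName qualified = ns + "Auto";
        Console.WriteLine(qualified.NamespaceName + " / " + qualified.LocalName);

        Console.WriteLine(SameName());
        Console.WriteLine(DomPrefix());
    }
}
```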

In the short term, bridging these impedance mismatches consumes processor cycles, so a schema validation or XSLT transformation over an XDocument object is not as fast as one over an XmlDocument or XPathDocument. We're working on that, but we don't recommend using the bridge classes for performance-critical operations.

We would very much like to hear from you about how you put these classes to work, what additional features you might need, what performance issues you run into, etc.

