XML Programming in Visual Basic 9.0
By now you've probably heard of LINQ (or Language Integrated Query), the new query technology coming in Visual Studio® 2008. LINQ-enabled languages like Visual Basic® give you a rich set of query operators that can be applied to various data sources, such as in-memory collections, databases, datasets, and XML. That alone is pretty cool, but Visual Basic 9.0 actually goes beyond that and makes XML a first-class data type directly in the language.
Now you may be wondering why you would want an XML data type integrated directly into Visual Basic. XML has gained broad adoption across the industry due to its flexibility and simplicity, and today many applications use it for both storage and data transfer. It works particularly well for transferring data among systems since it is self-describing (meaning that the structure of the data is included with the data). Plus, it's much easier to read data structured inside XML tags than it is to write parsing logic for various custom file formats.
The problem with XML, however, is that it has never been particularly easy for developers to work with. Awkward and inconsistent APIs, such as the Document Object Model (DOM), and languages such as XSLT and XQuery lead to writing a lot of tedious code that is often difficult to read and understand. But with the introduction of LINQ and Visual Basic 9.0, XML development becomes much easier. In this column I will explore the current XML programming experience, how LINQ improves the experience, and how Visual Basic provides even more support when working with XML.
Using the DOM
To get started, say I need to write out a list of customers in XML. My customer list has the following properties: FirstName, LastName, Address, City, State, and ZipCode. When I convert this to XML, it should look something like Figure 1.
Figure 1 Customer List Properties in XML
<Customers>
  <Customer FirstName="Jane" LastName="Dow">
    <Address>123 Main St</Address>
    <City>Redmond</City>
    <State>WA</State>
    <ZipCode>10104</ZipCode>
  </Customer>
  <Customer FirstName="Matt" LastName="Berg">
    <Address>456 First St</Address>
    <City>Seattle</City>
    <State>WA</State>
    <ZipCode>10028</ZipCode>
  </Customer>
</Customers>
Using the DOM and Visual Studio 2005, I could write code that loops through my list and constructs the appropriate XML nodes, as shown in Figure 2. As you can see, that's a lot of code to write for something that should be pretty simple. It's also very hard to visualize the structure of the generated XML since it doesn't really match the structure of the code.
Figure 2 Using the DOM to Construct the XML Nodes
'With Visual Basic 9, you no longer have to write code like this.
Public Function ConvertToXML(ByVal custList As List(Of Customer)) _
    As String
    Dim doc As New XmlDocument
    Dim root As XmlElement = doc.CreateElement("Customers")

    For Each cust As Customer In custList
        Dim custElement As XmlElement = doc.CreateElement("Customer")
        custElement.SetAttribute("FirstName", cust.FirstName)
        custElement.SetAttribute("LastName", cust.LastName)

        Dim address As XmlElement = MakeElement(doc, "Address", cust.Address)
        Dim city As XmlElement = MakeElement(doc, "City", cust.City)
        Dim state As XmlElement = MakeElement(doc, "State", cust.State)
        Dim zipcode As XmlElement = MakeElement(doc, "ZipCode", cust.Zip)

        With custElement
            .AppendChild(address)
            .AppendChild(city)
            .AppendChild(state)
            .AppendChild(zipcode)
        End With

        root.AppendChild(custElement)
    Next

    doc.AppendChild(root)
    Return doc.InnerXml
End Function

Private Function MakeElement(ByVal doc As XmlDocument, _
                             ByVal elementName As String, _
                             ByVal value As String) As XmlElement
    Dim element As XmlElement = doc.CreateElement(elementName)
    element.InnerText = value
    Return element
End Function
In the long term, maintaining this code will pose a huge inconvenience. If business requirements change and I find that I need to add some properties or attributes, I would have to wade through this code and figure out which nodes to update.
The LINQ to XML API
When coding against the DOM, I'm basically trying to construct the XML tree from the inside out. It is like I have to fight against the API to get my work done. Rather than having to construct a whole bunch of nodes and connect them together in the DOM, it'd be easier if I could just express the structure of the XML and then substitute the values in-line. This is exactly what the LINQ to XML API allows me to do—I can create the XML from the top down (otherwise known as functional construction).
Using LINQ, I can now write a much shorter and cleaner version of the function in Figure 2. The new, simpler solution is presented in Figure 3. This produces exactly the same output as the first solution, but, as you can see, the new snippet is shorter and much easier to read. I use the constructors for the XElement and XAttribute objects to construct the tree. And note this cool feature: these constructors have overloads that allow me to pass in additional LINQ to XML objects; for instance, I can pass an XElement into another XElement.
Figure 3 New Solution Using LINQ
Public Function ConvertToXML(ByVal custList As List(Of Customer)) As _
    String
    Dim doc = New XElement("Customers", _
        From cust In custList _
        Select New XElement("Customer", _
            New XAttribute("FirstName", cust.FirstName), _
            New XAttribute("LastName", cust.LastName), _
            New XElement("Address", cust.Address), _
            New XElement("City", cust.City), _
            New XElement("State", cust.State), _
            New XElement("ZipCode", cust.Zip)))
    Return doc.ToString
End Function
The first line creates a new XElement named Customers, which produces the <Customers> and </Customers> tags. The second parameter passed into the constructor is a LINQ query whose results become the contents of these Customers tags. This query loops through all the customers in my list and transforms the results to XML. Everything after the Select keyword is evaluated at run time and produces a collection of Customer elements.
Since I'm querying data, I can take advantage of LINQ to do such things as filtering, sorting, grouping, joining, and a variety of other operations. For example, I could modify the function to only return customers who live in Seattle by adding a Where clause to the query. If I were using the DOM to do this, I'd have to add an If statement to the For loop. Sorting the results in LINQ is now as easy as adding an Order By clause to my query. With the DOM, I would have to manually sort the results before constructing the XML.
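As a rough sketch of the filtering and sorting just described, the function from Figure 3 could be extended along these lines. The names CustomerXml and ConvertSeattleCustomersToXML are hypothetical, and the Customer class shown here is only an assumed shape (public String fields) so that the sample stands on its own.

```vb
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

' Assumed shape of the Customer type; public fields keep the sketch short.
Public Class Customer
    Public FirstName As String
    Public LastName As String
    Public City As String
End Class

Public Module CustomerXml
    ' Hypothetical variant of ConvertToXML: keeps only Seattle customers
    ' and sorts them by last name before building the XML.
    Public Function ConvertSeattleCustomersToXML( _
        ByVal custList As List(Of Customer)) As String

        Dim doc = New XElement("Customers", _
            From cust In custList _
            Where cust.City = "Seattle" _
            Order By cust.LastName _
            Select New XElement("Customer", _
                New XAttribute("FirstName", cust.FirstName), _
                New XAttribute("LastName", cust.LastName)))

        Return doc.ToString()
    End Function
End Module
```

The Where and Order By clauses sit inside the same query that builds the elements, so the shape of the output XML is still visible at a glance.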
The example I'm using is fairly simple, as I am really just scratching the surface of what can be done with LINQ. So far, I've used the XElement and XAttribute types. The API defines many other types, including XDocument, XNamespace, and XComment.
XML Literals and Embedded Expressions
While LINQ to XML provides a much simpler experience than you've had in the past, the approach still isn't quite as easy as it could be. This is where XML literals, a new concept introduced in Visual Basic 9.0, enter the picture.
To understand XML literals, you must first understand what a literal is. Consider the following variables:
Dim someString = "A string literal"
Dim someNumber = 256
Dim someDate = #2/27/2008#
Dim someBoolean = True
Each of the variables shown here is directly assigned a value, and in each case the actual value is called a literal. (Note that with the new Type Inference feature in Visual Basic 9.0, these declarations are still strongly typed, even though I haven't provided an "As <Type>" clause for each variable.)
Anything inside quotes is referred to as a string literal. And the value assigned to someDate, for example, would be a date literal. An XML literal is exactly the same concept. I can express XML exactly as I would express the data anywhere else (in its literal representation). That means no quotes around it; XML literals are not Strings. I can directly assign data to the customers variable, as shown in Figure 4, and the compiler will see that the value is literal XML. Thus, the variable's type will be inferred to be XElement.
Figure 4 Directly Assign Data to the Customers Variable
Dim customers = <Customers>
                    <Customer FirstName="Jane" LastName="Dow">
                        <Address>123 Main St</Address>
                        <City>Redmond</City>
                        <State>WA</State>
                        <Zip>10104</Zip>
                    </Customer>
                    <Customer FirstName="Matt" LastName="Berg">
                        <Address>456 First St</Address>
                        <City>Seattle</City>
                        <State>WA</State>
                        <Zip>10028</Zip>
                    </Customer>
                </Customers>
When the compiler sees this XML, it will convert it into a series of calls to the LINQ to XML API. Not only is this easier to read, but the compiler actually optimizes this construction and produces faster code than I would get just using the API directly.
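To make that translation concrete, here is a simplified sketch that builds the same small element twice: once as an XML literal and once with explicit LINQ to XML constructor calls. The module name LiteralDemo is hypothetical, and the hand-written version only approximates what the compiler emits; the actual generated code may differ.

```vb
Imports System
Imports System.Xml.Linq

Module LiteralDemo
    Sub Main()
        ' XML literal: the compiler infers the type XElement.
        Dim fromLiteral = <Customer FirstName="Jane" LastName="Dow">
                              <City>Redmond</City>
                          </Customer>

        ' Hand-written approximation using the LINQ to XML constructors.
        Dim fromApi = New XElement("Customer", _
            New XAttribute("FirstName", "Jane"), _
            New XAttribute("LastName", "Dow"), _
            New XElement("City", "Redmond"))

        ' XNode.DeepEquals compares the two trees node by node.
        Console.WriteLine(XNode.DeepEquals(fromLiteral, fromApi))
    End Sub
End Module
```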
So far, all I've done is paste some XML into my project and store it in a variable—this isn't necessarily that useful. One common misconception about using XML support in Visual Basic 9.0 is that you should use it to store XML documents inside your code. This, however, is far from the truth. The point is to provide an easy syntax to create, query, and transform XML documents. I demonstrated how XML literals allow me to easily create documents. But in order for this to be useful, I need a way to insert values directly into the XML. This is where embedded expressions come into play.
An embedded expression essentially tells the compiler to "stop processing this XML literal for a second, evaluate the expression, and insert the result back into the XML." In order to do this, I use the <%= (expression) %> syntax. Figure 5 takes my ConvertToXML function and demonstrates how I can improve it using XML literals and embedded expressions.
Figure 5 Using XML Literals and Embedded Expressions
Public Function ConvertToXML(ByVal custList As List(Of Customer)) _
    As String
    Dim doc = <Customers>
                  <%= From cust In custList _
                      Select <Customer FirstName=<%= cust.FirstName %>
                                 LastName=<%= cust.LastName %>>
                                 <Address><%= cust.Address %></Address>
                                 <City><%= cust.City %></City>
                                 <State><%= cust.State %></State>
                                 <ZipCode><%= cust.Zip %></ZipCode>
                             </Customer> %>
              </Customers>
    Return doc.ToString
End Function
This example uses a LINQ query to generate XML, but the data source that I am querying over is actually an in-memory collection of type List(Of T). This is where you can really see the beauty of LINQ—I can just as easily use the same query operators that I used for collections against databases and other XML documents without having to learn a new API for each. And you can see how I could take data from a SQL Server® database and easily transform it to XML using just a few lines of code.
Applications often need to examine XML (perhaps provided through a file or an RSS feed) and make decisions based on the data. This is frequently done through XSL transformations, but LINQ to XML makes this unnecessary.
In the example shown in Figure 6, I load in an XML document, filter by customers who live in Seattle, and then select the first and last name. After that, I loop through my query and display all the matching results on screen.
Figure 6 Sample Query Using XML Axis Properties
Public Sub DisplaySeattleCustomers()
    Dim filePath = My.Application.Info.DirectoryPath & "\customers.xml"
    Dim doc = XDocument.Load(filePath)

    Dim query = From cust In doc...<Customer> _
                Where cust.<City>.Value = "Seattle" _
                Select Name = cust.@FirstName & " " & cust.@LastName

    For Each name In query
        MsgBox(name)
    Next
End Sub
At first glance, you might think something seems wrong here. If the compiler infers doc to be of type XDocument (and cust to be of type XElement), how am I able to type things like cust.<City> and cust.@FirstName? XElement obviously doesn't have these properties, as they're specific to the XML that I loaded. What's actually happening here is the Visual Basic compiler has saved me a fair amount of work through a feature called XML properties.
XML properties, sometimes referred to as XML axis properties, provide an easy way to retrieve values stored in XML. In the query shown in Figure 6, I actually use all three of the available axes: descendants, elements, and attributes.
In XML, a descendant is an element that is nested one or more levels below the current element. The expression "doc...<Customer>" uses the descendants axis, and it basically says "find all descendants named Customer." Remember that the XML I have been using throughout has a root node called Customers, and it contains Customer elements. The triple-dot syntax is essentially translated to doc.Descendants("Customer").
Now I am going to take a look at the cust.<City>.Value expression. In this case, I am using the elements axis, so the line is translated into cust.Elements("City").Value. You may be wondering why I need to explicitly call .Value before doing the comparison here (which I don't need to do for attributes). The reason is that the elements axis can match more than one child, and each match is itself an element that can contain other elements, so the expression cust.<City> actually returns an IEnumerable(Of XElement) rather than a String. The .Value extension property returns the value of the first element in that collection, which in this case is just a simple text value.
The last expression, cust.@FirstName, uses the attributes axis. This is roughly equivalent to cust.Attribute("FirstName").Value and returns the value stored in the attribute as a String. Since attributes can't contain elements, there's no need to call .Value. I then use the Select clause to combine the two fields into one, called Name, in order to make things simpler for display purposes.
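The three axes can be compared side by side with explicit LINQ to XML calls. The following is a minimal sketch with inline sample data; the module name AxisDemo and the variable names are hypothetical, and the explicit calls are hand-written stand-ins that behave the same way for this data, not necessarily what the compiler generates.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module AxisDemo
    Sub Main()
        Dim doc = <Customers>
                      <Customer FirstName="Matt" LastName="Berg">
                          <City>Seattle</City>
                      </Customer>
                  </Customers>

        ' Descendants axis: every Customer element nested below doc.
        For Each cust In doc...<Customer>
            ' Axis-property form.
            Dim cityViaAxis = cust.<City>.Value
            Dim firstViaAxis = cust.@FirstName

            ' Explicit API form for the same data.
            Dim cityViaApi = cust.Elements("City").First().Value
            Dim firstViaApi = cust.Attribute("FirstName").Value

            ' Report whether the two forms agree.
            Console.WriteLine(cityViaAxis = cityViaApi AndAlso _
                              firstViaAxis = firstViaApi)
        Next
    End Sub
End Module
```

One practical difference worth keeping in mind: the explicit form shown here throws if the element or attribute is missing, so the axis properties are usually the more forgiving choice.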
XML Schema IntelliSense
It would have been great if IntelliSense® had told me the names of the fields in my XML document while I was coding up that last example. The problem, however, is how the compiler can know the structure of the XML document. The answer is through XSD schemas. If you have a schema for your XML, you can use the Imports statement in Visual Basic to bring your schema's namespace into scope. Once you do that, you get real IntelliSense based on your schema.
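As an illustration, an XML namespace import might look like the sketch below. The namespace URI http://example.com/customers, the prefix c, and the module name SchemaDemo are all hypothetical; the sketch assumes a matching XSD file with that target namespace has been added to the project, which is what lets the editor offer element and attribute names.

```vb
Imports <xmlns:c="http://example.com/customers">
Imports System
Imports System.Xml.Linq

Module SchemaDemo
    Sub Main()
        ' The imported prefix can be used in literals and axis properties.
        Dim doc = <c:Customers>
                      <c:Customer FirstName="Matt">
                          <c:City>Seattle</c:City>
                      </c:Customer>
                  </c:Customers>

        For Each cust In doc...<c:Customer>
            Console.WriteLine(cust.@FirstName & " lives in " & _
                              cust.<c:City>.Value)
        Next
    End Sub
End Module
```

The code compiles and runs without the schema; the XSD only affects the editing experience.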
The XML to Schema Tool for Visual Basic 2008 (which you can download at go.microsoft.com/fwlink/?LinkId=102501) allows you to quickly create XSD schemas based on existing XML files. You just point it at an XML file or Web URL, and it then does the grunt work of generating an XSD file.
Visual Basic is all about productivity. And with business applications increasingly depending on XML for things like storage, data transfer, and even user interfaces, it's important that Visual Basic incorporate features that make it easier for developers to work with XML. Visual Basic 9.0 steps up in a big way, offering rich XML support and powerful new query capabilities.
The simple yet powerful query syntax offered by LINQ enables you to write queries against XML just as easily as you would against a database or an in-memory collection. And once you start to play with Visual Basic 9.0, you will discover many new XML features that I haven't had a chance to touch upon here, including built-in support for XML namespaces, comments, and fragments.
Send your questions and comments to email@example.com.
Jonathan Aneja is a Program Manager on the Visual Basic team at Microsoft. He works mainly on compiler features, such as LINQ, as well as on other projects, such as the Interop Forms Toolkit. Jonathan has been at Microsoft for about two years, where he started out in ISV Advisory Services.