Data Points

XML Features in SQL Server 2000

John Papa

Code download available at:DataPoints0506.exe(152 KB)


Returning XML
Transforming and Hierarchies
Returning XML via ADO.NET
Working with FOR XML Restrictions
Inserts via OPENXML
Inserts and Updates
Bulk Inserts via OPENXML
Using the Sample
Wrapping Up

SQL Server™ 2000 includes several XML features that let you transform relational rowsets into hierarchical XML documents, read XML documents, and bulk load data via XML. For example, you can pass an XML document to a stored procedure, join the XML to some tables and return a rowset, or even modify data in the database. The inclusion of the OPENXML function and the FOR XML statement have been prompted by the ever-expanding role of XML in today's enterprise systems. Some of these features not only support XML, but also offer improved performance when loading data in bulk.

In this month's column, I will discuss how you can return XML from SQL Server via T-SQL's FOR XML clause. I'll walk through examples that show the various ways you can return XML data and schema information, and show how to transform the XML to a more palatable format. Then I will discuss OPENXML, how to join an XML document to database tables, and how to pull XML out of a DataSet using the WriteXml and GetXml methods. The SQL for the examples as well as a sample ASP.NET project that executes some of the examples and streams them to text files are all in the download available from the MSDN®Magazine Web site. The sample project also includes code to insert and update records to the database from XML.

Returning XML

When used in a SELECT statement, the FOR XML clause tells SQL Server to return the data as XML as opposed to a standard rowset. You can specify the mode: RAW, AUTO, or EXPLICIT. Each offers a different way to transform the XML (Figure 1 shows a quick overview of each mode).

Figure 1 Overview of the FOR XML Modes

Mode Description
RAW Each record in the rowset is transformed to an XML element named row. The <row> elements will contain an attribute that represent each column that is retrieved.
AUTO Records in the rowset can be transformed into nested XML elements named after the tables in the FROM clause. Each column that is retrieved will be represented as an attribute.
EXPLICIT You have a lot of control over how the XML is formatted. However, the syntax to use the EXPLICIT mode is much more involved. XSLT is a much more common and accepted practice of transforming XML.

For example, if you query the Northwind database's Employees table using FOR XML RAW, it would return each employee row within a <row> element. Each column that you included in the SELECT statement would be represented as an attribute on the <row> element. The following FOR XML RAW query selects two employee records and then returns them in the RAW format:

SELECT EmployeeID, FirstName, LastName FROM Employees WHERE LastName LIKE 'D%' FOR XML RAW <row EmployeeID="1" FirstName="Nancy" LastName="Davolio"/> <row EmployeeID="9" FirstName="Anne" LastName="Dodsworth"/>

The same SELECT statement could be modified to use the FOR XML AUTO clause instead. This time the elements are named Employees, matching the name of the source table. The columns are still attributes of the main elements:

SELECT EmployeeID, FirstName, LastName FROM Employees WHERE LastName LIKE 'D%' FOR XML AUTO <Employees EmployeeID="1" FirstName="Nancy" LastName="Davolio"/> <Employees EmployeeID="9" FirstName="Anne" LastName="Dodsworth"/>

Transforming and Hierarchies

While the differences in the previous example are slight, the differences between AUTO and RAW are more pronounced when used with a query that joins tables. A query that uses FOR XML RAW will only return <row> elements whether the data comes from one or more tables. Therefore, the RAW mode does not take advantage of the inherent hierarchical structure of XML documents. Imagine the following SQL statement:

SELECT Customers.CustomerID, CompanyName, OrderID, CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID ORDER BY Customers.CustomerID

In this example I want to retrieve a one-to-many parent-child relationship. If I execute this SQL statement, I will get back a series of customers and their corresponding orders. If I throw on the FOR XML RAW clause and execute it once again, the resulting XML will contain a single <row> element representing each row that is returned. For example, the XML data in Figure 2 represents the rows that the FOR XML RAW will return where the CustomerID is ALFKI.

Figure 2 The XML Data

<row CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" OrderID="10643" OrderDate="08/25/1997" /> <row CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" OrderID="10692" OrderDate="10/03/1997" /> <row CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" OrderID="10702" OrderDate="10/13/1997" /> <row CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" OrderID="10835" OrderDate="01/15/1998" /> <row CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" OrderID="10952" OrderDate="03/16/1998" /> <row CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" OrderID="11011" OrderDate="04/09/1998" />

Notice that the data is not laid out in a parent-child hierarchy. If I wanted the data to appear as a series of <Customers> elements that contain a series of their associated <Orders> elements, I could use the FOR XML AUTO clause instead (the nesting of the parents and their children depends on the parent rows being grouped together). The resulting XML for the CustomerID of ALFKI would look like the following:

<Customers CustomerID="ALFKI" CompanyName="Alfreds Futterkiste"> <Orders OrderID="10643" OrderDate="08/25/1997" /> <Orders OrderID="10692" OrderDate="10/03/1997" /> <Orders OrderID="10702" OrderDate="10/13/1997" /> <Orders OrderID="10835" OrderDate="01/15/1998" /> <Orders OrderID="10952" OrderDate="03/16/1998" /> <Orders OrderID="11011" OrderDate="04/09/1998" /> </Customers>

This XML is much more readable since it uses the names of the tables as the element names. It also contains less data since it does not repeat the CustomerID and CompanyName attributes for every Orders element, as does the FOR XML RAW example.

If you prefer the values of the columns to be presented as elements instead of attributes, then you are in luck. By specifying the ELEMENTS option in the FOR XML clause, all column values will become elements within the XML. Sometimes seeing is believing, so I have shown the output of the previous query when the ELEMENTS condition is applied in Figure 3. (For brevity, the XML samples I show include only the XML for the CustomerID ALFKI. The actual XML generated by these queries would contain the XML for all customers and their orders.) The query that uses the ELEMENTS option looks like this:

\SELECT Customers.CustomerID, CompanyName, OrderID, CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID ORDER BY Customers.CustomerID FOR XML AUTO, ELEMENTS


<Customers> <CustomerID>ALFKI</CustomerID> <CompanyName>Alfreds Futterkiste</CompanyName> <Orders> <OrderID>10643</OrderID> <OrderDate>08/25/1997</OrderDate> </Orders> <Orders> <OrderID>10692</OrderID> <OrderDate>10/03/1997</OrderDate> </Orders> <Orders> <OrderID>10702</OrderID> <OrderDate>10/13/1997</OrderDate> </Orders> <Orders> <OrderID>10835</OrderID> <OrderDate>01/15/1998</OrderDate> </Orders> <Orders> <OrderID>10952</OrderID> <OrderDate>03/16/1998</OrderDate> </Orders> <Orders> <OrderID>11011</OrderID> <OrderDate>04/09/1998</OrderDate> </Orders> </Customers>

Returning XML via ADO.NET

I have included with the downloadable sample code an ASP.NET project that allows you to run some of the code samples (shown in Figure 4). The sample for this project uses a method called GetFORXML that runs any SQL with a FOR XML clause and writes the resulting XML to a file. The XML is also wrapped within a <root> element for simplicity.

Figure 4 Using ADO.NET to Get XML from SQL Server

Figure 4** Using ADO.NET to Get XML from SQL Server **

The code for the GetFORXML method, as shown in Figure 5, demonstrates how you can use the standard SqlCommand object to execute the SQL statement containing the FOR XML clause. I used the ExecuteXmlReader method so I could return the XML as an XmlReader object. The DataSet's ReadXml method accepts the XmlReader, with the second argument indicating that the XML is a fragment. Because it is a fragment, it will only load all of the Customer nodes if it has a schema to match. So prior to loading the XML, I load the schema (inferred in this case) using the ReadXmlSchema method. Finally, I set the DataSetName property of the DataSet to "root", which will result in the read XML being enclosed within a parent tag <root/>.

Figure 5 Traversing XML with the XmlTextReader

private string GetFORXML(string sFile, string sSQL) { // Create and open the connection to Northwind using(SqlConnection oCn = new SqlConnection(this.sCnNW)) { oCn.Open(); // Create the SQL command to execute SqlCommand oCmd = new SqlCommand(sSQL, oCn); DataSet oDs = new DataSet(); // Execute the SQL statement and return the data to an XmlReader. // Then read the schema and the fragment XmlReader oXml = oCmd.ExecuteXmlReader(); oDs.ReadXmlSchema(oXml); oDs.ReadXml(oXml, XmlReadMode.Fragment); oCn.Close(); oDs.DataSetName = "root"; oDs.WriteXml(this.sPath + sFile); return "Wrote XML to file " + sFile; } }

The ASP.NET page in Figure 4 can execute and output the XML for the examples I have reviewed thus far and can also get employee data using FOR XML RAW, BINARY BASE64. Plus, it can execute and output XML to get customer and order data using:


Working with FOR XML Restrictions

Neither FOR XML AUTO nor FOR XML RAW offer complete support for returning binary data. For example, if you try to select a binary field (such as the Employees.Photo column) and use the FOR XML RAW, you will receive a nasty error message. One way around this is to return a URL to the binary data field. This requires you to have a SQL Server virtual directory set up in IIS first. Another option, which does not require an IIS configuration, is to specify the BINARY BASE64 option in the FOR XML clause, as shown in the following code:

SELECT EmployeeID, FirstName, LastName, Photo FROM Employees WHERE LastName LIKE 'D%' FOR XML RAW, BINARY BASE64

This tells SQL Server to format the binary data using the BASE64 format. The good news is that your SQL statement will no longer blow up. The bad news is that reading the results of a query that uses BINARY BASE64 is not pretty. The following XML is an example of what is returned when using BINARY BASE64. I cropped the contents of the Photo attribute because it was over 14,000 characters long!

<row EmployeeID="1" FirstName="Nancy" LastName="Davolio" Photo="FRw...f4="/>

Another restriction on the FOR XML AUTO mode is that it does not support the GROUP BY clause or aggregate functions. However, you can get around this by selecting the rows to form a TABLE variable and then using the FOR XML AUTO clause to retrieve the rows from that intermediary table. The following SQL code accomplishes this:

DECLARE @TempTable TABLE (OrderID INT, Total MONEY) INSERT INTO @TempTable SELECT OrderID, SUM(UnitPrice * Quantity) AS Total FROM [Order Details] GROUP BY OrderID SELECT OrderID, Total FROM @TempTable AS OrderDetails FOR XML AUTO

Computed columns also must be accounted for when using FOR XML. While computed columns are supported by FOR XML, you need to always make sure you name the columns. The problem arises when FOR XML tries to create an attribute for each column. The name of the attribute is created by using the name of the column that it is associated with. So if the column is a computed column and has not been aliased, then SQL Server will balk at you. The easy solution is to make sure you alias your computed column, like in the following SQL statement and its resulting XML:

SELECT TOP 1 LastName + ', ' + FirstName AS FullName FROM Employees FOR XML AUTO <Employees FullName="Davolio, Nancy"/>

One last item worth noting when using FOR XML is that any of the special XML characters will be converted using XML encoding. Just like HTML converts special characters in a URL, properly formed XML encodes special characters. For example, if the < character is contained within the data, then it will be converted to "&lt;".


So far I have reviewed how to get XML from SQL Server. The next logical step is to take XML and use it to modify data in the database. Enter the OPENXML function. The T-SQL OPENXML function can build a relational rowset from an XML stream. This rowset can act like a table so it can be used in other SQL statements perhaps joined with other tables or even used to insert or update data. This can be very handy because an application can be passed an XML stream containing new or updated records that need to be modified in the database. The OPENXML function uses a form of XPath so that you can tell it where to look for the data that you want to pull out.

To begin, I'll take an XML document containing a customer's data and insert it into the Customers table from the Northwind database. This XML document could use elements or attributes to represent the customer data, like so:

<root> <customer> <custid>77777</custid> <custname>fake customer</custname> <city>Somewhere</city> <country>USA</country> </customer> </root>

The OPENXML statement allows you to drill into the XML document using XPath expressions, which means the format of the XML can be quite flexible.

I created a stored procedure (shown in Figure 6) that accepts this XML document, prepares it for a relational rowset, reads the customer data, inserts it into the Customers table, and then removes the XML document from memory. Now, let's see how this all works. First, you can pass in the XML as a variable of any string type (such as VARCHAR(8000) or even a large object type like TEXT or NTEXT). I used NTEXT so the XML did not have the 8,000 character limitation of VARCHAR. NTEXT has a maximum length of 230 - 1 (1,073,741,823) characters. (Of course, in SQL Server 2005 you could use the XML datatype instead.)

Figure 6 Inserting a Customer via OPENXML

CREATE PROCEDURE prInsertCustomerFromXML ( @sXML NTEXT ) AS DECLARE @iDoc INT EXEC sp_xml_preparedocument @iDoc OUTPUT, @sXML INSERT INTO Customers (CustomerID, CompanyName, City, Country) SELECT CustomerID, CompanyName, City, Country FROM OPENXML(@iDoc,'/root/customer',2) WITH (CustomerID NCHAR(5) 'custid', CompanyName NVARCHAR(40) 'custname', City NVARCHAR(15) 'city', Country NVARCHAR(15) 'country') EXEC sp_xml_removedocument @iDoc RETURN

The first thing I do to the XML is to pass it to the sp_xml_preparedocument system stored procedure. This procedure takes the XML and translates it into an internal DOM so that OPENXML can process it as a rowset. It also creates a reference to the DOM in memory (the @iDoc variable in Figure 6). The XML can be accessed as a rowset within a SELECT statement's FROM clause by passing the @iDoc variable to the OPENXML function.

Once I am finished using the XML for this procedure, I should remove it from memory using the sp_xml_removedocument system stored procedure.

At the core of the code in Figure 6 is the OPENXML function, which operates on the in-memory representation of the XML document. The OPENXML method accepts the reference to the XML document as its first argument. Its second argument is used to tell OPENXML which nodes in the XML DOM you want to map to rows. In this example, I want to identify the Customers nodes in order to grab the customer's data values, so I specify "/root/customer" as the second argument. The third argument to the OPENXML function is used to indicate what type of mapping to use. A value of 1 tells OPENXML to map to attributes and a value of 2 tells it to map to elements.

The WITH clause can be used to specify the fields to grab out of the XML document and the datatypes to convert them to. It can also be used to map an attribute or element in the XML using an XPath expression or even alias the XML field to be used in a query. The OPENXML code in Figure 6 basically takes the four customer fields from the XML data and turns them into a rowset. That rowset is then selected from and inserted into the Customers table.

Inserts via OPENXML

Now I will take the following XML sample and insert an order and two order detail rows from it:

<Customer CustomerID='ALFKI'> <Order OrderDate='1/1/1972' Freight='3'> <Detail ProductID='1' Price='4' Quantity='10' Discount='0'/> <Detail ProductID='2' Price='5' Quantity='2' Discount='0'/> </Order> </Customer>

First, I pass this XML to the stored procedure shown in Figure 7 and prepare it using the sp_xml_preparedocument system stored procedure. Then, since I want to insert an order and its children records, I begin a transaction to wrap the INSERT statements. This allows me to roll back the transaction if part of it fails. Next, I open the XML document using OPENXML starting at the Customer\Order nodes. Using the WITH clause, I grab the CustomerID by traveling back up the XML document to the parent node of the Order element (the Customer element) and getting the value by looking at the CustomerID attribute of the Customer element. This is a great feature of the OPENXML function as it allows you to travel up and down the XML document using limited XPath expressions to grab values of attributes and elements.

Figure 7 Inserting a Parent and Its Children

CREATE PROCEDURE prInsertOrderAndOrderDetailsFromXML @sXML NTEXT AS DECLARE @iDoc int, @OrderID int EXEC sp_xml_preparedocument @iDoc output, @sXML IF @@Error<>0 BEGIN RETURN END BEGIN TRANSACTION INSERT INTO Orders (CustomerID , OrderDate, Freight) SELECT CustomerID, CAST(OrderDate As datetime), CAST(Freight AS money) FROM OpenXML(@iDoc,'/Customer/Order', 1) WITH (CustomerID nchar(5) '../@CustomerID' , OrderDate varchar(10) '@OrderDate', Freight varchar(12) '@Freight') IF @@Error<>0 BEGIN EXEC sp_xml_removedocument @iDoc ROLLBACK TRANSACTION RETURN END SET @OrderID=SCOPE_IDENTITY() INSERT INTO [Order Details] (OrderID, ProductID, UnitPrice, Quantity, Discount) SELECT @OrderID, ProductID, ProductPrice, ProductQTY, Discount FROM OpenXML(@iDoc,'/Customer/Order/Detail') WITH (ProductID int '@ProductID', ProductPrice money '@Price', ProductQTY smallint '@Quantity', Discount real '@Discount') IF @@Error<>0 BEGIN EXEC sp_xml_removedocument @iDoc ROLLBACK TRANSACTION RETURN END EXEC sp_xml_removedocument @iDoc COMMIT TRANSACTION SELECT @OrderID GO

Once the Order has been inserted, I grab the OrderID value that was just generated by the built-in SQL Server SCOPE_IDENTITY function. Then I move on to insert the Order Details rows using yet another INSERT statement that uses the OPENXML function to grab the Order Details from the XML data. As long as no errors are encountered, the order and its children order detail rows are inserted into their respective database tables.

The SCOPE_IDENTITY approach works with a single Order and its children. However, when inserting multiple Orders and their children all in one XML batch, it gets more involved. The issue is that with multiple Order records, you still want to be able to map the appropriate order to its children. Since you won't know which rows to associate, you would have to add some code to handle this. One way to tackle this is to use the @mp:id/@mp:parentid meta properties in the WITH clause to provide a way to get the parent's new OrderID and map it to its children's OrderID fields.

Inserts and Updates

XML documents that are prepared by the SQL Server sp_xml_preparedocument system stored procedure can be used in a JOIN just like any other table. They can also be used to INSERT, UPDATE, or DELETE records, just like any other rowset. To demonstrate this, I first selected a series of Order Details rows from OrderID 10285 and stuffed them into an ADO.NET DataSet. Then, I modified some of the existing Order Details rows by changing their quantities and added several Order Details rows to the DataSet, as shown in the following code snippet:

DataSet oDs = new DataSet("NorthwindOrderDetailsData"); oDa.Fill(oDs, "OrderDetails"); //-- Modify the existing 2 line items (for updates) oDs.Tables["OrderDetails"].Rows[0]["Quantity"] = "2"; oDs.Tables["OrderDetails"].Rows[1]["Quantity"] = "7"; // Add a line item for several other products (inserts) for (int i = 10; i <= 30; i++) { oDs.Tables["OrderDetails"].Rows.Add( new object[]{iOrderID, i, 1.50, 3, 0}); }

Once I make these changes, the DataSet gives me several options on how to take the changed data within the DataSet and turn it into XML. I can use the WriteXml method to write out the data as a DiffGram, the data with its schema, or the data without its schema. I can also use the GetXml method to get the data into XML. For this example I will use the DiffGram since it will contain the modified rows along with the before and after state of each row (for UPDATES).

The DiffGram is passed to a stored procedure (shown in Figure 8) which picks out the new and updated rows from the XML document and inserts them into a TABLE variable (@tblTemp). Notice that the OPENXML function in Figure 8 uses an XPath expression to grab the value of the hasChanges attribute. This will be an i for the inserted rows and an m for the modified rows. Because the OPENXML function can filter out rows, only a single XML document must be passed into the stored procedure. Next, the appropriate rows are inserted into the Order Details table and then the appropriate Order Details rows are updated. Both the INSERT and the UPDATE are wrapped inside of a transaction so it can be rolled back if either fails.

Figure 8 Inserting and Updating Using a DiffGram

CREATE PROCEDURE prInsertUpdateOrderDetailsFromXML @sXML NTEXT AS DECLARE @iDoc int, @OrderID int EXEC sp_xml_preparedocument @iDoc output, @sXML, '<ns xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"/>' IF @@Error<>0 BEGIN RETURN END BEGIN TRANSACTION DECLARE @tblTemp TABLE ( OrderID INT, ProductID INT, UnitPrice MONEY, Quantity SMALLINT, Discount REAL, ChangeType CHAR(1) ) INSERT INTO @tblTemp (OrderID, ProductID, UnitPrice, Quantity, Discount, ChangeType) SELECT OrderID, ProductID, ProductPrice, ProductQTY, Discount, ChangeType FROM OpenXML(@iDoc, '/diffgr:diffgram/NorthwindOrderDetailsData/OrderDetails', 2) WITH (OrderID INT 'OrderID', ProductID INT 'ProductID', ProductPrice MONEY 'UnitPrice', ProductQTY SMALLINT 'Quantity', Discount REAL 'Discount', ChangeType CHAR(1) '@diffgr:hasChanges') IF @@Error<>0 BEGIN EXEC sp_xml_removedocument @iDoc ROLLBACK TRANSACTION RETURN END EXEC sp_xml_removedocument @iDoc INSERT INTO [Order Details] (OrderID, ProductID, UnitPrice, Quantity, Discount) SELECT OrderID, ProductID, UnitPrice, Quantity, Discount FROM @tblTemp WHERE ChangeType = 'i' IF @@Error<>0 BEGIN ROLLBACK TRANSACTION RETURN END UPDATE [Order Details] SET UnitPrice = t.UnitPrice, Quantity = t.Quantity, Discount = t.Discount FROM @tblTemp t, [Order Details] od WHERE t.ChangeType = 'm' AND od.OrderID = t.OrderID AND od.ProductID = t.ProductID IF @@Error<>0 BEGIN ROLLBACK TRANSACTION RETURN END COMMIT TRANSACTION GO

Bulk Inserts via OPENXML

Inserting 10 rows from a DataSet via a DataAdapter and a stored procedure is easy enough. But the DataAdapter traverses the DataSet's rows one at time looking for the rows whose rowstate is inserted, and it executes the stored procedure associated with the DataAdapter's InsertCommand for each row. This means that 10 inserts would cause 10 calls to the database. This back and forth between the ADO.NET code and the database barely makes a noticeable difference when there are only a few necessary updates. When other factors are introduced, however, such as a large increase in the number of concurrent users or the insertion of hundreds of rows, performance could degrade rapidly. Instead of calling a stored procedure 100 times to insert 100 rows you could pass the 100 rows as XML into a stored procedure all in one shot.

The sample ASP.NET application included with the downloadable code will execute a stored procedure that will insert n number of customers into the Customers table. The ASP.NET code creates and adds 100 customer records in a DataSet. Then it uses the WriteXml method to output the XML data without its schema. This XML is then passed to a stored procedure that inserts the 100 customers all in a single shot. This technique of inserting bulk data is more efficient than inserting each customer one at a time by executing a stored procedure for each insert.

Using the Sample

If you use SQL Query Analyzer to try out any of the FOR XML SQL included in this column you might want to change some of the default options. For example, if you want to query and return XML, you'll want to start by increasing the maximum characters per column in the results pane. Go to Options | Tools | Results and increase the Maximum Characters per Column setting to something like 4096. Since the XML that is returned appears to be a single column in the results pane, this will not cut off the XML at the default of 256 characters. In the end, you will be able to see your XML and copy and paste it into an XML editor if you prefer.

Wrapping Up

The first step towards XML and SQL Server integration was the introduction of the FOR XML and OPENXML features. Thanks to that advance, you can get XML out of a SQL Server database without having to transform it using some intermediary component. You can also pass XML back into a database by combining the use of the OPENXML function and the old standbys, the SQL INSERT, UPDATE, and DELETE statements. These XML features are only the first steps in the gathering support for XML that SQL Server 2005 will offer, but they are valid tools that can be used to begin integrating your XML applications with SQL Server today.

Send your questions and comments for John to

John Papa is a .NET architect and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. He has authored articles and several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive or blogging at