Share via


The XML Files

OPENXML, XSLT Keys, Select versus Match, XPath, and More

Aaron Skonnard

Q Is it possible to write a SQL Server™ stored procedure that takes an XML document as input and inserts the information into tables?

Q Is it possible to write a SQL Server™ stored procedure that takes an XML document as input and inserts the information into tables?

A The process of pulling an XML document apart and inserting the data into tables is often referred to as shredding. Transact-SQL (T-SQL) makes XML shredding possible through the OPENXML statement. OPENXML creates rowset views on top of in-memory XML documents. As a rowset provider, OPENXML rowsets can be used in T-SQL statements wherever traditional rowsets may be called for. In other words, OPENXML can supply records to standard INSERT, UPDATE, or DELETE statements.

A The process of pulling an XML document apart and inserting the data into tables is often referred to as shredding. Transact-SQL (T-SQL) makes XML shredding possible through the OPENXML statement. OPENXML creates rowset views on top of in-memory XML documents. As a rowset provider, OPENXML rowsets can be used in T-SQL statements wherever traditional rowsets may be called for. In other words, OPENXML can supply records to standard INSERT, UPDATE, or DELETE statements.

Before you can begin using OPENXML, you must have SQL Server load the XML document into memory. SQL Server provides the sp_xml_preparedocument stored procedure for doing this. It loads the supplied XML text into an internal DOM (using MSXML2.dll) and returns a handle that you can later supply to the OPENXML statement.

The handle returned by sp_xml_preparedocument is valid for the duration of the SQL Server connection or until you decide to remove it from memory by calling sp_xml_removedocument. SQL Server stores the loaded documents in its internal cache, but only allows MSXML to use one-eighth of the total memory available. Hence, it's important to call sp_xml_removedocument when you're finished working with the XML documents. The following stored procedure illustrates this process:

CREATE PROCEDURE InsertInvoice @xmlDoc text AS DECLARE @iDoc int -- load XML document into memory EXEC sp_xml_preparedocument @iDoc OUTPUT, @xmlDoc -- process XML document with OPENXML here -- remove XML document from memory EXEC sp_xml_removedocument @iDoc GO

Once you have the handle to the loaded XML document, you can use OPENXML to define a rowset view of the data. This requires you to provide XPath expressions and rowset schema information that together specify the exact mapping details. For example, this XML document contains invoice information:

<Invoice ID='1000' CustomerName='Acme Computers'> <LineItems> <LineItem Sku='134' Description='Mouse' Price='9.95' Quantity='35'/> <LineItem Sku='153' Description='Keyboard' Price='19.95' Quantity='20'/> <LineItem Sku='171' Description='USB Disk' Price='14.95' Quantity='50'/> </LineItems> </Invoice>

When using OPENXML, you must first identify which nodes in the XML document will map to rows in the new rowset. You specify this using an XPath expression that identifies an XML node-set. For example, if you want to map the LineItem elements to a rowset, the XPath would be /Invoice/LineItems/LineItem.

Then you provide the new rowset schema information in the WITH clause. OPENXML maps the columns specified in the WITH clause to the identified node's attributes by default (here, the Sku column maps to LineItem's Sku attribute, the Description column maps to LineItem's Description attribute, and so forth). For example, the stored procedure in Figure 1 automatically maps the LineItem attributes to a rowset using identical column names.

Figure 1 Stored Procedure

CREATE PROCEDURE InsertInvoice @xmlDoc text AS DECLARE @iDoc int EXEC sp_xml_preparedocument @iDoc OUTPUT, @xmlDoc SELECT * FROM OPENXML(@iDoc, '/Invoice/LineItems/LineItem') WITH ( Sku varchar(50), Description varchar(50), Price float, Quantity int ) EXEC sp_xml_removedocument @iDoc GO

Executing this stored procedure by supplying the XML document shown earlier produces the following rowset:

Sku Description Price Quantity
134 Mouse 9.95 35
153 Keyboard 19.95 20
171 USB Disk 14.95 50

You can also map the rowset to elements (instead of attributes) through a special flag. The third argument specifies the default mapping that should be used between the XML node-set and the relational rowset. The possible flag values and their descriptions are provided in Figure 2.

Figure 2 OPENXML Flags

Byte Value Meaning
0 Defaults to attribute-centric mapping.
1 Use the attribute-centric mapping. Can be combined with XML_ELEMENTS, in which case attribute-centric mapping is applied first, and then element-centric mapping is applied for all columns not yet dealt with.
2 Use the element-centric mapping. Can be combined with XML_ATTRIBUTES, in which case attribute-centric mapping is applied first, and then element-centric mapping is applied for all columns not yet dealt with.
8 Can be combined (logical OR) with XML_ATTRIBUTES or XML_ELEMENTS. In the context of retrieval, this flag indicates that the consumed data should not be copied to the overflow property @mp:xmltext.

For example, consider the following document that uses elements instead of attributes:

<Invoice> <InvoiceID>1000</InvoiceID> <CustomerName>Acme Computers</CustomerName> <LineItems> <LineItem> <Sku>134</Sku> <Description>Mouse</Description> <Price>9.95</Price> <Quantity>35</Quantity> </LineItem> ... </LineItems> </Invoice>

The following call to OPENXML produces the same rowset as before, but for the document using elements instead of attributes:

-- 2 indicates an element-centric mapping SELECT * FROM OPENXML(@iDoc, '/Invoice/LineItems/LineItem', 2) WITH (Sku varchar(50), Description varchar(50), Price float, Quantity int)

In some situations, the default mapping mechanisms aren't sufficient. For example, they don't let you include the InvoiceID and CustomerName elements in the rowset of LineItem information. Hence, you can also specify an XPath expression that defines the exact mapping for each column. Relative XPath expressions are evaluated relative to the node identified by the first XPath expression supplied to OPENXML. The following example illustrates how to generate a rowset including the CustomerName and InvoiceID elements:

SELECT * FROM OPENXML(@iDoc, '/Invoice/LineItems/LineItem', 2) WITH (CustomerName varchar(50) '../../CustomerName', InvoiceID int '../../InvoiceID', Sku varchar(50), Description varchar(50), Price float, Quantity int)

Executing this stored procedure against the element-centric document shown earlier produces the following rowset:

CustomerName InvoiceID Sku Description Price Quantity
Acme Computers 1000 134 Mouse 9.95 35
Acme Computers 1000 153 Keyboard 19.95 20
Acme Computers 1000 171 USB Disk 14.95 50

Finally, inserting these rows into a table with the same structure is as simple as supplying the selected rowset to an INSERT statement, as shown in Figure 3. There are a variety of ways to shred XML documents outside of the database, but OPENXML is the mechanism provided by T-SQL.

Figure 3 Insert into LineItems

CREATE PROCEDURE InsertInvoice @xmlDoc text AS DECLARE @iDoc int EXEC sp_xml_preparedocument @iDoc OUTPUT, @xmlDoc INSERT INTO LineItems SELECT * FROM OPENXML(@iDoc, '/Invoice/LineItems/LineItem', 2) WITH (Sku varchar(50), Description varchar(50), Price float, Quantity int, InvoiceID int '../../InvoiceID') EXEC sp_xml_removedocument @iDoc GO

Q Can you tell me how to process a namespace-qualified document using OPENXML?

Q Can you tell me how to process a namespace-qualified document using OPENXML?

A Although the examples I looked at in the previous question didn't contain any XML namespaces, OPENXML is also capable of shredding namespace-qualified documents. To enable this, you must first supply a set of namespace bindings to sp_xml_preparedocument. These bindings should contain the namespace prefixes that you intend to use in the OPENXML statement. Then you simply use those prefixes to identify namespace-qualified elements/attributes found in the XML document. For example, consider the following invoice document that contains two namespaces:

<Invoice xmlns="urn:invoices"> <InvoiceID>1000</InvoiceID> <CustomerName>Acme Computers</CustomerName> <LineItems xmlns="urn:items"> <LineItem> <Sku>134</Sku> <Description>Mouse</Description> <Price>9.95</Price> <Quantity>35</Quantity> </LineItem> ... </LineItems> </Invoice>

A Although the examples I looked at in the previous question didn't contain any XML namespaces, OPENXML is also capable of shredding namespace-qualified documents. To enable this, you must first supply a set of namespace bindings to sp_xml_preparedocument. These bindings should contain the namespace prefixes that you intend to use in the OPENXML statement. Then you simply use those prefixes to identify namespace-qualified elements/attributes found in the XML document. For example, consider the following invoice document that contains two namespaces:

<Invoice xmlns="urn:invoices"> <InvoiceID>1000</InvoiceID> <CustomerName>Acme Computers</CustomerName> <LineItems xmlns="urn:items"> <LineItem> <Sku>134</Sku> <Description>Mouse</Description> <Price>9.95</Price> <Quantity>35</Quantity> </LineItem> ... </LineItems> </Invoice>

To process this document properly now that it contains namespaces, you need to supply a string of XML containing namespace declarations as the third parameter to sp_xml_preparedocument, as shown here:

EXEC sp_xml_preparedocument @iDoc OUTPUT, @xmlDoc, '<ns xmlns:inv="urn:invoices" xmlns:li="urn:items"/>'

The string must contain a well-formed XML document, but the name of the root element doesn't matter (in this case, I called it ns). The namespace declarations on the element are the important part. With the prefixes defined (in this case, "inv" and "li"), you can start using them in the XPath expressions that constitute the OPENXML statement, as shown in Figure 4.

Figure 4 Namespace Declarations

CREATE PROCEDURE InsertInvoice @xmlDoc text AS DECLARE @iDoc int EXEC sp_xml_preparedocument @iDoc OUTPUT, @xmlDoc, '<ns xmlns:inv="urn:invoices" xmlns:li="urn:items"/>' INSERT INTO LineItems SELECT * FROM OPENXML(@iDoc, '/inv:Invoice/li:LineItems/li:LineItem', 2) WITH (Sku varchar(50) 'li:Sku', Description varchar(50) 'li:Description', Price float 'li:Price', Quantity int 'li:Quantity', InvoiceID int '../../inv:InvoiceID') EXEC sp_xml_removedocument @iDoc GO

Once you start working with namespace-qualified documents, the default element/attribute mappings will no longer work like they did in the earlier OPENXML sample since they assume no namespaces. You must specify the namespace-qualified name for each element/attribute in the WITH clause, as I just illustrated.

Q Is there any way to save the unprocessed portions of an XML document using OPENXML?

Q Is there any way to save the unprocessed portions of an XML document using OPENXML?

A SQL Server exposes several properties on each node in the XML document that you can access through a well-known namespace (urn:schemas-microsoft-com:xml-metaprop). These properties give you access to a given node's ID, local name stream, namespace URI, namespace prefix, the entire XML serialization, and information about parent and sibling nodes. SQL Server automatically binds the namespace to the "mp" prefix so you can use these metaproperties without any further configuration (as was necessary in the example in the previous question). Figure 5 lists the names of these properties along with their descriptions.

A SQL Server exposes several properties on each node in the XML document that you can access through a well-known namespace (urn:schemas-microsoft-com:xml-metaprop). These properties give you access to a given node's ID, local name stream, namespace URI, namespace prefix, the entire XML serialization, and information about parent and sibling nodes. SQL Server automatically binds the namespace to the "mp" prefix so you can use these metaproperties without any further configuration (as was necessary in the example in the previous question). Figure 5 lists the names of these properties along with their descriptions.

Figure 5 SQL Server 2000 Metaproperties

Metaproperty Attribute Description
@mp:id Provides system-generated, document-wide identifier of the DOM node (element, attribute, and so on). This ID is guaranteed to refer to the same XML node as long as the document is not reparsed. An XML ID of 0 indicates that the element is a root element. Its parent XML ID is NULL.
@mp:localname Stores the local part of the name of the node. It is used with prefix and namespace Uniform Resource Identifier (URI) to name element or attribute nodes.
@mp:namespaceuri Provides the namespace URI of the current element. If the value of this attribute is NULL, no namespace is present.
@mp:prefix Stores the namespace prefix of the current element name. If no prefix is present (NULL) and a URI is given, this indicates that the specified namespace is the default namespace. If no URI is given, no namespace is attached.
@mp:prev Stores the previous sibling relative to a node, thereby providing information about the ordering of elements in the document. @mp:prev contains the XML ID of the previous sibling that has the same parent element. If an element is at the beginning of the sibling list, @mp:prev is NULL.
@mp:xmltext This metaproperty is used for processing purposes. It's the textual serialization of the element and its attributes and subelements as used in the overflow handling of OPENXML.
@mp:parentid Corresponds to ../@mp:id.
@mp:parentlocalname Corresponds to ../@mp:localname.
@mp:parentnamespacerui Corresponds to ../@mp:namespaceuri.
@mp:parentprefix Corresponds to ../@mp:prefix.

The @mp:xmltext property provides the textual serialization of the node in question along with any attributes and descendant nodes. This metaproperty can be used for overflow handling. In fact, it can be used to capture the entire XML structure of the node or just the unprocessed portions. The following OPENXML statement illustrates how to provide an overflow column by mapping it to the @mp:xmltext property:

OPENXML(@iDoc, '/inv:Invoice/li:LineItems/li:LineItem', 2) WITH (Sku varchar(50) 'li:Sku', Description varchar(50) 'li:Description', Extra ntext '@mp:xmltext')

In this example, the entire serialized LineItem element will show up in the Extra column for each record. If you only want the unprocessed portions of each LineItem to show up, you should change the flag parameter from 2 to 10 (refer to Figure 2 for more details), as illustrated here:

OPENXML(@iDoc, '/inv:Invoice/li:LineItems/li:LineItem', 10) WITH (Sku varchar(50) 'li:Sku', Description varchar(50) 'li:Description', Extra ntext '@mp:xmltext')

With this in place, the Extra column will contain a string of XML that includes the Price element since it was the only node that wasn't mapped by the WITH clause.

The metaproperties listed in Figure 5 also appear when using OPENXML without a WITH clause, as illustrated here:

SELECT * FROM OPENXML(@iDoc, '/inv:Invoice/li:LineItems/li:LineItem', 2)

This produces what's called an edge table (see Figure 6), which contains a row for each node in the subtree as identified in the XPath expression. The rows contain columns for most of the metaproperties listed in Figure 5. This provides the information to reconstruct the original XML document from the rowset if necessary.

Figure 6 Edge Table for Invoice Document

id parentid nodetype localname prefix namespaceuri datatype prev text
7 5 1 LineItem NULL urn:items NULL NULL NULL
8 7 1 Sku NULL urn:items NULL NULL NULL
22 8 3 #text NULL NULL NULL NULL 134
9 7 1 Description NULL urn:items NULL 8 NULL
23 9 3 #text NULL NULL NULL NULL Mouse
10 7 1 Price NULL urn:items NULL 9 NULL
24 10 3 #text NULL NULL NULL NULL 9.95
11 7 1 Quantity NULL urn:items NULL 10 NULL
25 11 3 #text NULL NULL NULL NULL 35
12 5 1 LineItem NULL urn:items NULL 7 NULL
... ... ... ... ... ... ... ... ...

Q How do keys work in XSLT?

Q How do keys work in XSLT?

A Keys in XSLT are similar to IDs in Document Type Definitions (DTDs). If you define an attribute of type ID in a DTD, the attribute value must be unique in the document. You can also define attributes of type IDREF. In this case, the attribute value must refer to a unique ID value within the document. With this information, you can then use XPath's id function to resolve IDs and IDREF relationships in a document at run time. For example, consider an XML document with an internal DTD (see Figure 7).

A Keys in XSLT are similar to IDs in Document Type Definitions (DTDs). If you define an attribute of type ID in a DTD, the attribute value must be unique in the document. You can also define attributes of type IDREF. In this case, the attribute value must refer to a unique ID value within the document. With this information, you can then use XPath's id function to resolve IDs and IDREF relationships in a document at run time. For example, consider an XML document with an internal DTD (see Figure 7).

Figure 7 Internal DTD

<!DOCTYPE courses [ <!ELEMENT courses (course*, employees)> <!ELEMENT course EMPTY> <!ELEMENT employees (employee*)> <!ELEMENT employee EMPTY> <!ATTLIST course instructors IDREF #REQUIRED name CDATA #REQUIRED > <!ATTLIST employee id ID #REQUIRED name CDATA #REQUIRED > ]> <courses> <course instructors="e101 e102 e103 e104" name="EWS.NET"/> <course instructors="e101 e102 e105" name="GWS.NET"/> <employees> <employee id='e101' name="Aaron Skonnard"/> <employee id='e102' name="Simon Horrell"/> <employee id='e103' name="Dan Sullivan"/> <employee id='e104' name="Scott Bloom"/> <employee id='e105' name="Bob Beauchemin"/> </employees> </courses>

The following XPath expression identifies the element that has the ID of e104 which, in this case, happens to be the employee element that has the name of Scott Bloom:

id('e104')

The following XPath expression identifies the elements with id values of e101, e102, and e104 which, in this case, happens to be the employee elements with names of Aaron Skonnard, Dan Sullivan, and Scott Bloom:

id('e101 e103 e104')

And finally, the following XPath expression identifies the employees who teach the "GWS.NET" course (specifically Aaron Skonnard, Simon Horrell, and Bob Beauchemin) through the instructors IDREF attribute attached to them:

id(/courses/course[2]/@instructors)

As you can see, this technique lets you establish relationships between different elements based on ID values. The problem with the ID/IDREF technique is that it requires DTD information to be present at run time and it's limited to attributes. To help alleviate these restrictions, XSLT introduced a new concept, called keys, which provides the same basic functionality but with more flexibility. To define a key in an XSLT document, you use the key element, which has the following structure:

<xsl:key name="qname" match="pattern" use="expression"/>

You give the key a qualified name so that you can refer to it later from the key function (which behaves like the XPath id function described earlier). You also specify a pattern, which identifies the nodes to which the key applies. And finally, you rely on the "use" attribute to specify an XPath expression that is evaluated (relative to the nodes identified by the match expression) to produce the actual key value. For example, consider the XML document containing courses and employees shown in Figure 8.

Figure 8 An XML Document

<courses> <course> <name>EWS.NET</name> <instructors> <instructor>e101</instructor> </instructors> </course> <course> <name>GWS.NET</name> <instructors> <instructor>e101</instructor> <instructor>e102</instructor> <instructor>e103</instructor> </instructors> </course> <employees> <employee id='e101'> <name>Aaron Skonnard</name> </employee> <employee id='e102'> <name>Simon Horrell</name> </employee> <employee id='e103'> <name>Dan Sullivan</name> </employee> </employees> </courses>

In this case, you probably want to create a key based on the employee element's id attribute, as illustrated here:

<xsl:key name="employeeId" match="employee" use="@id"/>

This creates a key for all employee elements within the document; the key value is evaluated from the employee element's id attribute. With the key defined, you can then use the key function to retrieve nodes with a given key. For example, the following XSLT fragment illustrates how to retrieve the instructors who teach a given course:

<xsl:template match="course"> <h2><xsl:value-of select="name"/></h2> <h3>Instructors</h3> <ul> <xsl:for-each select="key('employeeId', ./instructors/*)"> <li><xsl:value-of select="."/></li> </xsl:for-each> </ul> </xsl:template>

That fragment will emit the following HTML, assuming the input document shown in Figure 8:

<h2>EWS.NET</h2> <h3>Instructors</h3> <ul> <li>Aaron Skonnard </li> </ul> <h2>GWS.NET</h2> <h3>Instructors</h3> <ul> <li>Aaron Skonnard</li> <li>Simon Horrell</li> <li>Dan Sullivan</li> </ul>

As you can see, keys don't require DTD information to be used and they're not limited to attributes as with ID/IDREF. The fact that keys can be assigned to nodes that match an arbitrary pattern also gives you an extra degree of flexibility.

Q What's the difference between the match and select attributes in XSLT? What do they do, exactly?

Q What's the difference between the match and select attributes in XSLT? What do they do, exactly?

A In XSLT, there are various elements that come with a match attribute (such as template and key) and others that come with a select attribute (value-of, for-each, apply-templates, and so on). At first glance, both seem to take XPath expressions, but in fact they're quite different.

A In XSLT, there are various elements that come with a match attribute (such as template and key) and others that come with a select attribute (value-of, for-each, apply-templates, and so on). At first glance, both seem to take XPath expressions, but in fact they're quite different.

Select indeed expects an XPath expression, which is used to select a nodeset for further processing. In the following example, the for-each expression identifies a set of nodes to be iterated over and the nested value-of expression writes the value of each node to the output tree:

<xsl:for-each select="//item"> <xsl:value-of select="."/> </xsl:for-each>

The match attribute, on the other hand, takes what's called a pattern. A pattern looks like an XPath expression because it shares the same syntax, but it's treated differently by the XSLT processor. A pattern is used for matching nodes in the tree against the specified criteria. For example, the following template matches any element node that has a grandparent named foo:

<xsl:template match="foo/*/*"> ••• </xsl:template>

In other words, a pattern describes an "is a" relationship with the nodes in question (that is, is this node an element that has a grandparent named foo?). Here's another slightly more sophisticated example of a pattern:

<xsl:template match="f:foo[@id > 323]//text()"> ••• </xsl:template>

In this case, the pattern identifies all text nodes that have an ancestor element named foo, from the namespace identified by f, and with an id attribute with a numeric value greater than 323.

Q Why does SelectNodes always return the nodes in document order irrespective of the axis direction (forward/reverse) that was used in the query?

Q Why does SelectNodes always return the nodes in document order irrespective of the axis direction (forward/reverse) that was used in the query?

A Consider the following XML document, which I'll call foo.xml:

<foo depth='1'> <foo depth='2'> <foo depth='3'> <bar/> </foo> </foo> </foo>

A Consider the following XML document, which I'll call foo.xml:

<foo depth='1'> <foo depth='2'> <foo depth='3'> <bar/> </foo> </foo> </foo>

Now consider the following code that evaluates slightly different XPath expressions against the loaded foo.xml document:

XmlDocument doc = new XmlDocument(); doc.Load("foo.xml"); XmlElement b, f1, f2; b = (XmlElement)doc.SelectSingleNode("//bar"); f1 = (XmlElement)b.SelectSingleNode("ancestor::foo[1]"); f2 = (XmlElement)b.SelectNodes("ancestor::foo")[0]; Console.WriteLine(f1.GetAttribute("depth")); Console.WriteLine(f2.GetAttribute("depth"));

You might expect the console output to be the same in both cases, but it's actually different. The first Console.WriteLine outputs "3" while the second outputs "1". The reason it works this way is that axis direction, as defined by the XPath specification, is only significant within XPath predicates. In fact, XPath nodesets are by definition unordered. It's up to the XPath implementation to determine in what order to deliver nodesets.

In the call to SelectSingleNode, however, I asked for the first node in the ancestor axis within a predicate. This guarantees that I'll get the first ancestor node (in reverse document order) starting from the bar element (the foo element with depth='3'). In the call to SelectNodes, on the other hand, I asked for all ancestor foo elements and the implementation returned an XmlNodeList collection in document order (which happens to be completely legal, according to the specification). Then, my code asked for the first item in the collection, which happens to be the foo element where depth='1'. Since XPath doesn't specify any particular order in which nodes are required to be returned, languages that build on XPath often define additional semantics of their own to take care of ordering. For example, in the case of XSLT, nodesets are always processed in document order.

Q How do you query a paragraph of text to determine whether it contains a <SCRIPT> element?

Q How do you query a paragraph of text to determine whether it contains a <SCRIPT> element?

A The answer to this question depends on what you mean when you ask if the paragraph contains a <SCRIPT> element. Consider the following document:

<item> <name>Something</name> <desc>This is a description that contains a <SCRIPT language="Javascript" src="script.js"/> element.</desc> </item>

If the document looks like this, you would use a normal XPath expression that queries against the document's logical data model. For example, the following XPath expression identifies all element nodes in the document tree that contain a child SCRIPT element (in this case, the <desc> element):

//*[SCRIPT]

A The answer to this question depends on what you mean when you ask if the paragraph contains a <SCRIPT> element. Consider the following document:

<item> <name>Something</name> <desc>This is a description that contains a <SCRIPT language="Javascript" src="script.js"/> element.</desc> </item>

If the document looks like this, you would use a normal XPath expression that queries against the document's logical data model. For example, the following XPath expression identifies all element nodes in the document tree that contain a child SCRIPT element (in this case, the <desc> element):

//*[SCRIPT]

Now, consider the following document, which uses special entity references for the less-than "<" and greater-than ">" characters within the <desc> element:

<item> <name>Something</name> <desc>This is a description that contains a &lt;SCRIPT language="Javascript" src="script.js"/&gt; element.</desc> </item>

When an XML processor parses this document, it replaces the "&lt;" string with the < character and the "&gt;" string with the > character. Hence, the <desc> element contains the following text when you query it using XPath:

This is a description that contains a <SCRIPT language="Javascript" src="script.js"/> element.

So you would use the following XPath expression to search this document for all elements that contain "<SCRIPT":

//*[contains(.,'<SCRIPT')]

Q I have two documents that only differ in namespace prefixes:

<person xmlns:a="urn:names"> <a:name>Bob</a:name> </person> <person xmlns:b="urn:names"> <b:name>Sally</b:name> </person>

Q I have two documents that only differ in namespace prefixes:

<person xmlns:a="urn:names"> <a:name>Bob</a:name> </person> <person xmlns:b="urn:names"> <b:name>Sally</b:name> </person>

It seems that most DOM APIs require the prefix when performing getElementsByTagName queries. How can I get a value for the name element without worrying about its prefix?

A When the DOM Level 1 specification was first published, it didn't include support for namespaces. Hence, the original getElementsByTagName method expected a simple tag name and was oblivious to namespaces. Some Level 1 implementations made it possible to process namespace-qualified documents using prefixed names, but you couldn't expect two implementations to produce the same results since the behavior wasn't defined by the spec.

A When the DOM Level 1 specification was first published, it didn't include support for namespaces. Hence, the original getElementsByTagName method expected a simple tag name and was oblivious to namespaces. Some Level 1 implementations made it possible to process namespace-qualified documents using prefixed names, but you couldn't expect two implementations to produce the same results since the behavior wasn't defined by the spec.

The DOM Level 2 specification (https://www.w3.org/DOM/DOMTR#dom2) introduced full support for namespaces by adding new methods for every operation dealing with element or attribute names (where namespaces are possible). The specification outlined that the new versions of these methods should all end in "NS". For example, in DOM Level 2 there is also a getElementsByTagNameNS method. This method expects a local name (with no prefix) and the namespace name.

Although this is what the specification says, not all implementations conform completely today. For example, the Microsoft® .NET Framework implementation provides an overloaded version of GetElementsByTagName instead of a completely separate method with a distinct name. Here's how you'd select the name element in C#:

XmlNodeList nl = doc.GetElementsByTagName("name", "urn:names"); string name = nl.Item(0).Value;

This code will select the name element from the urn:names namespace regardless of the prefix used in the source document.

The MSXML 4.0 DOM implementation doesn't include the NS methods either, but it does provide the functionality you need through other methods. For example, one way to select a node regardless of its prefix is through an XPath expression. Both the MSXML 4.0 and .NET Framework implementations support XPath through the SelectNodes and SelectSingleNode functions. The following JScript® illustrates how to select the name element based on local name alone, without regard to namespace or prefix:

name = doc.SelectSingleNode("/person/*[local-name()='name']").text;

Another way to use these functions with namespace-aware documents is to configure a separate namespace prefix that you intend to use in your XPath expressions. With MSXML 4.0, you do this through the call to setProperty, as illustrated here:

doc.setProperty("SelectionNamespaces", "xmlns:x='urn:names'"); name = doc.SelectSingleNode("/person/x:name").text;

The namespace prefix you use in the XPath expression doesn't have to be the same as the one used in the source document. In fact, it doesn't matter what prefix you use as long as it's mapped to the same namespace name as is used in the document. As you can see, there are a variety of ways to accomplish what you want depending on what's offered by your implementation.

Q I've always been confused about how xsl:decimal-format and format-number work in XSLT. Can you explain?

Q I've always been confused about how xsl:decimal-format and format-number work in XSLT. Can you explain?

A XSLT provides a built-in mechanism for formatting numbers. This mechanism is built on the format-number function and the xsl:decimal-format element. The format-number function has the following signature:

string format-number(number, string, string?)

The first argument is the number you want to format while the second argument is a string representing the exact format you want to use. The syntax of the format string is defined by the JDK 1.1 DecimalFormat class. You can use any of the characters shown in Figure 9 to build a format string.

A XSLT provides a built-in mechanism for formatting numbers. This mechanism is built on the format-number function and the xsl:decimal-format element. The format-number function has the following signature:

string format-number(number, string, string?)

The first argument is the number you want to format while the second argument is a string representing the exact format you want to use. The syntax of the format string is defined by the JDK 1.1 DecimalFormat class. You can use any of the characters shown in Figure 9 to build a format string.

Figure 9 Format String Characters

Character Meaning
0 A digit; zeros show as zeros
# A digit; zeros show as absent
. Placeholder for decimal separator (locale-specific)
, Placeholder for grouping separator (locale-specific)
- Default negative prefix
% Multiply by 100 and show as a percentage
; Separates number formats (positive format on left, optional negative number format on right)
' Used to quote special characters in a prefix or suffix
other Appears literally in output

For example, the XSLT transformation in Figure 10 illustrates how to use the format-number function with a few different format strings. Calling the printNumbers template and passing in the value 9950001 produces the following results:

$9,950,001 $9,950,001.00 $9,950,001.00

Figure 10 Format Number Function

<xsl:template name="printNumbers"> <xsl:param name="num"/> <xsl:value-of select="format-number($num, '$#,###.##')"/> <xsl:value-of select="format-number($num, '$#,##0.00')"/> <xsl:value-of select="format-number($num, '$#,##0.00;&lt;$#,##0.00&gt;')"/> </xsl:template> <xsl:template match="/"> <xsl:call-template name="printNumbers"> <xsl:with-param name="num" select="9950001"/> </xsl:call-template> </xsl:template>

The last format string has two parts: one part for positive numbers and another part for negative numbers (separated by a semicolon). When you supply a positive number to format-number, it uses the first format string; otherwise it uses the second format string. For example, calling printNumbers with a value of -9950001 produces the following results:

-$9,950,001 -$9,950,001.00 <$9,950,001.00>

If you don't specify a negative format string (like in the first two calls to format-number) a minus sign is automatically placed in front of the formatted number, as shown in the previous output.

The optional third argument is the name of a decimal-format element that influences how the format string is interpreted by the XSLT processor. If there isn't a third argument, the function uses the default decimal-format for the transformation (which follows the default format string rules).

Here's the syntax of the decimal format element:

<xsl:decimal-format name = qname decimal-separator = char grouping-separator = char infinity = string minus-sign = char NaN = string percent = char per-mille = char zero-digit = char digit = char pattern-separator = char />

As you can see, xsl:decimal-format lets you control exactly what the characters mean in a given format string. Figure 11 describes what each of the attributes means along with its default value. You name the decimal-format so that you can refer to it later while calling the format-number function. If you don't give it a name, it automatically becomes the default decimal format for the entire transformation.

Figure 11 xsl:decimal-format Attributes

Name Default Description
name "" The qualified name of the decimal-format; if there's no name, it becomes the default decimal-format
decimal-separator . The character used for the decimal sign
grouping-separator , The character used as the grouping separator
infinity Infinity The string used to represent infinity
minus-sign - The character used as the minus sign
NaN NaN The string used to represent the NaN value
percent % The character used as the percent sign
per-mille ‰ The character used as a per mille sign
zero-digit 0 The character used as the digit zero
digit # The character used for a digit in the format string
pattern-separator ; The character used to separate positive and negative sub patterns in a pattern

For example, consider the following xsl:decimal-format element that switches the decimal and grouping separator characters:

<xsl:decimal-format decimal-separator = "," grouping-separator = "."/>

Inserting this into the XSLT transformation in Figure 10 and calling the printNumbers template with -9950001 now produces the following output:

-$9.950.001 -$9.950.001,00 <$9.950.001,00>

Notice that since I didn't give the xsl:decimal-format element a name, it automatically became the default for the transformation and was used in the calls to format-number. The functionality that results from the coordination between format-number and xsl:decimal-format offers a flexible mechanism for formatting numbers throughout your transformations.

Send your questions and comments for Aaron to  xmlfiles@microsoft.com.

Aaron Skonnard is an instructor and researcher at DevelopMentor, where he develops the XML and Web Services-related curriculum. Aaron coauthored Essential XML Quick Reference (Addison-Wesley, 2001) and Essential XML (Addison-Wesley, 2000).