Building Practical Solutions with EXSLT.NET

 

Oleg Tkachenko
Microsoft Corporation

August 2004

Applies to:
   EXSLT.NET library
   XML programming
   XSLT

Summary: Oleg Tkachenko shows how to make XML programming easier and to boost your productivity using XSLT and XPath extensions, provided by the EXSLT.NET library. (22 printed pages)

Contents

Extending XPath and XSLT
EXSLT
The EXSLT.NET Library
EXSLT.NET Usage Patterns
The "Common" Module
The "Dates and Times" Module
The "Math" Module
The "Regular Expressions" Module
The "Sets" Module
The "Strings" Module
GotDotNet Modules
Conclusion
Acknowledgments

Extending XPath and XSLT

The XSLT language is an extremely powerful and popular tool for transforming XML documents, and the XPath language is an even more popular tool for querying XML. But, as usually happens, the first version of anything is rarely perfect. After four years of use by a wide developer audience, it is clear that XSLT 1.0 and XPath 1.0 lack some useful functionality, such as set operations, date and time manipulation functions, regular expressions, math functions, and the ability to produce multiple outputs. These are quite reasonable restrictions for a first version, and, as you might expect, the forthcoming second versions of the XSLT and XPath languages are going to bridge those gaps. Happily, both XSLT and XPath were designed to be extensible (recall that "X" stands for "eXtensible" in the XML world), so bridging the gaps has always been a matter of developing extensions.

XSLT defines two kinds of extensions: extension functions and extension elements. However, the XSLT Recommendation doesn't define a mechanism for providing extension implementations, instead leaving it up to XSLT processor vendors. Thus XSLT extensions are not portable between XSLT processors. For instance, one cannot expect the ms:format-date() extension function that is provided by MSXML4 to be available in MSXML3 or .NET (however, XSLT allows a stylesheet to determine whether a particular extension is supported by the XSLT processor that processes the stylesheet, and to fall back gracefully if it is not). Over time, thousands of extension functions have been developed by XML developers for many XSLT processors, and it finally became clear that some sort of standardization was needed to avoid redeveloping extensions over and over again, and to overcome portability problems. Those are the main goals of the EXSLT community initiative.
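The graceful fallback mentioned above can be sketched with the standard XSLT 1.0 function-available() function. This is only a minimal illustration: the generated element name is made up for the example, and whether a given processor reports a particular extension function as available depends on that processor.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:date="http://exslt.org/dates-and-times"
 exclude-result-prefixes="date">
   <xsl:template match="/">
      <!-- "generated" is a hypothetical element name used for illustration -->
      <generated>
         <xsl:choose>
            <!-- Call the extension only if this processor provides it -->
            <xsl:when test="function-available('date:date-time')">
               <xsl:value-of select="date:date-time()"/>
            </xsl:when>
            <!-- Otherwise degrade gracefully instead of failing -->
            <xsl:otherwise>unknown</xsl:otherwise>
         </xsl:choose>
      </generated>
   </xsl:template>
</xsl:stylesheet>
```

For extension elements, XSLT 1.0 offers the analogous element-available() function and the xsl:fallback instruction.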

EXSLT is a community initiative to provide extensions to XSLT, and currently it defines more than 70 extension functions and elements. EXSLT provides benefits to the average XML developer in two different ways—first of all, it provides an extremely rich library of commonly needed functionality that is lacking in XSLT 1.0, and secondly it makes XSLT stylesheets more portable, at least between XSLT processors supporting EXSLT.

In this article, I'll talk about how to use the EXSLT.NET implementation to make you more productive with XSLT. You may want to read EXSLT: Enhancing the Power of XSLT and EXSLT Meets XPath, both by Dare Obasanjo, which give an introduction to the EXSLT.NET implementation. I'll show you how to enable EXSLT.NET in your development environment and how this plethora of additional functionality can be effectively used in XPath and XSLT programming.

EXSLT

The EXSLT specification defines a variety of extension functions and elements to cover commonly required functionality that is absent in XSLT 1.0 and XPath 1.0. EXSLT extensions are broken down into a number of modules. Each module defines a namespace URI, to which all extensions in the module belong. The EXSLT modules are listed in Table 1.

Table 1. EXSLT modules

Module name | Module namespace | Module description | Supported by EXSLT.NET
Dates and Times | http://exslt.org/dates-and-times | Contains various date and time related extensions. | Yes
Dynamic | http://exslt.org/dynamic | Contains extension functions that deal with the dynamic evaluation of XPath expressions. | No
Common | http://exslt.org/common | Contains basic extension elements and functions. | Yes
Functions | http://exslt.org/functions | Contains extensions that allow users to define their own extension functions. | No
Math | http://exslt.org/math | Contains math-related extension functions. | Yes
Random | http://exslt.org/random | Contains extension functions that provide facilities that deal with randomness. | No
Regular Expressions | http://exslt.org/regular-expressions | Contains extension functions related to regular expressions. | Yes
Sets | http://exslt.org/sets | Contains extension functions to enable set manipulations. | Yes
Strings | http://exslt.org/strings | Contains extension functions to enable string manipulations. | Yes

The EXSLT.NET Library

EXSLT.NET is a community-developed project led by Dare Obasanjo. The main goal of the project is to provide a compliant, robust, and efficient implementation of the EXSLT specification for the .NET platform.

Visit the EXSLT.NET releases page and grab the most recent release (version 1.0.1 at the time of this writing, which is also provided along with this article). The EXSLT.NET release download is a zip archive containing a precompiled assembly named GotDotNet.Exslt.dll, C# sources (a complete Microsoft Visual Studio .NET solution), documentation, and some rudimentary command-line utilities for testing (if you need a real command-line XSLT utility that supports EXSLT.NET, try nxslt.exe). Basically, all you need to do is add a reference to the GotDotNet.Exslt.dll assembly to your Visual Studio .NET project. Additionally, you can install the GotDotNet.Exslt.dll assembly into the Global Assembly Cache using the provided install.cmd script.

EXSLT.NET 1.0.1 fully implements the "Dates and Times," "Common," "Math," "Regular Expressions," and "Sets" modules, and almost fully implements the "Strings" module. A full list of implemented functions is provided in the EXSLT.NET documentation (online version). In addition, EXSLT.NET provides a set of proprietary extension functions, also grouped into modules:

Table 2. EXSLT.NET proprietary modules.

Module name | Module namespace | Module functions
GotDotNet Dates and Times | http://gotdotnet.com/exslt/dates-and-times | date2:avg(), date2:min(), and date2:max()
GotDotNet Math | http://gotdotnet.com/exslt/math | math2:avg()
GotDotNet Regular Expressions | http://gotdotnet.com/exslt/regular-expressions | regexp2:tokenize()
GotDotNet Sets | http://gotdotnet.com/exslt/sets | set2:subset()
GotDotNet Strings | http://gotdotnet.com/exslt/strings | str2:uppercase() and str2:lowercase()

The GotDotNet.Exslt namespace contains the following important classes:

Table 3. GotDotNet.Exslt classes

Class name | Description
ExsltTransform | A handy wrapper class, which encapsulates both the standard XslTransform class and the EXSLT modules implementation. It imitates the XslTransform API, thus allowing it to be used instead of the XslTransform class.
ExsltContext | Implements XsltContext to allow EXSLT functions to be used in an XPath-only environment. See the EXSLT Meets XPath article for more insights.
MultiXmlTextWriter | Implements XmlWriter to enable multiple outputs from an XSL Transformation by means of the exsl:document extension element. See the Producing Multiple Outputs from an XSL Transformation article for more details.

Additionally, there is a set of classes that contain the actual implementation of the extension functions from the EXSLT modules and the proprietary EXSLT.NET modules. These classes are designed to be used as XSLT extension objects. They are the building blocks of EXSLT.NET, and in common usage scenarios you don't need to know about them, but they can still be quite useful for learning purposes as well as in more advanced scenarios. For instance, you can use them as a base to build your own extension functions, which is actually a very reasonable approach considering how much effort the EXSLT.NET team has spent to make these classes highly efficient and robust. These classes are: ExsltCommon, ExsltDatesAndTimes, ExsltMath, ExsltRegularExpressions, ExsltSets, ExsltStrings, GDNDatesAndTimes, GDNMath, GDNRegularExpressions, GDNSets, and GDNStrings.

EXSLT.NET Usage Patterns

The EXSLT.NET library is meant to be used in two distinct scenarios: in XSL Transformations and in XPath selections.

How to Use EXSLT.NET in XSLT

The most typical usage of EXSLT.NET is when you want to leverage EXSLT extension functions and elements in your XSLT stylesheet.

The usual steps to make use of an extension function in .NET are as follows:

  • Create an instance of an extension object.
  • Register it in an XsltArgumentList collection with an extension namespace URI.
  • Pass XsltArgumentList to the XslTransform.Transform() method.
  • Declare the extension namespace in the stylesheet (don't forget to mention its prefix in the exclude-result-prefixes attribute of the xsl:stylesheet element to avoid extension namespace propagation into the resulting document).
  • Call the extension functions (which are actually all public methods of the extension object).

The ExsltTransform class can handle the first three steps for you when it comes to EXSLT extension functions. Just use it instead of the standard XslTransform class; the only steps left for you are declaring the proper EXSLT module namespace in the stylesheet and calling the extension function.

Here is the C# sample code that illustrates how to leverage the ExsltTransform class:

using System;
using System.Xml.XPath;
using GotDotNet.Exslt;

public class ExsltTest  
{
  public static void Main() 
  {                    
    ExsltTransform xslt = new ExsltTransform();              
    xslt.Load("foo.xsl");              
    xslt.Transform("foo.xml", "result.html");          
  }      
}

This code is conceptually equivalent to the following one (see, no magic here):

using System;
using System.IO;
using System.Xml.XPath;
using System.Xml.Xsl;
using GotDotNet.Exslt;

public class ExsltTest  
{
  public static void Main() 
  {                
    XPathDocument doc = new XPathDocument("foo.xml");
    XslTransform xslt = new XslTransform();        
    xslt.Load("foo.xsl");              
    XsltArgumentList args = new XsltArgumentList();
    args.AddExtensionObject("http://exslt.org/dates-and-times", 
      new ExsltDatesAndTimes());
    //... repeat above for all EXSLT and EXSLT.NET modules    
    // File.Create truncates an existing result file before writing
    using (FileStream fs = File.Create("result.html")) 
    {
      xslt.Transform(doc, args, fs);
    }
  }      
}

Then in the stylesheet you can call EXSLT functions as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:date="http://exslt.org/dates-and-times" 
exclude-result-prefixes="date">
   <xsl:template match="/">
      Current date is <xsl:value-of select="date:date()"/>
   </xsl:template>
</xsl:stylesheet>

By default, the ExsltTransform class enables all EXSLT and EXSLT.NET modules, and it turns off the multiple output feature (support for the exsl:document extension element). This behavior can be controlled via SupportedFunctions and MultiOutput properties:

ExsltTransform xslt = new ExsltTransform();
//Turn on only "Common" and "Dates and Times" modules
xslt.SupportedFunctions = ExsltFunctionNamespace.Common |    
    ExsltFunctionNamespace.DatesAndTimes;
//Turn on support for multiple output
xslt.MultiOutput = true;

How to Use EXSLT.NET in XPath

It's also possible to leverage EXSLT.NET extensions in an XPath-only environment. Dare has sorted out this issue well in the EXSLT meets XPath article. Here is only a basic sample of how you can select distinct values from an XmlDocument using the set:distinct() extension function:

XmlDocument doc = new XmlDocument();        
doc.Load("foo.xml");
XPathNavigator nav = doc.CreateNavigator();                
XPathExpression expr = nav.Compile("set:distinct(//@country)");
expr.SetContext(new ExsltContext(doc.NameTable));
XPathNodeIterator ni = nav.Select(expr);
while (ni.MoveNext()) {
    Console.WriteLine(ni.Current.Value);
}

Now let's examine EXSLT modules and functions more closely.

The "Common" Module

The Common module covers common basic extensions—the exsl:document extension element, and the exsl:node-set() and exsl:object-type() functions.

exsl:node-set()

The exsl:node-set() function is the most commonly used extension function. It's a portable analog of the msxsl:node-set() function, supported by MSXML and XslTransform, as well as similar functions that all other XSLT processors possess to convert a result tree fragment into a nodeset.

More formally, the exsl:node-set() function serves two needs—first, it converts a result-tree fragment (that's what you get when you use the xsl:variable's content instead of the select attribute to assign a value to a variable) into a nodeset, enabling multi-step processing (you can then iterate over temporary trees, created on the fly during the transformation). The second, less well-known, and often confusing, ability of the exsl:node-set() function is to convert a string into a text node. It could be useful, for instance, when you want to pass a string to a function or template that only accepts nodeset as a parameter. By no means, however, can this function convert a string containing an XML fragment into a nodeset representing this XML, as many mistakenly believe; instead, it only converts any given string into a single text node whose value is that string.

Multi-step transformation is an extremely useful technique, which allows you to express a complex XML transformation as a chain of simple transformations within a single transformation run. For instance, when you need to access previous or next nodes in sorted order, or to add calculated values, it's much easier to create a temporary tree (result tree fragment), convert it to a nodeset, and then process it. Here is such a sample:

Input XML:

<items>
   <item price="9.99" quantity="30">Screwdriver</item>
   <item price="29.99" quantity="10">Handsaw</item>
   <item price="49.99" quantity="15">Electric drill</item>
</items>

Then to calculate the total price for all of the items in the inventory you can use the following technique:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl">      
   <xsl:template match="/">
      <xsl:variable name="totals-rtf">
         <xsl:for-each select="/items/item">
            <t><xsl:value-of select="@price*@quantity"/></t>
         </xsl:for-each>
      </xsl:variable>
      <xsl:variable name="totals" select="exsl:node-set($totals-rtf)/t"/>
      Total: <xsl:value-of select="format-number(sum($totals), '##0.00')"/>
   </xsl:template>
</xsl:stylesheet>

exsl:document

The exsl:document extension element is used to create multiple result documents in a single XSL transformation. The topic is covered in great detail in the Producing Multiple Outputs from an XSL Transformation article, and after that article was published, the MultiXmlTextWriter implementation was incorporated into the EXSLT.NET project. Due to some implications discussed in the aforementioned article, support for the exsl:document element is turned off by default and can be turned on by setting the MultiOutput property of the ExsltTransform class to true:

XPathDocument doc = new XPathDocument("foo.xml");
ExsltTransform xslt = new ExsltTransform();
xslt.MultiOutput = true;
xslt.Load("foo.xsl");     
using (FileStream fs = File.Create("result.html")) {   
    xslt.Transform(doc, null, fs);
}

Note   The exsl:document element is not supported when the transformation is done to an XmlReader or XmlWriter. In the latter case, use the overloaded ExsltTransform.Transform() method that accepts an instance of the MultiXmlTextWriter class to transform to.

The "Dates and Times" Module

The Dates and Times module contains a large number of extension functions that deal with date and time values. These include functions for getting the current date or time, manipulating and calculating dates, and parsing and formatting date and time values.

Most of these functions operate with date-time and duration values, defined in the W3C XML Schema—xs:dateTime, xs:date, xs:time and xs:duration.

The date:date-time(), date:date(), and date:time() functions return current date and time values. The function date:difference() is used to calculate a time duration between two date-time values. The date:add() function allows you to add a duration of time to a date-time value (for example, to calculate a date 45 days after today). The date:sum() function allows you to add durations of time.
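As a rough sketch of how these functions combine, the following stylesheet computes the date 45 days after the current moment and then measures the duration between the two values. The result element names are made up for illustration, and the actual output depends on when the transformation runs.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:date="http://exslt.org/dates-and-times"
 exclude-result-prefixes="date">
   <xsl:template match="/">
      <!-- Capture the current date-time once so both values agree -->
      <xsl:variable name="now" select="date:date-time()"/>
      <!-- P45D is an xs:duration literal meaning 45 days -->
      <xsl:variable name="due" select="date:add($now, 'P45D')"/>
      <!-- Hypothetical result element names, for illustration only -->
      <dates>
         <now><xsl:value-of select="$now"/></now>
         <due><xsl:value-of select="$due"/></due>
         <!-- Duration from the first date-time to the second -->
         <span><xsl:value-of select="date:difference($now, $due)"/></span>
      </dates>
   </xsl:template>
</xsl:stylesheet>
```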

There are too many functions to be listed here; I encourage readers to take a look at the full list and go through the documentation of each one. Knowing which functions are available may greatly increase your XSLT programming productivity. I only want to provide a couple of samples here to illustrate how these functions can be used to solve common XSLT problems.

The first common problem is parsing date-time values stored in arbitrary non-standard formats, such as 5.12.2003 09:04 AM. The opposite problem is formatting date-time values for final presentation. MSXML4 provides two great extension functions, ms:format-date() and ms:format-time(), to address the latter problem, but they are not supported in .NET or MSXML3. EXSLT solves both problems with the date:parse-date() and date:format-date() extension functions. Here is a sample of how to use these functions:

Source XML:

<log>
   <event timestamp="1.9.2003 07:22 AM" description="Server startup"/>
   <event timestamp="5.12.2003 10:04 PM" description="Server shutdown"/>
</log>

The stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:date="http://exslt.org/dates-and-times" exclude-result-prefixes="date">   
   <xsl:template match="/">
      <table border="1">
         <tr>
            <th>Date</th><th>Time</th><th>Event</th>
         </tr>
         <!-- Event rows follow the header row instead of nesting inside it -->
         <xsl:apply-templates/>
      </table>
   </xsl:template>
   <xsl:template match="event">
      <xsl:variable name="date-time" 
         select="date:parse-date(@timestamp, 'd.M.yyyy hh:mm tt')"/>
      <tr>
         <td>
          <xsl:value-of 
          select="date:format-date($date-time, 'MMMM d, yyyy')"/>
         </td>
         <td>
          <xsl:value-of 
          select="date:format-date($date-time, 'HH:mm:ss')"/>
         </td>
         <td><xsl:value-of select="@description"/></td>
      </tr>
   </xsl:template>
</xsl:stylesheet>

And this is what the result looks like in a browser:

Date Time Event
September 1, 2003 07:22:00 Server startup
December 5, 2003 22:04:00 Server shutdown

Another frequently requested functionality is date-time arithmetic; for example, calculating the time remaining until some event, selecting log events that occurred during the last three months, or calculating which day will be 45 days after today. Consider the following auction site catalog document:

<auction-list>
   <item name="Acme screwdriver" 
      placed="2003-12-22T21:40:54" 
      listing="P7D" starting-bid="5.99"/>
   <item name="Acme hammer" 
      placed="2003-11-02T22:09:43" 
      listing="P14D" starting-bid="4.99"/>   
</auction-list>

Then let's say I want to list unexpired items along with the date the auction ends and the time left. Here is the stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:date="http://exslt.org/dates-and-times" exclude-result-prefixes="date">   
   <xsl:template match="auction-list">
      <html>
         <body>
            <!-- List only unexpired items -->
            <xsl:apply-templates
 select="item[date:seconds(date:difference(date:date-time(), date:add(@placed, @listing))) > 0]"/>
         </body>
      </html>      
   </xsl:template>
   <xsl:template match="item">
      <xsl:variable name="ends" select="date:add(@placed, @listing)"/>
      <table border="1">
         <tr>
            <td colspan="2"><b><xsl:value-of select="@name"/></b></td>
         </tr>
         <tr>
            <td>Starting bid:</td>
            <td>$<xsl:value-of select="@starting-bid"/></td>
         </tr>
         <tr>
            <td>Time left:</td>
            <td>
   <xsl:variable 
     name="seconds-left" 
     select="date:seconds(date:difference(date:date-time(), $ends))"/>
   <xsl:variable 
     name="days-left" 
     select="floor($seconds-left div (3600*24))"/>
   <xsl:variable 
     name="hours-left" 
     select="floor(($seconds-left div 3600) - ($days-left*24))"/>
   <xsl:value-of 
     select="concat($days-left, ' days ', $hours-left, ' hours')"/>               
            </td>
         </tr>
         <tr>
            <td>Ends:</td>
            <td>
            <xsl:value-of 
             select="date:format-date($ends, 'MMM-dd-yy hh:mm:ss')"/>
           </td>
         </tr>
      </table>


   </xsl:template>
</xsl:stylesheet>

And here is what the result looks like in a browser on December 22, 2003, at 9:52 PM:

Acme screwdriver
Starting bid: $5.99
Time left: 6 days 23 hours
Ends: Dec-29-03 09:40:54

The "Math" Module

The Math module covers math-related extension functions, such as math:cos(), math:sqrt(), math:max(), math:min(), and others. The latter two, along with math:highest() and math:lowest(), help solve the recurring problem of selecting maximal or minimal values: math:max() and math:min() return the extreme value itself as a number, while math:highest() and math:lowest() return the nodes that hold it. Here is a small sample:

<items>
   <item price="9.99" quantity="30">Screwdriver</item>
   <item price="29.99" quantity="10">Handsaw</item>
   <item price="49.99" quantity="15">Electric drill</item>
</items>

Then let's output the most expensive items and the cheapest items:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://exslt.org/math" exclude-result-prefixes="math">   
   <xsl:template match="items">
      The most expensive item is 
<xsl:value-of select="math:highest(item/@price)/.."/>
      The cheapest item is 
<xsl:value-of select="math:lowest(item/@price)/.."/>            
      The highest price is <xsl:value-of select="math:max(item/@price)"/>
      The lowest price is <xsl:value-of select="math:min(item/@price)"/>
   </xsl:template>
</xsl:stylesheet>

This module also contains the interesting function math:random(), which returns a random double number from 0 to 1. Be aware, though, that the generated number is only pseudo-random, and that the math:random() function neither accepts nor returns a seed. The EXSLT.NET implementation of this function is based on the System.Random class, constructed with the DateTime.Now.Ticks value as a seed, so on a fast computer, calls made in quick succession can receive the same seed and therefore produce the same number rather than a varied sequence. The safest way to use this function, then, is to call it once during a transformation; for example, to select a random item from a list of items:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://exslt.org/math" exclude-result-prefixes="math">   
   <xsl:template match="/items">
      Sale of the hour—10% discount on 
<xsl:value-of 
 select="item[position() =
         (1 + floor(math:random()*count(/items/item)))]"/>
   </xsl:template>
</xsl:stylesheet>

The "Regular Expressions" Module

The Regular Expressions module currently contains three extension functions: regexp:test(), regexp:replace(), and regexp:match(), which allow you to perform sophisticated string processing operations, such as matching a string against a pattern, replacing matched substrings, or even string parsing. Everybody who is addicted to regular expressions should be happy now.

Here is how regexp testing can be used to validate an e-mail address:

<xsl:if 
 test="not(regexp:test(email, 
      '\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*'))">
 Email address is not valid.
</xsl:if>

Having the power of regular expressions in your hands, you can easily do many things you cannot do in pure XSLT, such as matching elements by a regular expression pattern:

<xsl:template match="*[regexp:test(name(), '[Ff]ield(\d)+')]">

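The above template matches all elements whose name contains field or Field followed by at least one digit. Because the pattern is not anchored, a name such as subfield1 matches too; anchor the pattern with ^ and $ to require that the whole name match.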
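If only whole names such as field1 or Field42 should match, the pattern can be anchored. The following is a minimal sketch of that variant; the fields and matched element names are hypothetical and exist only to make the output visible.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:regexp="http://exslt.org/regular-expressions"
 exclude-result-prefixes="regexp">
   <xsl:template match="/">
      <!-- Hypothetical wrapper element for the report -->
      <fields>
         <xsl:apply-templates/>
      </fields>
   </xsl:template>
   <!-- ^ and $ force the whole element name to match, so field1 and
        Field42 are reported but subfield1 and field1a are not -->
   <xsl:template match="*[regexp:test(name(), '^[Ff]ield\d+$')]">
      <matched name="{name()}"/>
   </xsl:template>
   <!-- Suppress text nodes so that only the report is written -->
   <xsl:template match="text()"/>
</xsl:stylesheet>
```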

Or another sample—here is how you can remove HTML markup from user input:

<xsl:value-of select="regexp:replace(text, '&lt;[^>]*>', 'g', '')"/>

The regexp:match() function enables you to parse strings easily. For instance, I have badly structured log information, such as

<log>
    <entry>22/12/2003    21:00    AcmeService    DB updated</entry>
</log>

and I need to parse the individual values of each event—date, time, application name, and message. Here is how it can be done:

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:regexp="http://exslt.org/regular-expressions"
 exclude-result-prefixes="regexp">
   <xsl:template match="log/entry">
      <xsl:variable name="tokens" select="regexp:match
     (., '(\d{1,2}/\d{1,2}/\d{4})\s+(\d{2}:\d{2})\s+(\w*)\s+(.*)', 'g')"/>
      <entry date="{$tokens[2]}" time="{$tokens[3]}" application="{$tokens[4]}" message="{$tokens[5]}"/>
   </xsl:template>
</xsl:stylesheet>

The result is

<entry date="22/12/2003" 
  time="21:00" application="AcmeService"
  message="DB updated" />

The "Sets" Module

This module provides extension functions to facilitate various set manipulation operations, such as difference, intersection, filtering distinct nodes, getting the leading or trailing subset, or checking if two nodesets contain the same node.

The set:difference() function is a handy way to find the difference between two nodesets; that is, nodes that are in the first set, but are not in the second one. The set:intersection() function returns the nodesets intersection; that is, nodes that are in both nodesets. Such functionality is usually needed when filtering nodes according to some complex business logic, for example to select all parts in the inventory nodeset, which are not in the discounted parts nodeset.

As a matter of interest, nodeset difference can also be calculated in pure XPath with the formula $ns1[count(.|$ns2) != count($ns2)], and intersection with $ns1[count(.|$ns2) = count($ns2)], but the EXSLT extension functions can be a great deal more efficient.
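To make the comparison concrete, here is a small sketch that computes the same difference and intersection both ways. It assumes a hypothetical inventory document whose part elements carry an optional discounted="yes" attribute; the result element names are likewise made up.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:set="http://exslt.org/sets" exclude-result-prefixes="set">
   <!-- Assumed input: <inventory> with <part> children, some of which
        have discounted="yes" -->
   <xsl:template match="/inventory">
      <xsl:variable name="all" select="part"/>
      <xsl:variable name="discounted" select="part[@discounted='yes']"/>
      <report>
         <!-- Difference: parts in $all that are not in $discounted -->
         <full-price-exslt>
            <xsl:copy-of select="set:difference($all, $discounted)"/>
         </full-price-exslt>
         <full-price-xpath>
            <xsl:copy-of
             select="$all[count(.|$discounted) != count($discounted)]"/>
         </full-price-xpath>
         <!-- Intersection: parts present in both nodesets -->
         <discounted-exslt>
            <xsl:copy-of select="set:intersection($all, $discounted)"/>
         </discounted-exslt>
         <discounted-xpath>
            <xsl:copy-of
             select="$all[count(.|$discounted) = count($discounted)]"/>
         </discounted-xpath>
      </report>
   </xsl:template>
</xsl:stylesheet>
```

The pure XPath forms work because the union of a node with a nodeset grows that nodeset only when the node is not already a member.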

I must mention the set:distinct() function too. This function addresses the extremely common, though surprisingly hard to solve, problem of selecting distinct nodes, be it selecting distinct authors from a bibliography or selecting unique parts from an order list. The set:distinct() function returns a subset of the specified nodeset, in which all nodes have distinct string values.

Here is a small sample of getting unique products by product name from a list of orders:

<orders>
    <order date="2003-12-12">
        <product>Dress Shirt</product>
        <size>M</size>
        <color>Blue</color>
        <quantity>10</quantity>
    </order>
    <order date="2003-12-20">
        <product>Dress Shirt</product>
        <size>XL</size>
        <color>White</color>
        <quantity>5</quantity>
    </order>
    <order date="2003-12-25">
        <product>Geeky TShirt</product>
        <size>XL</size>
        <color>Black</color>
        <quantity>30</quantity>
    </order>        
</orders>

Then to uniquely list ordered products, you can use the following code:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:set="http://exslt.org/sets" exclude-result-prefixes="set">
    <xsl:template match="orders">
        <h3>Ordered products:</h3>
        <xsl:for-each select="set:distinct(order/product)">
            <div><xsl:value-of select="."/></div>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

The result is

<h3>Ordered products:</h3>
<div>Dress Shirt</div>
<div>Geeky TShirt</div>

Having unique products, now you can also easily group orders by product name. In fact, this is just another kind of Muenchian grouping method, in which the selection of unique nodes is done in a more natural, and in some situations more effective, way.
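One way such grouping might look for the orders document above is sketched below: the outer loop visits each distinct product, and the inner loop lists the orders that share its name. The output markup is only an illustration.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:set="http://exslt.org/sets" exclude-result-prefixes="set">
   <xsl:template match="orders">
      <!-- Keep all orders in a variable so the inner loop does not
           depend on the context node of the outer loop -->
      <xsl:variable name="orders" select="order"/>
      <!-- One group per distinct product name -->
      <xsl:for-each select="set:distinct($orders/product)">
         <xsl:variable name="name" select="string(.)"/>
         <h3><xsl:value-of select="$name"/></h3>
         <!-- List every order that belongs to the current group -->
         <xsl:for-each select="$orders[product = $name]">
            <div>
               <xsl:value-of
                select="concat(size, ', ', color, ', ', quantity)"/>
            </div>
         </xsl:for-each>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

For the sample document this produces a Dress Shirt group with two orders and a Geeky TShirt group with one.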

The "Strings" Module

The last EXSLT module supported by EXSLT.NET is "Strings." Obviously, it's about string manipulation. Functions from this module allow you to align, pad, split, and tokenize strings, as well as to concatenate nodeset values and replace substrings in a string. These are trivial operations, but extremely useful in mundane text processing.

Here is how you can mask e-mail addresses on your site to protect users from being spammed:

 <xsl:value-of select="str:replace(email, '@', '@NOSPAM')"/>

Another sample—padding a string. If you need to output some value and make sure it occupies a fixed amount of space (that's usually needed when creating reports), you can use the str:padding function to create a padding string of fixed length:

<xsl:value-of select="@name"/>
<xsl:value-of select="str:padding(20 - string-length(@name), '&#xA0;')"/>

The above outputs the name and then pads it with non-breaking space characters, so that the total length is exactly 20 characters (provided the name is no longer than 20 characters).

The str:align function allows you to align a string (left, right, center) within another string:

<xsl:value-of select="str:align(@name, str:padding(20, '_'), 'center')"/>

This expression aligns a name attribute value within a string of 20 underscore characters, to produce something like

_______Danny________

str:split and str:tokenize are quite similar functions that are used to break strings into tokens. The former splits a string into tokens by the string pattern, while the latter one does it by any of the specified delimiting characters. Both functions return <token> elements for each identified token:

str:split('cats and dogs', ' and ')

returns

<token>cats</token>
<token>dogs</token>

And str:tokenize('2001-06-03T11:40:23', '-T:')

returns

<token>2001</token>
<token>06</token>
<token>03</token>
<token>11</token>
<token>40</token>
<token>23</token>

You can process returned tokens as usual; for example, iterate over each one in turn using the xsl:for-each instruction to create a list:

<ul>        
    <xsl:for-each select="str:split('cats and dogs', ' and ')">
        <li><xsl:value-of select="."/></li>
    </xsl:for-each>            
</ul>

Tokenizing a string is a highly useful technique. It allows you to split a string into a set of tokens (usually each token is a single word) and manipulate the tokens as appropriate: you can modify some of them (think of transforming URLs into links), break a long string into lines of fixed length, mask e-mail addresses, or auto-discover some keywords.

Here is a sample of how to use the str:tokenize function along with a matching regular expression to make URLs and e-mail addresses linkable in some plain text. Say you've got a parts inventory, in which description elements may contain URLs or e-mail addresses:

<parts>
   <part partID="ip334" name="Acme screwdriver">      
      <desc>Some description text with links, such as 
       https://www.contoso.com/parts/p334.html and email 
       addresses such as sales@contoso.com within.
     </desc>
   </part>
</parts>

The idea is to split the description into words and iterate over them, transforming each one that matches a URL or an e-mail pattern to an HTML link, and outputting others as is:

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:str="http://exslt.org/strings"
 xmlns:regexp="http://exslt.org/regular-expressions"
 exclude-result-prefixes="str regexp">
   <xsl:variable 
     name="url-pattern" 
     select="'https://([\w-]+\.)+[\w-]+(/[\w- ./?%&amp;=]*)?'"/>
   <xsl:variable 
     name="email-pattern" 
     select="'\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*'"/>
   <xsl:template match="part">
      <h4><xsl:value-of select="@name"/></h4>
      <div style="border:1px dotted red">
         <xsl:for-each select="str:tokenize(.)">
            <xsl:choose>
               <xsl:when test="regexp:test(., $url-pattern)">
                  <a href="{.}"><xsl:value-of select="."/></a>
               </xsl:when>
               <xsl:when test="regexp:test(., $email-pattern)">
                  <a href="mailto:{.}"><xsl:value-of select="."/></a>
               </xsl:when>
               <xsl:otherwise>
                  <xsl:value-of select="."/>
               </xsl:otherwise>
            </xsl:choose>
            <xsl:if test="position() != last()">
               <xsl:text> </xsl:text>
            </xsl:if>
         </xsl:for-each>
      </div>
   </xsl:template>
</xsl:stylesheet>

Here is the result of applying the above stylesheet to a sample inventory XML document:

Figure 1. Auto-discovering URLs and e-mail addresses in plain text

Finally, the str:concat function allows you to concatenate the string values of all nodes in a specified nodeset.
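
As a minimal sketch of str:concat, the following stylesheet, written against the parts inventory shown earlier, joins the partID attributes of all part elements into a single string. The ids element is a hypothetical name chosen for illustration only:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:str="http://exslt.org/strings"
 exclude-result-prefixes="str">
   <xsl:template match="parts">
      <!-- Joins the string values of all selected attribute nodes,
           in document order and without a separator -->
      <ids><xsl:value-of select="str:concat(part/@partID)"/></ids>
   </xsl:template>
</xsl:stylesheet>

By contrast, a plain xsl:value-of with select="part/@partID" outputs only the value of the first selected node in XSLT 1.0.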

GotDotNet Modules

In addition to EXSLT extension functions, EXSLT.NET provides a set of proprietary functions (see Table 2). They belong to different namespaces, and we use different prefixes to distinguish them from EXSLT functions. The commonly used prefixes for these functions end with 2 (date2, regexp2, and so on). Let's briefly discuss them.

The GotDotNet Dates and Times Module

There are currently three functions in this module, namely date2:min(), date2:max(), and date2:avg(), which facilitate the calculation of minimum, maximum, and average time duration values (the xs:duration XML Schema data type).
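
The following sketch shows how these functions might be used. It rests on several assumptions: a hypothetical tasks document whose task elements carry a duration attribute in xs:duration format (for example, PT2H30M); that each function accepts a nodeset of such values; and that the module's namespace URI is the one shown, which should be checked against Table 2:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:date2="http://gotdotnet.com/exslt/dates-and-times"
 exclude-result-prefixes="date2">
   <!-- Namespace URI above is an assumption; verify it against Table 2 -->
   <xsl:template match="tasks">
      <summary>
         <!-- Each call receives the nodeset of duration attributes -->
         <shortest><xsl:value-of select="date2:min(task/@duration)"/></shortest>
         <longest><xsl:value-of select="date2:max(task/@duration)"/></longest>
         <average><xsl:value-of select="date2:avg(task/@duration)"/></average>
      </summary>
   </xsl:template>
</xsl:stylesheet>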

The GotDotNet Math Module

This module currently contains a single function, math2:avg(), which calculates the average of a set of numeric values.
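
Here is a minimal sketch, assuming that the part elements of the inventory carry a hypothetical price attribute, that math2:avg() takes a nodeset argument, and that the module's namespace URI is the one shown (check it against Table 2):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:math2="http://gotdotnet.com/exslt/math"
 exclude-result-prefixes="math2">
   <!-- Namespace URI above is an assumption; verify it against Table 2 -->
   <xsl:template match="parts">
      <!-- Averages the numeric values of the hypothetical price attributes -->
      <average-price><xsl:value-of select="math2:avg(part/@price)"/></average-price>
   </xsl:template>
</xsl:stylesheet>

In plain XPath 1.0, the same value is usually computed as sum(part/@price) div count(part/@price).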

The GotDotNet Regular Expressions Module

The only function in this module currently is the regexp2:tokenize() function, which extends the functionality of the str:tokenize() function by adding the ability to tokenize a string at the positions that are defined by a regular expression pattern (an analog of the Regex.Split() method of the .NET Framework).
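
As an illustration, the sketch below splits a list whose items are separated by commas or semicolons with optional surrounding white space, something that the character-based str:tokenize() cannot express directly. The two-argument form (string, pattern), the sample string, and the namespace URI are assumptions to be checked against Table 2 and the library documentation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:regexp2="http://gotdotnet.com/exslt/regular-expressions"
 exclude-result-prefixes="regexp2">
   <!-- Namespace URI above is an assumption; verify it against Table 2 -->
   <xsl:template match="/">
      <ul>
         <!-- The pattern marks where to split: a comma or semicolon
              together with any white space around it -->
         <xsl:for-each select="regexp2:tokenize('cats, dogs ;birds', '\s*[,;]\s*')">
            <li><xsl:value-of select="."/></li>
         </xsl:for-each>
      </ul>
   </xsl:template>
</xsl:stylesheet>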

The GotDotNet Sets Module

The only function in this module currently is the set2:subset() function, which allows you to determine if a nodeset A is a subset of a nodeset B (all nodes of nodeset A are also contained in nodeset B).
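
A possible use is a consistency check. The sketch below assumes hypothetical discontinued and stock attributes on the part elements, and a namespace URI that should be verified against Table 2. It tests whether every discontinued part is also among the parts that are out of stock. Note that the test is about the nodes themselves being members of both nodesets, not about nodes having equal values:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:set2="http://gotdotnet.com/exslt/sets"
 exclude-result-prefixes="set2">
   <!-- Namespace URI above is an assumption; verify it against Table 2 -->
   <xsl:template match="parts">
      <xsl:choose>
         <!-- First argument: candidate subset; second argument: containing set -->
         <xsl:when test="set2:subset(part[@discontinued='yes'], part[@stock='0'])">
            <p>All discontinued parts are out of stock.</p>
         </xsl:when>
         <xsl:otherwise>
            <p>Some discontinued parts are still in stock.</p>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>
</xsl:stylesheet>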

The GotDotNet Strings Module

This module currently contains two functions, which allow you to convert a string value to upper or lower case: str2:uppercase() and str2:lowercase(). The conversion is done using the casing rules of the current culture. These functions are intended to replace the ugly, i18n-unfriendly translate() function trick, which is commonly used in XSLT to convert character case. Because they are implemented using culture-aware .NET Framework classes, the str2:uppercase() and str2:lowercase() functions are a much more robust and safer way to change character case.
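
To make the difference concrete, the sketch below converts the name attribute of each part to upper case twice: first with the translate() idiom, then with str2:uppercase(). The namespace URI is an assumption and should be checked against Table 2; the element names in the output are illustrative only:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:str2="http://gotdotnet.com/exslt/strings"
 exclude-result-prefixes="str2">
   <!-- Namespace URI above is an assumption; verify it against Table 2 -->
   <xsl:template match="part">
      <!-- XSLT 1.0 idiom: converts only the characters listed explicitly,
           so letters outside a to z are left untouched -->
      <with-translate>
         <xsl:value-of select="translate(@name,
            'abcdefghijklmnopqrstuvwxyz',
            'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
      </with-translate>
      <!-- Extension function: relies on the casing rules of the current culture -->
      <with-uppercase>
         <xsl:value-of select="str2:uppercase(@name)"/>
      </with-uppercase>
   </xsl:template>
</xsl:stylesheet>

For a name such as "Acme screwdriver" both approaches should agree; they diverge as soon as the text contains accented or non-Latin letters, which the translate() version silently skips.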

Conclusion

As has been shown, the EXSLT.NET library provides an extremely rich set of extension functions and elements intended to make XSLT programming easier and more productive. These functions fill the functionality gaps in XPath 1.0 and XSLT 1.0, allowing developers to concentrate on their actual XML querying and transformation logic instead of implementing trivial string or date manipulation functions over and over.

EXSLT extensions were designed within the XSLT community to facilitate everyday practical tasks for which XPath and XSLT don't provide a solution and to allow these extensions to be portable between XSLT processors that support EXSLT.

The EXSLT.NET library brings EXSLT to the .NET platform, implementing the vast majority of EXSLT extension functions and elements. EXSLT.NET has been heavily optimized for performance and usability and is the recommended tool for all XML developers who are working on the .NET Framework platform.

Acknowledgments

I'd like to thank Dare Obasanjo for starting the EXSLT.NET project and for his help in preparing this article. Thanks also to all EXSLT supporters and members of the EXSLT.NET developer community, especially Dimitre Novatchev and Paul Reid.

Feel free to raise any EXSLT-related questions at the EXSLT.NET project's message board.