XslCompiledTransform Slower than XslTransform?

This post discusses:

  • Why XslCompiledTransform may be slower than XslTransform
  • How to reduce start-up time if you use one of the managed XSLT processors
  • Why it is important to cache loaded XslCompiledTransform instances

The .NET Framework 2.0 provides a new XSLT processor class, System.Xml.Xsl.XslCompiledTransform, which is intended to replace the now obsolete XslTransform class. One of the major differences between the two is that the latter is an XSLT interpreter, while the former is a real XSLT compiler, allowing significantly faster execution times. Does that mean XslCompiledTransform is always faster? Surprisingly, the answer is not that simple.

Let's write a simple test application that measures Load and Transform times for both XslTransform and XslCompiledTransform processors. Here is the most interesting part of the code, and the full source code is available in the attached file.

 private void TestXslCompiledTransform() {
    XslCompiledTransform xslt = null;

    for (int i = 0; i < numberOfIterations; i++) {
        Stopwatch stopwatch = Stopwatch.StartNew();
        xslt = new XslCompiledTransform();
        xslt.Load(xslFile);
        stopwatch.Stop();
        Console.WriteLine("Load time: {0} ms", FormatTime(stopwatch));
    }

    Console.WriteLine("------------------------");
    XPathDocument doc = new XPathDocument(xmlFile);

    for (int i = 0; i < numberOfIterations; i++) {
        Stopwatch stopwatch = Stopwatch.StartNew();
        xslt.Transform(doc, (XsltArgumentList)null, XmlWriter.Create(TextWriter.Null, xslt.OutputSettings));
        stopwatch.Stop();
        Console.WriteLine("Transform time: {0} ms", FormatTime(stopwatch));
    }
}

Note that both Load and Transform are executed multiple times in a loop, and their times are measured separately. We also pre-load the input document and send the transformation output to TextWriter.Null, so that file input/output operations are not taken into account. (I accidentally forgot to pre-load the stylesheet into memory; however, that did not make a noticeable difference in the results below, because I ran XsltPerf with the same stylesheet multiple times, and the stylesheet file was sitting in the disk cache after the first run.) If you are new to the XslCompiledTransform class and wonder what xslt.OutputSettings is doing in this snippet, you may find the answer in Erik Saltwell's post "What the heck is OutputSettings".
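For readers who want to keep stylesheet file I/O out of the Load measurement entirely, here is a minimal sketch of one way to do it. It is not part of XsltPerf: the class name PreloadedStylesheetLoad and the method TimeLoad are hypothetical, and the sketch assumes the stylesheet contains no DTD. The file is read into a byte array before the stopwatch starts, and Load then works from an in-memory stream. The base URI is passed along so that relative xsl:import and xsl:include references can still be resolved.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original XsltPerf source.
public static class PreloadedStylesheetLoad
{
    // Returns the time spent in XslCompiledTransform.Load when the
    // stylesheet bytes are already in memory.
    public static TimeSpan TimeLoad(string xslFile)
    {
        string fullPath = Path.GetFullPath(xslFile);

        // Disk access happens here, outside the timed region.
        byte[] stylesheetBytes = File.ReadAllBytes(fullPath);

        Stopwatch stopwatch = Stopwatch.StartNew();
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (MemoryStream stream = new MemoryStream(stylesheetBytes))
        using (XmlReader reader = XmlReader.Create(stream, new XmlReaderSettings(), fullPath))
        {
            // Parsing and compiling are timed; reading the file is not.
            xslt.Load(reader);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling PreloadedStylesheetLoad.TimeLoad("queens.xsl") in a loop would give Load times comparable to those in the test above, minus the file read.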

The application allows you to specify filenames of the input document and the XSLT stylesheet, the number of iterations, and which XSLT processor to use:

 C:\XsltPerf>XsltPerf.exe /?
XSLT Load & Transform Performance Test Utility
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727

XsltPerf [/xt | /xct] [/i:<n>] <xml-file> <xsl-file>

Options:

/xt         Use XslTransform
/xct        Use XslCompiledTransform (default)
/i:<n>      Iterate n times (default is 5)

For testing purposes, we take one of the XSLTMark benchmark stylesheets, namely queens.xsl, which finds all the possible solutions to the problem of placing N queens on an N×N chess board without any queen attacking another. XSLTMark uses N = 6, and so will we. Let's run the XsltPerf utility several times before taking readings to ensure the measurements take place on a "warm" machine:

 C:\XsltPerf>XsltPerf.exe /xt queens.xml queens.xsl

The results on my Intel® Xeon® 3GHz box show that the very first Load in a process is very slow for XslTransform and incredibly slow for XslCompiledTransform. The very first Transform is also significantly slower than subsequent ones. If you add the time for the first Load to the time for the first Transform, you get 239 ms for XslTransform versus 928 ms for XslCompiledTransform. In this particular scenario the new XslCompiledTransform class is almost 4 times slower!

                        XslTransform    XslCompiledTransform
Load time, ms
  1st call              90.69           882.2
  2nd call              2.847           4.582
  3rd call              1.841           3.060
  4th call              2.681           3.119
  5th call              1.891           3.072
Transform time, ms
  1st call              148.0           45.31
  2nd call              76.57           2.027
  3rd call              73.82           2.027
  4th call              74.49           1.962
  5th call              74.22           1.977

So what the hell happens on the first call? In the .NET Framework 2.0, the implementations of XslTransform and XslCompiledTransform were moved to a helper assembly, System.Data.SqlXml, in order to reduce the size of System.Xml. Apparently that helper assembly is not NGen'd by default, so its methods are JIT-compiled on first use:

 C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727>ngen display System.Data.SqlXml
Microsoft (R) CLR Native Image Generator - Version 2.0.50727.42
Copyright (C) Microsoft Corporation 1998-2002. All rights reserved.
Error: The specified assembly is not installed.

(If you don't know what NGen or JIT is, I highly recommend reading the "NGen Revs Up Your Performance with Powerful New Features" article.) JIT-compilation affects the first XslCompiledTransform.Load call much more than the first XslTransform.Load call, because the compiler uses substantially more complex code than the interpreter.

Considering that other key .NET Framework assemblies are NGen'd, the absence of System.Data.SqlXml from the native image cache may be a simple oversight on the part of the .NET Framework installer. Let's generate a native image for the System.Data.SqlXml assembly:

 C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727>ngen install "System.Data.SqlXml, Version=2.0.0.0,
 Culture=neutral, PublicKeyToken=b77a5c561934e089" /nologo
Installing assembly System.Data.SqlXml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Compiling 1 assembly:
    Compiling assembly System.Data.SqlXml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 ...
System.Data.SqlXml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089

and run our test application again. I have copied the previous results to make it easier for you to compare:

                        XslTransform    XslCompiledTransform

System.Data.SqlXml is not NGen'd
Load time, ms
  1st call              90.69           882.2
  2nd call              2.847           4.582
  3rd call              1.841           3.060
  4th call              2.681           3.119
  5th call              1.891           3.072
Transform time, ms
  1st call              148.0           45.31
  2nd call              76.57           2.027
  3rd call              73.82           2.027
  4th call              74.49           1.962
  5th call              74.22           1.977

System.Data.SqlXml is NGen'd
Load time, ms
  1st call              19.27           58.57
  2nd call              2.894           4.862
  3rd call              1.895           3.238
  4th call              2.753           3.280
  5th call              1.911           3.272
Transform time, ms
  1st call              77.22           15.23
  2nd call              75.93           1.918
  3rd call              73.56           1.921
  4th call              74.49           1.949
  5th call              73.78           1.926

As you can see, the first Load call became much faster, though it still takes some extra time to load the helper assembly into process memory and initialize all the classes it needs. Now the first XslCompiledTransform.Load is "only" 3 times slower than the first XslTransform.Load; however, this is expected: compiling, in general, is more expensive than the interpreter's preparation work. It is the price you have to pay for faster execution: XslCompiledTransform.Transform, except for the first call, is about 40 times faster!

If you look at the new Transform times, you may notice that the difference between the first and subsequent calls to XslTransform.Transform almost disappeared, thanks to NGen'ing. But why is the first call to XslCompiledTransform.Transform still 8 times slower than subsequent ones?! Here we need to recall that XslCompiledTransform compiles the stylesheet to MSIL methods. All those generated methods are subject to JIT-compilation on first use. While JIT-compilation is relatively expensive, its cost is usually amortized over several executions of the compiled methods. For example, in our scenario the first XslCompiledTransform.Transform call is still faster than the first XslTransform.Transform call. However, for very simple stylesheets like <xsl:template match="/"><foo/></xsl:template> the first XslCompiledTransform.Transform call may perform several times worse.
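To observe the "first Transform" cost on such a trivial stylesheet, something like the following sketch could be used. It is an illustration only, not part of XsltPerf: the class name FirstTransformCost, the method Measure, and the <root/> input document are all assumptions made for this example. It loads the one-template stylesheet once and records the time of each Transform call on that single instance.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original XsltPerf source.
public static class FirstTransformCost
{
    // The trivial template from the text, wrapped in a stylesheet element.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/'><foo/></xsl:template>" +
        "</xsl:stylesheet>";

    // Returns the elapsed milliseconds of each Transform call
    // against one loaded XslCompiledTransform instance.
    public static double[] Measure(int iterations)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader xsl = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xsl);
        }

        // Assumed input document; any well-formed XML would do here.
        XPathDocument doc = new XPathDocument(new StringReader("<root/>"));

        double[] times = new double[iterations];
        for (int i = 0; i < iterations; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            using (XmlWriter writer = XmlWriter.Create(TextWriter.Null, xslt.OutputSettings))
            {
                xslt.Transform(doc, (XsltArgumentList)null, writer);
            }
            stopwatch.Stop();
            times[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return times;
    }
}
```

Printing the array returned by FirstTransformCost.Measure(5) should show the first entry standing out from the rest. Keep in mind that in a fresh process the first entry also includes one-time start-up work unrelated to the stylesheet, so calling Measure a second time gives a cleaner view of the per-stylesheet JIT cost.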

I'd like to emphasize the principal difference between the "first Load" issue (which relates to both XslTransform and XslCompiledTransform) and the "first Transform" issue (which relates to XslCompiledTransform only). In the former case, the first Load per process [1] is affected. In the latter, the first Transform per loaded stylesheet is affected. If you load a different stylesheet, or even reload the same stylesheet into the same XslCompiledTransform instance, a new set of MSIL methods will be generated and JIT-compiled on first use.

[1] Or per AppDomain in case of multi-AppDomain applications. For more information on sharing native images across AppDomains, you may read this and this.

Let's draw some conclusions from this little experiment:

  • If you are using one of the .NET Framework 2.0 XSLT processors (XslTransform or XslCompiledTransform), NGen'ing the System.Data.SqlXml assembly may improve the application start-up time. This does not apply to the .NET Framework 1.1, which has the XslTransform implementation NGen'd by default.
  • XslCompiledTransform.Load is slower than XslTransform.Load, though that is unlikely to become a concern.
  • The first XslCompiledTransform.Transform call may be several times slower than subsequent ones because of JIT-compilation. JIT-compilation happens on first use of each stylesheet template, so in many cases the very first XslCompiledTransform.Transform call for a given stylesheet loaded into an XslCompiledTransform instance is affected the most.
  • While XslCompiledTransform is the best choice for the "one Load, many Transforms" scenario, it may be slow for the "one Load, one Transform" scenario, especially for very simple XML/XSLT files and when stylesheet templates are executed only a few times. In those cases XslTransform may be faster.
  • It is important, especially for server applications, to cache an XslCompiledTransform instance if the same stylesheet is likely to be executed again.
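
As an illustration of the last point, here is a minimal sketch of a stylesheet cache. The XsltCache class and its Get method are hypothetical names, not an existing API. The sketch assumes stylesheets are identified by file path, that paths can be compared case-insensitively (as on Windows), and that files do not change while the process runs, since it never invalidates an entry. It relies on the documented behavior that an XslCompiledTransform object is thread-safe once it has been loaded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original XsltPerf source.
public static class XsltCache
{
    // Loaded transforms keyed by full stylesheet path.
    // Case-insensitive comparison is an assumption that suits Windows paths.
    private static readonly Dictionary<string, XslCompiledTransform> cache =
        new Dictionary<string, XslCompiledTransform>(StringComparer.OrdinalIgnoreCase);

    private static readonly object syncRoot = new object();

    // Returns the cached transform for the file, loading it on first request.
    public static XslCompiledTransform Get(string xslFile)
    {
        string key = Path.GetFullPath(xslFile);
        lock (syncRoot)
        {
            XslCompiledTransform xslt;
            if (!cache.TryGetValue(key, out xslt))
            {
                // Compilation happens once per stylesheet; later callers
                // reuse the same instance and its JIT-compiled methods.
                xslt = new XslCompiledTransform();
                xslt.Load(key);
                cache.Add(key, xslt);
            }
            return xslt;
        }
    }
}
```

A request handler would then call XsltCache.Get(path).Transform(...) instead of creating and loading a new XslCompiledTransform each time. Holding the lock during Load keeps the sketch simple at the cost of serializing first-time loads; a real server might prefer finer-grained locking and some way to evict or refresh entries.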

XsltPerf.zip