XSLTC — Compile XSLT to .NET Assembly

In my two previous posts I described a potential performance hit caused by XSLT-to-MSIL compilation and JIT-compilation when you load and run an XSLT stylesheet with the XslCompiledTransform engine for the first time. Since the .NET Framework 2.0 did not allow you to save compiled stylesheets, you had to pay the compilation price on each application run.

XSLT Compiler Utility

The good news is that we are providing the XSLT Compiler command-line utility xsltc.exe (announced here), which can be used to compile multiple stylesheets into one assembly. The changes to the System.Xml assembly required for this utility to work are shipped with .NET Framework 2.0 Service Pack 1, and the utility itself is shipped with Windows SDK 6.0, which absorbs the .NET Framework SDK. Both of these components will be installed by Visual Studio 2008. Below is the usage screen of xsltc.exe:

 C:\>xsltc.exe /?
Microsoft (R) XSLT Compiler version 3.5
[Microsoft (R) .NET Framework version 2.0.50727]
Copyright (C) Microsoft Corporation. All rights reserved.

xsltc [options] [/class:<name>] <source file> [[/class:<name>] <source file>...]

                      XSLT Compiler Options

                        - OUTPUT FILES -
/out:<file>             Specify name of binary output file (default: name of the first file)
/platform:<string>      Limit which platforms this code can run on: x86, Itanium, x64, or anycpu,
                        which is the default

                        - CODE GENERATION -
/class:<name>           Specify name of the class for compiled stylesheet (short form: /c)
/debug[+|-]             Emit debugging information
/settings:<list>        Specify security settings in the format (dtd|document|script)[+|-],...
                        Dtd enables DTDs in stylesheets, document enables document() function,
                        script enables <msxsl:script> element

                        - MISCELLANEOUS -
@<file>                 Insert command-line settings from a text file
/help                   Display this usage message (short form: /?)
/nologo                 Suppress compiler copyright message

The most useful options are /class and /out. If you have not specified the class name for a stylesheet, it defaults to the name of the file containing that stylesheet, without the extension. The /debug option disables practically all optimizations (beware of performance degradation!) and creates a PDB file for the output assembly, which allows you to debug stylesheets with a debugger. For security reasons, DTDs in stylesheets, the XSLT document() function, and msxsl:script elements are disabled by default; you have to enable them explicitly with the /settings option if required. Each stylesheet is compiled into an abstract class, which can be loaded later by a new XslCompiledTransform.Load overload:

 public void Load(Type compiledStylesheet);

Compiling stylesheets into an assembly both simplifies the deployment (you don't have to deploy multiple stylesheet files) and eliminates XSLT-to-MSIL compilation time. Moreover, you may also eliminate JIT-compilation time by installing the resulting assembly in the native image cache.
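If the assembly is not referenced at compile time, the compiled class can be looked up by name through reflection and passed to the same Load overload. The following is a minimal sketch, not part of the utility: the CompiledStylesheetLoader class and its LoadCompiled method are hypothetical names, and the assembly path and class name are whatever you passed to /out and /class.

```csharp
using System;
using System.Reflection;
using System.Xml.Xsl;

static class CompiledStylesheetLoader
{
    // Hypothetical helper: finds a class produced by xsltc.exe by name
    // and loads it into a new XslCompiledTransform.
    public static XslCompiledTransform LoadCompiled(string assemblyPath, string className)
    {
        Assembly assembly = Assembly.LoadFrom(assemblyPath);

        // throwOnError = true turns a misspelled class name into an exception
        // instead of a null Type.
        Type compiledStylesheet = assembly.GetType(className, true);

        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(compiledStylesheet);
        return transform;
    }
}
```

This keeps the choice of stylesheet a run-time decision (for example, driven by configuration) while still avoiding XSLT-to-MSIL compilation.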

How to Use It

Let us take, for example, a couple of the DocBook stylesheets, which had the worst JIT-compilation time in my previous experiment, and compile them:

 C:\docbook-xsl-1.72.0>xsltc /settings:dtd+,document+ /class:DocBookToHtml html\docbook.xsl /class:DocBookToFO fo\docbook.xsl

If you run the ILDASM tool on the resulting docbook.dll assembly, you will see two classes, DocBookToFO and DocBookToHtml, generated for the stylesheets specified on the command line, along with two helper $ArrayType$... classes used internally to initialize XSLT engine runtime tables:

[Figure: Assembly with compiled DocBook stylesheets]

To use compiled stylesheets from your favorite .NET language, you need to add a reference to docbook.dll to your project, and pass the desired class to the XslCompiledTransform.Load method. After that you may call Transform methods on the loaded XslCompiledTransform object the usual way:

 XslCompiledTransform stylesheet = new XslCompiledTransform();
stylesheet.Load(typeof(DocBookToHtml));
stylesheet.Transform("input.xml", "output.html");

To improve startup time you may choose to "pre-JIT" the assembly by installing a native image for it in the native image cache. Before that, however, you probably want to change the preferred base address of the assembly to avoid rebasing (I recommend reading the articles Improving Application Startup Time and NGen Revs Up Your Performance with Powerful New Features). The xsltc.exe utility does not support a /baseaddress option, but you may use either the rebase.exe or the editbin.exe tool, both of which come with Visual Studio®:

 C:\docbook-xsl-1.72.0>editbin.exe /rebase:base=0x60000000 docbook.dll /nologo

C:\docbook-xsl-1.72.0>ngen install docbook.dll /nologo
Installing assembly C:\docbook-xsl-1.72.0\docbook.dll
Compiling 1 assembly:
    Compiling assembly C:\docbook-xsl-1.72.0\docbook.dll ...
docbook, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null

You may ask why we decided to compile stylesheets to abstract classes instead of implementing some common interface similar to IXmlTransform from the Mvp.Xml project. There were two main reasons. First, System.Xml is a "red" assembly, and changes in the red bits have been greatly limited in Orcas, so we tried to keep public API changes as small as possible. Second, implementing XSLT 2.0 in the next release of the .NET Framework will probably require us to change the interface anyway.

Script Assemblies

If the stylesheet contains msxsl:script elements, their content is compiled to one or more separate assemblies using the CodeDOM technology. Since the CodeDOM does not allow having code snippets in different languages in a single assembly, one script assembly per script language is created. Suppose, for example, that the stylesheet MyTransform.xsl contains C# and Visual Basic .NET script blocks. When you compile it, three assemblies will be created: MyTransform.dll, containing compiled XSLT code, MyTransform.Script.cs.dll, containing compiled C# script blocks, and MyTransform.Script.vb.dll, containing compiled Visual Basic .NET script blocks. You may merge script assemblies with the XSLT assembly using the ILMerge utility:

 C:\MyTransform>ILMerge /out:MyTransform.dll MyTransform.dll MyTransform.Script.cs.dll MyTransform.Script.vb.dll

Limitations

Currently xsltc.exe does not allow embedding XML files as resources. Why might you need that? Suppose that the stylesheet C:\MyTransform\MyTransform.xsl contains the relative document references document('') and document('config.xml'). If you compile it and deploy it to another machine, it will try to read the C:\MyTransform\MyTransform.xsl and C:\MyTransform\config.xml files respectively, which will result in an error unless you deploy MyTransform.xsl and config.xml to the same folder as on the build machine. You may think that relative document references should be resolved relative to the location of the compiled XSLT assembly, or that all documents referenced with relative URIs should be embedded in the assembly, but there are always cases when you need a different behavior. Fortunately, this problem may be resolved by modifying xsltc.exe to use a custom XmlResolver; I may write on this later.
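Until then, one possible workaround on the application side (a different approach from modifying xsltc.exe) is to pass a custom XmlResolver to Transform that redirects build-machine paths to the deployment folder. The sketch below is only an illustration: the RebasingResolver class is hypothetical, and it assumes that the compiled stylesheet resolves document() calls through the resolver passed to Transform, using the build-time stylesheet location as the base URI.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical resolver: maps file URIs under the build-time folder
// to the same relative location under a deployment folder.
sealed class RebasingResolver : XmlUrlResolver
{
    private readonly string buildRoot;
    private readonly string deployRoot;

    public RebasingResolver(string buildRoot, string deployRoot)
    {
        // A trailing separator prevents C:\MyTransform from matching C:\MyTransformOld.
        this.buildRoot = WithTrailingSeparator(Path.GetFullPath(buildRoot));
        this.deployRoot = WithTrailingSeparator(Path.GetFullPath(deployRoot));
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        Uri resolved = base.ResolveUri(baseUri, relativeUri);
        if (resolved.IsAbsoluteUri && resolved.IsFile)
        {
            string path = Path.GetFullPath(resolved.LocalPath);
            if (path.StartsWith(buildRoot, StringComparison.OrdinalIgnoreCase))
            {
                // Keep the part below the build folder and re-root it.
                string relative = path.Substring(buildRoot.Length);
                return new Uri(Path.Combine(deployRoot, relative));
            }
        }
        return resolved; // Anything else is left untouched.
    }

    private static string WithTrailingSeparator(string path)
    {
        return path.TrimEnd(Path.DirectorySeparatorChar) + Path.DirectorySeparatorChar;
    }
}
```

It would be passed to the existing Transform(XmlReader, XsltArgumentList, XmlWriter, XmlResolver) overload, for example with new RebasingResolver(@"C:\MyTransform", AppDomain.CurrentDomain.BaseDirectory). Note that document('') would still need MyTransform.xsl to be deployed next to the application.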

Another limitation is that while XslCompiledTransform compiles a stylesheet to a set of DynamicMethods, which can be unloaded, an assembly generated by xsltc.exe cannot be unloaded until you shut down all AppDomains that used it (an infamous CLR limitation). This should not be a problem if you have a small set of fixed stylesheets, but it becomes a real issue in server scenarios when thousands of stylesheets are generated dynamically based on user settings and customizations. We are actively investigating possible solutions for server scenarios that do not require complicated AppDomain manipulations.
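For reference, the AppDomain manipulation mentioned above looks roughly like the following sketch, which loads the compiled stylesheet in a secondary AppDomain and unloads that domain afterwards. The TransformWorker and IsolatedTransform names are hypothetical, and error handling is omitted.

```csharp
using System;
using System.Reflection;
using System.Xml.Xsl;

// Runs inside the secondary AppDomain. Deriving from MarshalByRefObject
// lets the primary domain call it through a proxy.
public sealed class TransformWorker : MarshalByRefObject
{
    public void Run(string assemblyPath, string className, string inputUri, string outputFile)
    {
        Type compiledStylesheet = Assembly.LoadFrom(assemblyPath).GetType(className, true);
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(compiledStylesheet);
        transform.Transform(inputUri, outputFile);
    }
}

public static class IsolatedTransform
{
    public static void Run(string assemblyPath, string className, string inputUri, string outputFile)
    {
        AppDomain domain = AppDomain.CreateDomain("XsltSandbox");
        try
        {
            TransformWorker worker = (TransformWorker)domain.CreateInstanceAndUnwrap(
                typeof(TransformWorker).Assembly.FullName,
                typeof(TransformWorker).FullName);

            // Only strings cross the domain boundary, so the stylesheet
            // assembly is never loaded into the primary AppDomain.
            worker.Run(assemblyPath, className, inputUri, outputFile);
        }
        finally
        {
            // Unloading the domain releases the stylesheet assembly.
            AppDomain.Unload(domain);
        }
    }
}
```

Creating and unloading a domain per transformation is expensive, so a real server would presumably batch work per domain; that bookkeeping is exactly the complication referred to above.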

Under the Hood

Under the hood, xsltc.exe is a wrapper around the new XslCompiledTransform.CompileToType static method. You don't need to know about it unless you are developing your own version of the XSLT compiler. We expect that very few people will ever need to call this low-level method directly, as most will use xsltc.exe and optionally do some post-processing with other command-line utilities. However, for the sake of completeness, here is its brief description. (WARNING: The signature of the CompileToType method in beta releases of .NET Framework 2.0 SP1 may differ from the one given below.)

 // Compiles an XSLT stylesheet to a System.Type
public static CompilerErrorCollection CompileToType(
    XmlReader stylesheet,
    XsltSettings settings,
    XmlResolver stylesheetResolver,
    bool debug,
    TypeBuilder typeBuilder,
    string scriptAssemblyPath);

Parameters...

stylesheet

The XmlReader positioned on the beginning of the stylesheet.

settings

The XsltSettings to apply to the stylesheet. If this is null, the XsltSettings.Default settings are applied.

stylesheetResolver

The XmlResolver used to resolve any stylesheet modules referenced in xsl:import and xsl:include elements. If this is null, external resources are not resolved.

debug

true to compile in debug mode; otherwise false. Setting this to true enables debugging the stylesheet with a debugger.

typeBuilder

The TypeBuilder to use for the stylesheet compilation.

scriptAssemblyPath

The base path for the assemblies generated for msxsl:script elements. If only one script assembly is generated, this parameter specifies the path for that assembly. In case of multiple script assemblies, a distinctive suffix will be appended to the file name to ensure uniqueness of assembly names.

Return Value

A CompilerErrorCollection object containing the compiler errors and warnings that indicate the results of the compilation.

Note that the first three parameters are the same as in the XslCompiledTransform.Load method. The xsltc.exe utility creates an AssemblyBuilder and a ModuleBuilder, then for each stylesheet specified on the command line creates a TypeBuilder and compiles the stylesheet into it using the CompileToType method. Compiler errors and warnings returned from the CompileToType method are output to the console. If all stylesheets have been compiled successfully, the dynamic assembly is saved to disk. If you are new to Reflection.Emit, you may find this dynamic assembly sample code useful.
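The sequence just described can be sketched for a single stylesheet as follows. This is an illustration under stated assumptions, not the source of xsltc.exe: the file, assembly, and class names are hypothetical, the TypeAttributes are a guess at a suitable "abstract class" declaration, and the sketch assumes that CompileToType finishes the type itself so that no explicit CreateType call is needed before saving.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;
using System.Reflection.Emit;
using System.Xml;
using System.Xml.Xsl;

static class MiniXsltc
{
    static int Main()
    {
        // Hypothetical names: MyTransform.xsl is compiled into class MyTransform
        // in MyStylesheets.dll.
        AssemblyName name = new AssemblyName("MyStylesheets");
        AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
            name, AssemblyBuilderAccess.Save);
        ModuleBuilder module = assembly.DefineDynamicModule(
            "MyStylesheets.dll", "MyStylesheets.dll");

        // Assumption: a public abstract sealed (static-like) class.
        TypeBuilder type = module.DefineType(
            "MyTransform",
            TypeAttributes.Public | TypeAttributes.Abstract |
            TypeAttributes.Sealed | TypeAttributes.BeforeFieldInit);

        CompilerErrorCollection errors;
        using (XmlReader stylesheet = XmlReader.Create("MyTransform.xsl"))
        {
            errors = XslCompiledTransform.CompileToType(
                stylesheet,
                XsltSettings.Default,        // DTDs aside, document() and scripts stay disabled
                new XmlUrlResolver(),        // resolves xsl:import and xsl:include
                false,                       // no debug information
                type,
                "MyStylesheets.Script.dll"); // base path for script assemblies, if any
        }

        // Report errors and warnings the way a command-line compiler would.
        foreach (CompilerError error in errors)
        {
            Console.Error.WriteLine(error);
        }
        if (errors.HasErrors)
        {
            return 1;
        }

        // Assumption: the type was completed by CompileToType.
        assembly.Save("MyStylesheets.dll");
        return 0;
    }
}
```

Compiling several stylesheets into one assembly would mean repeating the DefineType and CompileToType steps against the same ModuleBuilder before the single Save call.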

Conclusion

The xsltc.exe utility allows you to precompile XSLT stylesheets so that your application will not incur the performance penalty of XSLT-to-MSIL and JIT-compilation on the first stylesheet execution. It also makes deployment of complex XSLT solutions, consisting of dozens of files, less cumbersome and protects your source XSLT code. Multiple stylesheets may be compiled into a single assembly, and the resulting assembly may be merged with the main DLL or EXE file of your application using the ILMerge utility.