Four Myths about SQL Server’s CLR Engine

1. There is an inherent overhead in calling CLR functions

Not so. Try creating a T-SQL function and a CLR function that both do absolutely nothing other than return the value 0. When called in a loop, you will find that the CLR function executes substantially faster than the T-SQL one.

The exception to this is if the query optimizer is able to resolve the T-SQL function so that the function’s SQL is merged with that of the calling query (To enable this you must ensure that the T-SQL function itself consists of a single SQL statement). The official story is that database access will be faster will T-SQL functions, but if you T-SQL function opens a cursor to do its database access, even that may not be true.

2. Execution of CLR code requires a CPU context switch

I have heard this “rumor” from very senior people. The truth is that code that is marked “UNSAFE” does require a CPU context switch, as this will be preemptively scheduled by the OS (this is also true of extended stored procedures). Code that is marked “SAFE” or “EXTERNAL_ACCESS” will be scheduled by SQL Server and executed in user mode. Consequently, there is likely to be a vast performance difference between “safe” and “unsafe” code.

Because SQL Server hosts the .NET runtime, it is generally able to transparently cause the SAFE-mode CLR code to yield at reasonable intervals (specifically, while waiting for locks or IO). However, if your CLR code does not routinely invoke these operations, well-behaved CLR code will invoke Thread.Sleep(0)occasionally. This explicit yielding will cause negligible performance impact and it will prevent SQL Server’s schedulers from getting rough with your code.

3. Enabling the CLR on the server enables yet another service to slow the machine

The flag that enables the CLR in SQL Server 2005 or 2008 will only allow or disallow user assemblies to be loaded into SQL Server. The CLR itself is required by SQL Server and it is always running. To prove this, notice that SQL Server’s spatial types work correctly even when the CLR is not enabled on the server — and these are all CLR types.

4. There’s no such thing as a table valued aggregate.

There are multiple scenarios where it would be nice to get a set of values from a single data scan. An example of this could be when you want to return a list of the top N values, or perhaps the minimum set of “baskets” that contain all the values in the search column.

You still cannot have a user-defined aggregate return a table-valued result, but you can have a user-defined aggregation return a binary result, which can then be passed to a user-defined function which returns a table-valued result. Sure it’s kludgy, requiring two functions and an intermediate result, but it works. The biggest limitation is the fact that the resulting data cannot exceed 8000 bytes.

I’ll try to post an example soon.