Using Computed Columns to Fix Scalar Expression Estimation Errors

(Ok, let's try something a bit more involved now. Here's a tip on how you can use computed columns to improve query plan quality in SQL Server 2005. The optimizer supports building statistics on the results of expressions, and this can avoid debugging query plans later if you make sure that the system is doing this for complex expressions, user-defined expressions, or other places where the cardinality estimate isn't matching what is actually happening at runtime. Enjoy! Conor Cunningham):

 

Computed columns can be used to correct some cardinality estimation errors. If the optimizer is incorrectly estimating a predicate (as examples, due to correlations or because the optimizer is guessing on the selectivity of a user-defined function), a computed column can give the optimizer a hook on which statistics can be stored. Note that computed columns in SQL 2005 can be created without increasing the on-disk row width if the scalar expression meets specific requirements documented in Books Online (for example, it must be deterministic).

The following example demonstrates how computed columns can be used to affect cardinality estimates. A few notes for the example:

  1. The plan is very simple in this example. The cardinality estimate error does not impact the plan choice. Please understand that this problem could negatively impact join order/algorithm in larger queries.
  2. User-defined functions need to be created WITH SCHEMABINDING or else the optimizer does not trust them at all (they could be changed without causing the plan to recompile, which is “not good”).

use tempdb

-- create a table with 20000 rows in it

create table t1 (col1 int, col2 int)

declare @i int

set @i = 0

while @i < 20000

begin

insert into t1(col1, col2) values (@i, rand()*1000)

set @i = @i + 1

end

-- drop function dbo.foo

-- create a function. The optimizer does not understand it, so

-- it will cause guessing logic in plan generation.

create function foo(@a int) returns int

with schemabinding

as

BEGIN

set @a = 5;

return 155

END

-- clean out the plan cache

dbcc freeproccache

set statistics profile on

-- we under-guess on the UDF expression (assumes about 8% selectivity, actual is 100%)

select * from t1

where dbo.foo(t1.col1) = 155

(Abbreviated statistics profile output)

Rows Executes StmtText

-------------------- -------------------- -----------------------------------------------

20000 1 select * from t1 where dbo.foo(t1.col1) = 155

20000                1 |--Filter(WHERE:( foo(col1)=155))

20000 1 |--Table Scan(OBJECT:(t1))

EstimateRows

------------------

1681.7928

1681.7928

20000.0

-- we over-guess on the UDF expression (assumes about 8% selectivity, actual is 0%)

dbcc freeproccache

select * from t1

where dbo.foo(t1.col1) = 154

Rows Executes StmtText

-------------------- -------------------- -----------------------------------------------

0 1 select * from t1 where dbo.foo(t1.col1) = 154

0                    1 |--Filter(WHERE:(foo( col1)=154))

20000 1 |--Table Scan(OBJECT:(t1))

(estimates match previous example)

set statistics profile off

-- add (non-persisted) computed column over expression.

-- Note that the function has WITHSCHEMABINDING enabled (which causes the optimizer to trust the expression will not change over recompiles)

-- Note2 - this could be persisted if the function were expensive. Non-deterministic expressions are not persistable, though imprecise expressions are.

alter table t1 add c3 as dbo.foo(col1)

set statistics profile on

-- exact same queries

-- we correctly estimate 100% selectivity for this query

dbcc freeproccache

select * from t1

where dbo.foo(t1.col1) = 155

Rows Executes StmtText

-------------------- -------------------- -----------------------------------------------

20000 1 select * from t1 where dbo.foo(t1.col1) = 155

0 0 |--Compute Scalar(DEFINE:( c3=c3))

20000                1 |--Filter(WHERE:( c3=(155)))

0 0 |--CompSca(DEFINE:( c3=foo(col1)))

20000 1 |--Table Scan(t1))

EstimateRows

------------------------

20000.0

20000.0

20000.0

20000.0

20000.0

-- we estimate 0% selectivity for this query (actually we have a floor to estimate at least one row)

dbcc freeproccache

select * from t1

where dbo.foo(t1.col1) = 154

Rows Executes StmtText

-------------------- -------------------- -----------------------------------------------

0 1 select * from t1 where dbo.foo(t1.col1) = 154

0 0 |--Compute Scalar(DEFINE:( c3=c3))

0                1 |--Filter(WHERE:( c3=(154)))

0 0 |--CompSca(DEFINE:( c3=foo(col1)))

20000 1 |--Table Scan(t1))

EstimateRows

------------------------

1.0

1.0

1.0

20000.0

20000.0

set statistics profile off