Use Parameters or Literals for Query Inputs

Use unmodified parameters or literal constants in query statements to ensure that the optimizer can determine a representative value and optimize accordingly. If you use a local variable in a query predicate instead of a parameter or literal, the optimizer is unable to determine the value for the input. Likewise, if a parameter value is modified before being used in a statement, then the query plan will be chosen for the original value of the parameter and may not be the best plan for the actual value of the parameter when the statement is executed. Not having the proper value, or any value at all, leads to a less accurate cardinality estimates. Poor estimates can in turn lead to a poorly performing query plan. Use unmodified parameters or literals in the query instead of local variables, and the optimizer typically will be able to pick a better query plan. For example, consider this query that uses a local variable:

declare @StartOrderDate datetime

set @StartOrderDate = '20040731'

select * from Sales.SalesOrderHeader h, Sales.SalesOrderDetail d

WHERE h.SalesOrderID = d.SalesOrderId

AND h.OrderDate >= @StartOrderDate

The number of rows from Sales.SalesOrderHeader that the optimizer estimates will qualify vs. the condition h.OrderDate >= @StartOrderDate is 9439.5, which is exactly 30% of the size of the table. You can use the graphical showplan for the query and right-click the plan node for Sales.SalesOrderHeader to display this cardinality estimate. In a pre-release version of SQL Server 2005 used while preparing this paper, the plan chosen uses a merge join (the observations that follow are based on this same SQL Server 2005 version; your results may differ depending on your SQL Server version, available memory, etc.). Now, consider this equivalent query that doesn't use a local variable:

SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d

WHERE h.SalesOrderID = d.SalesOrderId

AND h.OrderDate >= '20040731'

The cardinality of the result set for the predicate "h.OrderDate >= '20040731'" is estimated as 40 in the graphical showplan for the query (right-click the filter operator), for a selectivity of 0.13%. The plan chosen for this query uses a nested loop join instead of a merge join because of this improved estimate.

Even when local variables are used in a query, an estimate that is better than a guess is used in the case of equality predicates. Selectivity for conditions of the form "@local_variable = column_name" is estimated using the average value frequency from the histogram for column_name. So, for example, if the column column_name contains all unique values, then a selectivity estimate of 1/(number of unique values in column) will be used, which is accurate.

To eliminate the use of local variables, consider (1) rewriting the query to use literals instead of variables, (2) using sp_executesql with parameters that replace your use of local variables, or (3) using a stored procedure with parameters that replace your use of local variables. Dynamic SQL via EXEC may also be useful for eliminating local variables, but it typically will result in higher compilation overhead.

The query optimizer chooses the best plan for the value of the parameter that was input to the stored procedure or call to sp_executesql. For this reason, it is best to avoid writing SQL where the value used at execution is different from the value input to the stored procedure. Instead it is better to modify the parameter value in one stored procedure and then pass it to another stored procedure which contains the SQL statement where it is used. For example, consider this query that modifies a parameter value:

CREATE PROCEDURE GetRecentSales (@date datetime) WITH RECOMPILE AS

BEGIN

      IF @date IS NULL

            SET @date = dateadd("mm",-3,(SELECT MAX(OrderDATE)

                                          FROM Sales.SalesOrderHeader))

      SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d

      WHERE h.SalesOrderID = d.SalesOrderID

      AND h.OrderDate > @date

END

This SP, if called with NULL, will have the final SELECT statement optimized for @date = NULL. Since no rows have NULL OrderDate, the cardinality estimate for the result of applying this filter to SalesOrderHeader is very low (1 row). However, at run time, the date is not NULL, but three months before the latest OrderDate. The actual number of SalesOrderHeader rows that qualify is 5,736. The optimizer chooses a nested loop join for the query when NULL is passed to GetRecentSales, whereas the optimal plan contains a merge join. You can see the plan selected, and the expected and actual cardinalities, using this script:

SET STATISTICS PROFILE ON

GO

EXEC GetRecentSales NULL

GO

SET STATISTICS PROFILE OFF

GO

The WITH RECOMPILE option specified on the GetRecentSales stored procedure above does not eliminate the cardinality estimation error. One way to ensure that the queries in this example are optimized with appropriate parameter values that allow good estimates to be obtained is to modify the stored procedure as follows, breaking it down into parts:

CREATE PROCEDURE GetRecentSales (@date datetime) AS

BEGIN

      IF @date IS NULL

            SET @date = dateadd("mm",-3,(SELECT MAX(OrderDATE) FROM Sales.SalesOrderHeader))

      EXEC GetRecentSalesHelper @date

END

CREATE PROCEDURE GetRecentSalesHelper (@date datetime) WITH RECOMPILE AS

BEGIN

      SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d

      WHERE h.SalesOrderID = d.SalesOrderID

      AND h.OrderDate > @date -- @date is unchanged from compile time,

                              -- so a good plan is obtained.

END

Comments