Implied Predicates and Query Hints

In this post, I want to take a look at how two seemingly unrelated features of SQL Server can interact to cause a problem.  The idea for this post came from a question submitted by a reader.  Let's begin.  Consider the following trivial schema and query:

CREATE TABLE T1 (A INT, B INT)
CREATE TABLE T2 (A INT, B INT)

SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.B = 0
OPTION (HASH JOIN)

Not surprisingly, this query yields the following plan:

  |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]), RESIDUAL:([T2].[A]=[T1].[A]))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[B]=(0)))
       |--Table Scan(OBJECT:([T2]))

In fact, this query yields this plan with or without the hint.  Now let's make a small modification to the WHERE clause of the query see what happens:

SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A = 0
OPTION (HASH JOIN)

Now this query yields the following error message:

Msg 8622, Level 16, State 1, Line 1

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

What happened?  Why does this seemingly innocuous change to the query cause it to fail?  To find the answer, let's run the query without the HASH JOIN hint and look at the plan:

SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A = 0

  |--Nested Loops(Inner Join)
       |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]=(0)) )
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

There are two things about this plan that are especially notable:

First, the plan includes the predicate "T2.A = 0".  Although we did not specify this predicate in the original query, SQL Server derives it from the predicates that we did specify.  This derivation is a good thing.  It allows SQL Server to filter out rows from the scan of T2 earlier than would otherwise be possible.

Second, the original equijoin predicate "T1.A = T2.A" appears nowhere in this plan.  This predicate is redundant with the original predicate "T1.A = 0" and the derived predicate "T2.A = 0" so SQL Server eliminates it.  Normally, eliminating a redundant predicate is a good thing.  By the time the rows from the scans reach the join, there is no reason to evaluate this predicate.  It would be a waste of time.  Unfortunately, in this case, the eliminated predicate also happens to be the only equijoin predicate and hash joins (and merge joins) require at least one equijoin predicate.  Thus, the query with the HASH JOIN hint fails.

Note that the loss of the hash join and merge join alternatives for the above plan is not a big deal from a performance perspective.  With or without the join predicate, the query is a cross join since all rows from T1 will join with all rows from T2.

If we throw in a third, briefly lived feature of SQL Server 2008, the situation gets even more complex.  SQL Server 2008 RTM has an optimization that substitutes constants for parameter and variable values if we use the RECOMPILE hint.  Note that this optimization was removed from SQL Server 2008 Cummulative Update 4 and Service Pack 1 to fix an issue.  If your instance of SQL Server has the fix, you will not be able to reproduce the following scenario.

Let's see a simple example of this optimization in action.  Compare the plans for the following two identical queries:

DECLARE @P INT
SET @P = 0

SELECT * FROM T1 WHERE T1.A = @P

  |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=[@P]) )

SELECT * FROM T1 WHERE T1.A = @P
OPTION (RECOMPILE)

  |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)) )

Observe how the WHERE clause in the first plan references the @P variable while the WHERE clause in the second plan which includes the RECOMPILE hint references the constant 0.  This optimization is safe since the plan generated for the query with the RECOMPILE hint will be used only once and then discarded rather than cached.

The preceding plans are "actual" rather than "estimated" query plans.  That is, I collected them using SET STATISTICS PROFILE ON rather than SET SHOWPLAN_TEXT ON.  The parameter substitution optimization can only be applied when the query is recompiled immediately prior to execution and the actual parameter values are known.

Now let's create a simple stored procedure:

CREATE PROCEDURE MY_SP (@P1 INT, @P2 INT)
AS
SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A BETWEEN @P1 AND @P2
OPTION (HASH JOIN, RECOMPILE)

If we run this stored procedure with two different parameters, it works just fine:

EXEC MY_SP 0, 1

Here is the query plan:

  |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]), RESIDUAL:([T2].[A]=[T1].[A]))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]>=(0) AND [T1].[A]<=(1)))
       |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]>=(0) AND [T2].[A]<=(1)))

However, if we run the stored procedure with the same parameter, it fails with the same error that we saw at the beginning of this post.  For example, the following fails:

EXEC MY_SP 0, 0

Msg 8622, Level 16, State 1, Line 1

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Why does the stored procedure work with some parameters but not with others?  If we alter the stored to remove the HASH JOIN hint, we can see what has happened:

ALTER PROCEDURE MY_SP (@P1 INT, @P2 INT)
AS
SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A BETWEEN @P1 AND @P2
OPTION (RECOMPILE)

EXEC MY_SP 0, 0

  |--Nested Loops(Inner Join)
       |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]=(0)))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

SQL Server has transformed the predicate "T1.A BETWEEN @P1 AND @P2" first into "T1.A BETWEEN 0 AND 0" and then into "T1.A = 0".  Now we have exactly the same scenario that we had in the first failed query above.

Query hints can be powerful tools, but they can also backfire - often in mysterious ways.  In these examples, two and even three separate features all interact to induce the error.  Even a small change to a query or to a stored procedure's parameters can be the difference between success and failure.

Comments

  • Anonymous
    May 01, 2009
    Big thanks for your essential posts! It's very useful information for SQL developers. I set label "IMPORTANT" for each your article in my Outlook :))

  • Anonymous
    July 02, 2010
    I agree that understanding this is important. You say that "Even a small change to a query or to a stored procedure's parameters can be the difference between success and failure."   To me, this would really indicate that the compiler is actually rather unstable.  Let's say that the word "robust" does not come to mind. Nevertheless, thank you for your observations.

  • Anonymous
    July 07, 2010
    The comment has been removed

  • Anonymous
    July 09, 2010
    The comment has been removed

  • Anonymous
    July 09, 2010
    The comment has been removed

  • Anonymous
    November 29, 2011
    The comment has been removed

  • Anonymous
    April 06, 2012
    The comment has been removed

  • Anonymous
    April 10, 2012
    The comment has been removed

  • Anonymous
    December 31, 2012
    The comment has been removed