Recursive CTEs continued ...
In this post, I will finish the discussion of recursive CTEs that I began in my last post. I will continue to use the CTE examples from Books Online. To run these examples, you'll need to install the Adventure Works Cycles OLTP sample database.
In my last post, I explained that all recursive queries follow the same pattern of one or more anchor sub-selects and one or more recursive sub-selects combined by a UNION ALL. Similarly, all recursive query plans also follow the same pattern which looks like so:
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr10XX]=(0)))
| |-- ... anchor sub-select plan(s) ...
|--Assert(WHERE:(CASE WHEN [Expr10ZZ]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr10YY], [Recr10XX], ...))
|--Compute Scalar(DEFINE:([Expr10ZZ]=[Expr10YY]+(1)))
| |--Table Spool(WITH STACK)
|-- ... recursive sub-select plan(s) ...
Because of this basic plan shape, SQL Server cannot execute recursive queries using parallel plans. There are two basic problems. First, the stack spool does not support parallel execution. Second, the nested loops join does not support parallel execution on its inner input.
Finally, let's look at how the placement of a WHERE clause can have a big impact on the performance of a recursive query. In my last post, I dissected a recursive query that returns a list of all employees and their level within the organization. Suppose that we want to run the same query but limit the results to those employees in the first two levels. We could write the following query (which you'll also find in Books Online) with an extra WHERE clause to limit the results:
WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
FROM HumanResources.Employee e
INNER JOIN DirectReports d
ON e.ManagerID = d.EmployeeID
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports
WHERE EmployeeLevel <= 2
This query computes uses essentially the same plan as the query without the extra WHERE clause. The WHERE clause simply introduces a filter operator at the root of the plan:
Rows Executes
34 1 |--Filter(WHERE:([Recr1012]<=(2)))
290 1 |--Index Spool(WITH STACK)
290 1 |--Concatenation
0 0 |--Compute Scalar(DEFINE:([Expr1013]=(0)))
0 0 | |--Compute Scalar(DEFINE:([Expr1003]=(0)))
1 1 | |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=NULL) ...)
289 1 |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
289 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
0 0 |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
290 1 | |--Table Spool(WITH STACK)
0 0 |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
289 290 |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=[Recr1007]) ...)
As you can see from the row counts, this query computes the level of every employee in the organization before discarding the results that we do not want. Now, suppose that we instead move the WHERE clause into the recursive sub-select:
WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
FROM HumanResources.Employee e
INNER JOIN DirectReports d
ON e.ManagerID = d.EmployeeID
WHERE EmployeeLevel < 2
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports
This query is semantically identical to the previous query, but if we look at the plan, we can see that the filter moves into the recursive portion of the plan:
Rows Executes
34 1 |--Index Spool(WITH STACK)
34 1 |--Concatenation
0 0 |--Compute Scalar(DEFINE:([Expr1013]=(0)))
0 0 | |--Compute Scalar(DEFINE:([Expr1003]=(0)))
1 1 | |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=NULL) ...)
33 1 |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
33 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
0 0 |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
34 1 | |--Table Spool(WITH STACK)
0 0 |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
33 34 |--Nested Loops(Inner Join)
7 34 |--Filter(WHERE:(STARTUP EXPR([Recr1008]<(2))))
7 7 | |--Constant Scan
33 7 |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=[Recr1007]) ...)
Notice from the row counts that the new plan computes only the portion of the result set that really interests us and then terminates the recursion.
You may also observe than the plan now has a startup filter. An ordinary filter reads rows from its input tree and then evaluates a Boolean expression on each row to determine whether to return it. A startup filter evaluates its expression only once per execution. The expression does not depend on the input rows. If the expression evaluates to true, the startup filter returns all of its input rows. If the expression evaluates to false, the startup filter returns no rows.
In this example, if the startup filter expression evaluates to true, the constant scan returns one row to the nested loops join immediately above the filter and this join then executes the index seek on its inner input. However, if the startup filter expression evaluates to false, the filter and then the nested loops join return no rows.
While there are many scenarios where a startup filter is genuinely useful, this plan is slightly more complex than is really necessary. The plan could have used an ordinary filter:
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1013]=(0)))
| |--Compute Scalar(DEFINE:([Expr1003]=(0)))
| |--Index Seek(OBJECT:([HumanResources].[Employee].[IX_Employee_ManagerID]), SEEK:([HumanResources].[Employee].[ManagerID]=NULL) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
|--Filter(WHERE:([Recr1008]<(2)))
| |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
| |--Table Spool(WITH STACK)
|--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
|--Index Seek(OBJECT:([HumanResources].[Employee].[IX_Employee_ManagerID] AS [e]), SEEK:([e].[ManagerID]=[Recr1007]) ORDERED FORWARD)
Note that this plan is artificial. You cannot get SQL Server to generate it.
Comments
Anonymous
August 11, 2010
You've discussed how adding the filter inside the recursive portion of the plan, but what about getting a filter on the CTE to apply to the recursive temporary results rather than the completed results? Is there a method to make a CTE expand essentially to evaluate the criteria on the CTE as part of the recursive operation (thereby limiting the recursion)?Anonymous
August 17, 2010
SQL Server will push filters down to the anchor sub-select in certain very limited scenarios. For example: with cte(a, b) as ( select a, b from t union all select cte.a, t.b from t join cte on t.a = cte.a ) select * from cte where a = 0 The filter "a = 0" will push down. Unfortunately, if you change the recursive sub-select to return t.a instead of cte.a, the filter will not pushdown even though the columns are equivalent.Anonymous
January 18, 2012
Can I use this "WITH statement" inside a subquery?Anonymous
January 20, 2012
Hi Deb, The WITH statement can only be used at the start of a statement and cannot be embedded within a subquery. However, you can use the WITH statement to declare multiple comma separated CTEs and you can reference a CTE within a subquery. See the following pages for more information: msdn.microsoft.com/.../ms190766(v=sql.105).aspx msdn.microsoft.com/.../ms175972(v=sql.110).aspx HTH CraigAnonymous
March 18, 2012
Hi, I need also the posibility to sort the result on each level. In Oracle the statement is called "order siblings by". I have a tree similar to the directory tree of the Microsoft Explorer and I have to sort each directory on each level UweAnonymous
March 20, 2012
The comment has been removedAnonymous
April 01, 2012
i'm trying to translate this oracle query that, given an employee (i.e JONES, using the oracle scott.emp table) will return his ancestors (KING) and his descendants (FORD, SCOTT and their descendants) and provide the level for each WITH union_results AS ( SELECT LEVEL AS lvl , ename , 'Bottom-Up' AS direction FROM scott.emp START WITH ename = 'JONES' CONNECT BY NOCYCLE empno = PRIOR mgr UNION SELECT LEVEL AS lvl , ename , 'Top-Down' AS direction FROM scott.emp START WITH ename = 'JONES' CONNECT BY NOCYCLE mgr = PRIOR empno ) SELECT DISTINCT (SELECT MAX (lvl) FROM union_results WHERE direction = 'Bottom-Up') + ( CASE WHEN direction = 'Bottom-Up' THEN 1 - lvl ELSE lvl - 1 END ) AS rev_level , ename FROM union_results ORDER BY rev_level; REV_LEVEL ENAME ---------- ---------- 1 KING 2 JONES 3 FORD 3 SCOTT 4 ADAMS 4 SMITHAnonymous
April 04, 2012
Hi Ion, I'm not familiar with the Oracle syntax, but perhaps the following would work? WITH reports (empno, ename, mgr, lvl) AS ( SELECT empno, ename, mgr, 0 as lvl FROM emp WHERE ename = 'JONES' UNION ALL SELECT e.empno, e.ename, e.mgr, lvl + 1 FROM emp e INNER JOIN reports r ON e.mgr = r.empno ), mgrs (empno, ename, mgr, lvl) AS ( SELECT empno, ename, mgr, 0 as lvl FROM emp WHERE ename = 'JONES' UNION ALL SELECT e.empno, e.ename, e.mgr, lvl - 1 FROM emp e INNER JOIN mgrs m ON e.empno = m.mgr ) SELECT lvl - (SELECT MIN(lvl) FROM mgrs) + 1 AS rev_level, ename FROM (SELECT lvl, ename FROM reports UNION SELECT lvl, ename FROM mgrs) u ORDER BY lvl go HTH Craig