Aggregation WITH CUBE
In my last post, I wrote about how aggregation WITH ROLLUP works. In this post, I will discuss how aggregation WITH CUBE works. Like the WITH ROLLUP clause, the WITH CUBE clause permits us to compute multiple "levels" of aggregation in a single statement. To understand the difference between these two clauses, let's look at an example. We'll use the same fictitious sales data from last week's example.
CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)
Consider the following query from last week:
SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP
It will be easier to see what is happening if we pivot the sales data:
|
Yr | ||||
2005 |
2006 |
2007 |
ALL | ||
EmpId |
1 |
12000.00 |
18000.00 |
25000.00 |
55000.00 |
2 |
15000.00 |
6000.00 |
|
21000.00 | |
3 |
|
20000.00 |
24000.00 |
44000.00 | |
ALL |
|
|
|
120000.00 |
The table clearly shows that the WITH ROLLUP clause computes the total for each employee for all years and the grand total for all employees and all years. The query does not compute the totals for each year for all employees. Moreover, the order of the columns in the GROUP BY clause determines in which order the data is totaled.
Now let's repeat the same query but replace the WITH ROLLUP clause with a WITH CUBE clause:
SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE
This query computes all possible sub-totals and totals:
|
Yr | ||||
2005 |
2006 |
2007 |
ALL | ||
EmpId |
1 |
12000.00 |
18000.00 |
25000.00 |
55000.00 |
2 |
15000.00 |
6000.00 |
|
21000.00 | |
3 |
|
20000.00 |
24000.00 |
44000.00 | |
ALL |
27000.00 |
44000.00 |
49000.00 |
120000.00 |
Because the WITH CUBE clause causes the query to compute all possible totals, the order of the columns in the GROUP BY clause does not matter. Of course, by default, SQL Server does not pivot the results of either of the above queries. Here is the actual output from the WITH CUBE query:
EmpId Yr Sales
----------- ----------- ---------------------
1 2005 12000.00
1 2006 18000.00
1 2007 25000.00
1 NULL 55000.00
2 2005 15000.00
2 2006 6000.00
2 NULL 21000.00
3 2006 20000.00
3 2007 24000.00
3 NULL 44000.00
NULL NULL 120000.00
NULL 2005 27000.00
NULL 2006 44000.00
NULL 2007 49000.00
Next, let's look at the query plan for the WITH CUBE query:
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
|--Concatenation
|--Stream Aggregate(GROUP BY:([Sales].[EmpId], [Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
| |--Sort(ORDER BY:([Sales].[EmpId] ASC, [Sales].[Yr] ASC))
| |--Table Spool
| |--Stream Aggregate(GROUP BY:([Sales].[Yr], [Sales].[EmpId]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
| |--Sort(ORDER BY:([Sales].[Yr] ASC, [Sales].[EmpId] ASC))
| |--Table Scan(OBJECT:([Sales]))
|--Compute Scalar(DEFINE:([Expr1012]=NULL))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
|--Table Spool
This plan consists of two parts. SQL Server has effectively rewritten our query as follows:
SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP
UNION ALL
SELECT NULL, Yr, SUM(Sales)
FROM Sales
GROUP BY Yr
The first part of the plan computes the result for the WITH ROLLUP query above. I described how this query plan works in last week's post. The second part of this plan computes the missing year sub-totals yielding the entire CUBE result. Note that this plan employs a common sub-expression spool. As I discussed in this post, a common sub-expression spool copies its input rows into a worktable and then reads and returns the rows from the worktable multiple times - in this case twice. The spool is meant to improve performance although, in this example, it has little impact since the server could just as easily have re-read the original Sales table. However, if the input to the aggregation was more complex and cost more to evaluate, the spool would help.
If we use the WITH CUBE clause when aggregating on more than two columns, SQL Server simply generates increasingly complex plans with additional inputs to the concatentation operator. As with the simple two column example, the idea is to compute the whole CUBE by computing all of the individual ROLLUPs that compose it.
Finally, we can actually combine WITH CUBE and PIVOT to generate the above table in a single simple statement. (I actually proposed a variation of this query in an answer to a reader's comment on my post about the PIVOT operator but I like this solution better.)
SELECT EmpId, [2005], [2006], [2007], [ALL]
FROM
(
SELECT
CASE WHEN GROUPING(EmpId) = 0
THEN CAST (EmpId AS CHAR(7))
ELSE 'ALL'
END AS EmpId,
CASE WHEN GROUPING(Yr) = 0
THEN CAST (Yr AS CHAR(7))
ELSE 'ALL'
END AS Yr,
SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE
) AS s
PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007], [ALL])) AS p
Here is the output from this query:
EmpId 2005 2006 2007 ALL
------- --------------------- --------------------- --------------------- ---------------------
1 12000.00 18000.00 25000.00 55000.00
2 15000.00 6000.00 NULL 21000.00
3 NULL 20000.00 24000.00 44000.00
ALL 27000.00 44000.00 49000.00 120000.00
Comments
Anonymous
October 11, 2007
In my last two posts, I gave examples of aggregation WITH ROLLUP and CUBE . SQL Server 2008 continuesAnonymous
October 25, 2007
One of the most important uses of CTEs is to write recursive queries. In fact, CTEs provide the onlyAnonymous
August 19, 2010
Nice post.Anonymous
November 24, 2010
please tell how to display pivot tables using graphAnonymous
November 29, 2010
You'll need to use a tool such as Excel or Reporting Services to display the results of a query as a graph. Both tools can link directly to the data stored in SQL Server. HTH, CraigAnonymous
April 19, 2011
Thanx a lot again....Anonymous
November 10, 2013
It's working fine. But i have adding another one column EmpName that showing problem?Anonymous
November 11, 2013
I'm not certain that I understand your question. You can certainly add other columns to the table and include them in the query. Remember that any column that you select that is not an argument to an aggregate function must be including in the group by list. For example: SELECT EmpId, EmpName, Yr, SUM(Sales) AS Sales FROM Sales GROUP BY EmpId, EmpName, Yr WITH CUBE HTH, CraigAnonymous
January 13, 2015
Hi Craig, very interesting and useful post and I have the following question: In your example the number or Years [Yr] is known and they are specified in the Select and the Pivot, in my case the number of years is uncertain, because in some cases it can be 10 in other 35 depending on the record set I extract. Is there a way to populate the Years [Yr] dynamically from the result set? RegardsAnonymous
January 14, 2015
The comment has been removedAnonymous
January 21, 2015
The comment has been removed