Aggregation WITH ROLLUP
In this post, I'm going to discuss how aggregation WITH ROLLUP works. The WITH ROLLUP clause permits us to execute multiple "levels" of aggregation in a single statement. For example, suppose we have the following fictitious sales data. (This is the same data that I used for my series of posts on the PIVOT operator.)
CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)
We can write a simple aggregation query to compute the total sales by year:
SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr
As expected, this query returns three rows - one for each year:
Yr Sales
----------- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00
The query plan is a simple stream aggregate:
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1010]=COUNT_BIG([Sales].[Sales]), [Expr1011]=SUM([Sales].[Sales])))
|--Sort(ORDER BY:([Sales].[Yr] ASC))
|--Table Scan(OBJECT:([Sales]))
Now suppose that we want to compute not just the sale by year but the total sales as well. We could write a UNION ALL query:
SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr
UNION ALL
SELECT NULL, SUM(Sales) AS Sales
FROM Sales
This query works and does give the correct result:
Yr Sales
----------- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00
NULL 120000.00
However, the query plan performs two scans and two aggregations (one to compute the sales by year and one to compute the total sales):
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1023]=(0) THEN NULL ELSE [Expr1024] END))
| |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1023]=COUNT_BIG([Sales].[Sales]), [Expr1024]=SUM([Sales].[Sales])))
| |--Sort(ORDER BY:([Sales].[Yr] ASC))
| |--Table Scan(OBJECT:([Sales]))
|--Compute Scalar(DEFINE:([Expr1010]=NULL))
|--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END))
|--Stream Aggregate(DEFINE:([Expr1025]=COUNT_BIG([Sales].[Sales]), [Expr1026]=SUM([Sales].[Sales])))
|--Table Scan(OBJECT:([Sales]))
We can do better by adding a WITH ROLLUP clause to the original query:
SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP
This query is simpler to write and uses a more efficient query plan with only a single scan:
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
|--Sort(ORDER BY:([Sales].[Yr] ASC))
|--Table Scan(OBJECT:([Sales]))
The bottom stream aggregate in this query plan is the same as the stream aggregate in the original non-ROLLUP query. This aggregation is a normal aggregation and, as such, it can be implemented using a stream aggregate (as in this example) or a hash aggregate (try adding an OPTION (HASH GROUP) clause to the above query). It can also be parallelized.
The top stream aggregate is a special aggregate that computes the ROLLUP. (Unfortunately, in SQL Server 2005 there is no way to discern from the query plan that this aggregate implements a ROLLUP. This issue will be fixed in SQL Server 2008 graphical and XML plans.) A ROLLUP aggregate is always implemented using stream aggregate and cannot be parallelized. In this simple example, the ROLLUP stream aggregate merely returns each pre-aggregated input row while maintaining a running total of the Sales column. After outputting the final input row, the aggregate also returns one additional row with the final sum. Since SQL lacks a concept of an ALL value, the Yr column is set to NULL for this final row. If NULL is valid value for Yr, we can identify the ROLLUP row using the GROUPING(Yr) construct.
SELECT
CASE WHEN GROUPING(Yr) = 0
THEN CAST (Yr AS CHAR(5))
ELSE 'ALL'
END AS Yr,
SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP
Yr Sales
----- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00
ALL 120000.00
We can also compute multiple ROLLUP levels in a single query. For example, suppose that we want to compute the sales first by employee and then for each employee by year:
SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP
EmpId Yr Sales
----------- ----------- ---------------------
1 2005 12000.00
1 2006 18000.00
1 2007 25000.00
1 NULL 55000.00
2 2005 15000.00
2 2006 6000.00
2 NULL 21000.00
3 2006 20000.00
3 2007 24000.00
3 NULL 44000.00
NULL NULL 120000.00
There are a couple of points worth noting about this query. First, since the combination of the EmpId and Yr columns is unique, in the absence of the WITH ROLLUP clause, this query would just return the original data. However, with the WITH ROLLUP clause the query produces a useful result. Second, the order of the columns in the GROUP BY clause is relevant with the WITH ROLLUP clause. To see why simply try the same query but reverse the EmpId and Yr columns. Instead of computing the sales first by employee it will compute the sales first by year.
The query plan for this query is identical to the query plan for the prior query except that it groups on both the EmpId and Yr columns instead of on just the EmpId column. Like the prior query plan, this query plan includes two stream aggregates: the bottom one which is a normal stream aggregate and the top one which computes the ROLLUP. This ROLLUP stream aggregate actually computes two running totals: one which computes the total sales for an employee for all years and one which compute the total sales for all employees and all years. This table shows how the ROLLUP computation proceeds:
EmpId |
Yr |
SUM(Sales) BY EmpId, Yr |
SUM(Sales) BY EmpId |
SUM(Sales) |
1 |
2005 |
12000.00 |
12000.00 |
12000.00 |
1 |
2006 |
18000.00 |
30000.00 |
30000.00 |
1 |
2007 |
25000.00 |
55000.00 |
55000.00 |
1 |
NULL |
|
55000.00 |
55000.00 |
2 |
2005 |
15000.00 |
15000.00 |
70000.00 |
2 |
2006 |
6000.00 |
21000.00 |
76000.00 |
2 |
NULL |
|
21000.00 |
76000.00 |
3 |
2006 |
20000.00 |
20000.00 |
96000.00 |
3 |
2007 |
24000.00 |
44000.00 |
120000.00 |
3 |
NULL |
|
44000.00 |
120000.00 |
NULL |
NULL |
|
|
120000.00 |
In my next post, I'll take a look at the WITH CUBE clause. I'll discuss how it differs from WITH ROLLUP both in terms of function and in terms of its implementation.
Comments
Anonymous
September 27, 2007
In my last post, I wrote about how aggregation WITH ROLLUP works. In this post, I will discuss how aggregationAnonymous
September 27, 2007
Great post - I'm sure many people would consder this "the basics" but it was new to me & VERY useful.Thanks!Mike KneeAnonymous
October 11, 2007
In my last two posts, I gave examples of aggregation WITH ROLLUP and CUBE . SQL Server 2008 continuesAnonymous
July 09, 2010
Very helpful, thanks for writing blog. gr8 work!!! Keep it upAnonymous
April 19, 2011
Thanx a lot!!!!Anonymous
November 21, 2012
Post really helps to understand the basic use of ROLLUPAnonymous
June 07, 2013
Cool article on WITH ROLLUP ..Thanks :)Anonymous
October 22, 2013
Very well explained. It cleared all my doubts. Thanks Craig :)Anonymous
October 30, 2013
How can i display the final result as in the column "SUM(Sales) BY EmpId" of above tableAnonymous
October 31, 2013
Starting with SQL Server 2012, you can use a window aggregate function as follows: SELECT EmpId, Yr, SUM(Sales) OVER (PARTITION BY EmpId ORDER BY Yr) AS Sales FROM Sales HTH CraigAnonymous
April 07, 2014
Actually the Aggregate Windows Function SUM(Sales) OVER (PARTITION BY EmpId) exists since SQL 2005 but without the Order By Waldemar