Theory question on GROUP BY CUBE

Mikhail Firsov 1,801 Reputation points
2021-02-20T13:38:36.683+00:00

Hello!

Would you please help me understand the following:

My MS book "Querying SQL 2016" says:
70208-q0.png

Does this mean that if I do NOT want to use "a shortcut" I can rewrite this code using the GROUPING SETS clause with the 4 grouping sets (plus () )?

Select Category, Cust, SUM(Qty) AS TotalQty
From Sales.CategorySales
GROUP BY
GROUPING SETS ((Category, Cust), (Cust, Category),(Cust), (Category), ())

Thank you in advance,
Michael

{count} votes

Accepted answer
  1. Erland Sommarskog 77,771 Reputation points MVP
    2021-02-24T22:15:41.29+00:00

    Yes, your observation is correct, and I should have paid attention what it said the in excerpt of your book. Order does not matter, so there is little point in grouping by different permutations. It appears that the author was in confusion.

    GROUP BY CUBE(a, b, c) will give you grouping by (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), and ().

    0 comments No comments

4 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 77,771 Reputation points MVP
    2021-02-20T14:29:45.727+00:00

    Yes. Myself, I tend to avoid the shortcuts ROLLUP and CUBE, so that I don't have to remember that they mean.

    0 comments No comments

  2. EchoLiu-MSFT 14,516 Reputation points
    2021-02-22T08:21:17.103+00:00

    Hi @Mikhail Firsov ,

    Yes, I rarely use ROLLUP and CUBE. I did a simple Google and found an article describing them in detail. hope this helps:

    Group By in SQL Sever with CUBE, ROLLUP and GROUPING SETS Examples

    In addition, I did a simple test, and your idea does not seem to hold true:

    CREATE TABLE #EmpSalary   
    (   
      id INT PRIMARY KEY IDENTITY(1,1),   
      EmpName varchar (200),   
      Department varchar(100),   
      Category char(1),   
      Salary money   
    )   
        
    INSERT #EmpSalary   
    SELECT 'Bhavesh Patel','IT','A',$8000   
    UNION ALL   
    SELECT 'Alpesh Patel','Sales','A',$7000   
    UNION ALL   
    SELECT 'Kalpesh Thakor','IT','B',$5000   
    UNION ALL   
    SELECT 'Jay Shah','Sales','B',$4000   
    UNION ALL   
    SELECT 'Ram Nayak','IT','C',$3000   
    UNION ALL   
    SELECT 'Jay Shaw','Sales','C',$2000  
    UNION ALL   
    SELECT 'Jay Shaw','Sales','C',$2000;  
      
    SELECT   
       Department,   
       Category,   
       SUM(Salary) as Salary   
    FROM #EmpSalary   
    GROUP BY CUBE(Category,Department);   
    

    Output:
    70536-image.png

    SELECT   
       Department,   
       Category,   
       SUM(Salary) as Salary   
    FROM #EmpSalary   
    GROUP BY GROUPING SETS((Category, Department),(Department,Category),Department,  
    Category,());   
    

    Output:
    70517-image.png

    If you have any question, please feel free to let me know.

    Regards
    Echo


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  3. Mikhail Firsov 1,801 Reputation points
    2021-02-24T11:29:38.887+00:00

    Hello all,
    Sorry for the delay!

    "Yes. Myself, I tend to avoid the shortcuts ROLLUP and CUBE, so that I don't have to remember that they mean. " - so do I!

    "and your idea does not seem to hold true" - it's not my idea, that's the idea from my MS SQL2016 official curriculum :) and that's why I'm asking that question!
    MS's theory clearly states - and they are completely right! - that in case with the 2 columns there will be the 4 combinations (2 squared - not counting () ), but in practice the CUBE seems NOT to return the second permutation - (Cust, Category), thus making the outputs of the two commands (CUBE and GROUPING SETS) different, for example:

    https://www.sqltutorial.org/sql-cube/
    71522-q1.png

    This puzzles me as the practice contradicts the theory - at least in terms of the number of the resulting rows.

    Futhermore, many non-MS articles regarding the CUBE are also saying that CUBE for N columns will have 2 raised to the power of N permutations, but nevertheless do NOT use BOTH grouping sets (Category, Cust) AND (Cust, Category) - they describe only the ONE of them (usually the first - in my example this is the (Category, Cust) ).

    71594-q5-new.png

    I suppose this is because these two sets - (Category, Cust) and (Cust, Category) in my book - are equal and CUBE just does not output it twice (https://www.sqltutorial.org/sql-cube):
    71556-q6-new.png

    What do you think?

    0 comments No comments

  4. Mikhail Firsov 1,801 Reputation points
    2021-02-25T16:04:54.423+00:00

    Hello ErlandSommarskog,

    Theoretically the author is right because by MS' definition the CUBE is...

    "GROUP BY CUBE creates groups for all possible combinations of columns."

    "All possible combination of columns" must include both (a,b) and (b,c) regardless of their equality. If CUBE works differently it should have been mentioned.

    Thank you very much for your help!

    Regards,
    Michael