Zdieľať cez


PIVOT clause

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Transforms the rows of the preceding table_reference by rotating unique values of a specified column list into separate columns.

Syntax

PIVOT ( { aggregate_expression [ [ AS ] agg_column_alias ] } [, ...]
    FOR column_list IN ( expression_list ) )

column_list
 { column_name |
   ( column_name [, ...] ) }

expression_list
 { expression [ AS ] [ column_alias ] |
   { ( expression [, ...] ) [ AS ] [ column_alias] } [, ...] ) }

Parameters

  • aggregate_expression

    An expression of any type where all column references table_reference are arguments to aggregate functions.

  • agg_column_alias

    An optional alias for the result of the aggregation. If no alias is specified, PIVOT generates an alias based on aggregate_expression.

  • column_list

    The set of columns to be rotated.

  • expression_list

    Maps values from column_list to column aliases.

    • expression

      A literal expression with a type that shares a least common type with the respective column_name.

      The number of expressions in each tuple must match the number of column_names in column_list.

    • column_alias

      An optional alias specifying the name of the generated column. If no alias is specified PIVOT generates an alias based on the expressions.

Result

A temporary table of the following form:

  • All the columns from the intermediate result set of the table_reference that have not been specified in any aggregate_expression or column_list.

    These columns are grouping columns.

  • For each expression tuple and aggregate_expression combination, PIVOT generates one column. The type is the type of aggregate_expression.

    If there is only one aggregate_expression the column is named using column_alias. Otherwise, it is named column_alias_agg_column_alias.

    The value in each cell is the result of the aggregation_expression using a FILTER ( WHERE column_list IN (expression, ...).

Examples

-- A very basic PIVOT
-- Given a table with sales by quarter, return a table that returns sales across quarters per year.
> CREATE TEMP VIEW sales(year, quarter, region, sales) AS
   VALUES (2018, 1, 'east', 100),
          (2018, 2, 'east',  20),
          (2018, 3, 'east',  40),
          (2018, 4, 'east',  40),
          (2019, 1, 'east', 120),
          (2019, 2, 'east', 110),
          (2019, 3, 'east',  80),
          (2019, 4, 'east',  60),
          (2018, 1, 'west', 105),
          (2018, 2, 'west',  25),
          (2018, 3, 'west',  45),
          (2018, 4, 'west',  45),
          (2019, 1, 'west', 125),
          (2019, 2, 'west', 115),
          (2019, 3, 'west',  85),
          (2019, 4, 'west',  65);

> SELECT year, region, q1, q2, q3, q4
  FROM sales
  PIVOT (sum(sales) AS sales
    FOR quarter
    IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4));
 year  region  q1   q2   q3  q4
 2018  east   100   20   40  40
 2019  east   120  110   80  60
 2018  west   105   25   45  45
 2019  west   125  115   85  65

-- The same query written without PIVOT
> SELECT year, region,
         sum(sales) FILTER(WHERE quarter = 1) AS q1,
         sum(sales) FILTER(WHERE quarter = 2) AS q2,
         sum(sales) FILTER(WHERE quarter = 3) AS q2,
         sum(sales) FILTER(WHERE quarter = 4) AS q4
  FROM sales
  GROUP BY year, region;
 year  region  q1   q2   q3  q4
 2018  east   100   20   40  40
 2019  east   120  110   80  60
 2018  west   105   25   45  45
 2019  west   125  115   85  65

-- Also PIVOT on region
> SELECT year, q1_east, q1_west, q2_east, q2_west, q3_east, q3_west, q4_east, q4_west
    FROM sales
    PIVOT (sum(sales) AS sales
      FOR (quarter, region)
      IN ((1, 'east') AS q1_east, (1, 'west') AS q1_west, (2, 'east') AS q2_east, (2, 'west') AS q2_west,
          (3, 'east') AS q3_east, (3, 'west') AS q3_west, (4, 'east') AS q4_east, (4, 'west') AS q4_west));
 year  q1_east  q1_west  q2_east  q2_west  q3_east  q3_west  q4_east  q4_west
 2018      100      105       20       25       40       45       40       45
 2019      120      125      110      115       80       85       60       65

-- The same query written without PIVOT
> SELECT year,
    sum(sales) FILTER(WHERE (quarter, region) IN ((1, 'east'))) AS q1_east,
    sum(sales) FILTER(WHERE (quarter, region) IN ((1, 'west'))) AS q1_west,
    sum(sales) FILTER(WHERE (quarter, region) IN ((2, 'east'))) AS q2_east,
    sum(sales) FILTER(WHERE (quarter, region) IN ((2, 'west'))) AS q2_west,
    sum(sales) FILTER(WHERE (quarter, region) IN ((3, 'east'))) AS q3_east,
    sum(sales) FILTER(WHERE (quarter, region) IN ((3, 'west'))) AS q3_west,
    sum(sales) FILTER(WHERE (quarter, region) IN ((4, 'east'))) AS q4_east,
    sum(sales) FILTER(WHERE (quarter, region) IN ((4, 'west'))) AS q4_west
    FROM sales
    GROUP BY year;
 year  q1_east  q1_west  q2_east  q2_west  q3_east  q3_west  q4_east  q4_west
 2018      100      105       20       25       40       45       40       45
 2019      120      125      110      115       80       85       60       65

-- To aggregate across regions the column must be removed from the input.
> SELECT year, q1, q2, q3, q4
  FROM (SELECT year, quarter, sales FROM sales) AS s
  PIVOT (sum(sales) AS sales
    FOR quarter
    IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4));
  year   q1   q2   q3   q4
  2018  205   45   85   85
  2019  245  225  165  125

-- The same query without PIVOT
> SELECT year,
    sum(sales) FILTER(WHERE quarter = 1) AS q1,
    sum(sales) FILTER(WHERE quarter = 2) AS q2,
    sum(sales) FILTER(WHERE quarter = 3) AS q3,
    sum(sales) FILTER(WHERE quarter = 4) AS q4
    FROM sales
    GROUP BY year;
  year   q1   q2   q3   q4
  2018  205   45   85   85
  2019  245  225  165  125

-- A PIVOT with multiple aggregations
> SELECT year, q1_total, q1_avg, q2_total, q2_avg, q3_total, q3_avg, q4_total, q4_avg
    FROM (SELECT year, quarter, sales FROM sales) AS s
    PIVOT (sum(sales) AS total, avg(sales) AS avg
      FOR quarter
      IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4));
 year  q1_total  q1_avg  q2_total  q2_avg  q3_total  q3_avg  q4_total  q4_avg
 2018       205  102.5         45   22.5         85   42.5         85   42.5
 2019       245  122.5        225  112.5        165   82.5        125   62.5

-- The same query without PIVOT
> SELECT year,
         sum(sales) FILTER(WHERE quarter = 1) AS q1_total,
         avg(sales) FILTER(WHERE quarter = 1) AS q1_avg,
         sum(sales) FILTER(WHERE quarter = 2) AS q2_total,
         avg(sales) FILTER(WHERE quarter = 2) AS q2_avg,
         sum(sales) FILTER(WHERE quarter = 3) AS q3_total,
         avg(sales) FILTER(WHERE quarter = 3) AS q3_avg,
         sum(sales) FILTER(WHERE quarter = 4) AS q4_total,
         avg(sales) FILTER(WHERE quarter = 4) AS q4_avg
    FROM sales
    GROUP BY year;
 year  q1_total  q1_avg  q2_total  q2_avg  q3_total  q3_avg  q4_total  q4_avg
 2018       205  102.5         45   22.5         85   42.5         85   42.5
 2019       245  122.5        225  112.5        165   82.5        125   62.5