視窗框架條款

適用於:選取「是」Databricks SQL 選取「是」Databricks Runtime

指定彙總或分析視窗函數運作所在分區內資料列的滑動子集。

語法

{ frame_mode frame_start |
  frame_mode BETWEEN frame_start AND frame_end } }

frame_mode
{ RANGE | ROWS }

frame_start
{ UNBOUNDED PRECEDING |
  offset_start PRECEDING |
  CURRENT ROW |
  offset_start FOLLOWING }

frame_end
{ offset_stop PRECEDING |
  CURRENT ROW |
  offset_stop FOLLOWING |
  UNBOUNDED FOLLOWING }

參數

  • frame_mode

    • ROWS

      如果指定,滑動視窗框架會以相對於當前行之前或之後的行數表示。

    • 範圍

      如果指定了視窗函數,則必須使用單一的ORDER BY表示式指定

      然後,滑動視窗的界限會以當前列的 obExpr 偏移量來表示。

      若缺少ORDER BY子句,Azure Databricks會提出DATATYPE_MISMATCH。RANGE_FRAME_WITHOUT_ORDER。 若 ORDER BY 子句有多個表達式,Azure Databricks 會產生 DATATYPE_MISMATCH。RANGE_FRAME_MULTI_ORDER

  • frame_start

    相對於當前資料列的滑動視窗開始位置。

    • 無界先行

      指定視窗框架從分區的開頭開始。

    • 偏移_start 之前

      如果模式為 ROWSoffset_start 則為正整數常值數位,定義框架開始目前數據列之前的列數。

      如果模式為 RANGE,則 offset_start 為一個可以從 obExpr 減去的型別中的正的常量值。 框架會從分區的第一行開始,其中 obExpr 在當前行大於或等於 obExpr - offset_start

    • 目前列

      指定框架從目前數據列開始。

    • 從offset_start開始的部分

      如果模式為 ROWS,則 offset_start 是定義框架從目前列前推多少行開始的正整數文字值。 若模式為 RANGEoffset_start 為一個可以加至 obExpr 的正的字面值型別。 框架會從分區的第一行開始,其中 obExpr 在當前行大於或等於 obExpr + offset_start

  • frame_stop

    相對於當前資料列的滑動視窗框架結束。

    如果未指定,框架會在 [CURRENT ROW] 停止。 滑動視窗的結尾必須大於視窗框架的開頭。

    • offset_stop 前置

      如果 frame_mode 為 ROWSoffset_stop 則為正整數常值數字,定義框架在當前列之前停止的列數。 如果 frame_mode 為 RANGEoffset_stop 則為與 offset_start 相同類型的正值常量。 框架結束於分區中的最後一列,其中 obExpr 小於或等於當前行的 obExpr - offset_stop

    • 目前列

      指定範圍在當前列停止。

    • offset_stop 跟進

      如果 frame_mode 為 ROWS,則 offset_stop 為正整數常量,定義截至當前行之後結束的行數。 如果 frame_mode 為 RANGEoffset_stop 則為與 offset_start 相同類型的正值常量。 框架會在 obExpr 小於或等於當前行 obExpr + offset_stop 的最後一行的分割結束。

    • 無界限跟隨

      指定視窗框架終止於分割區的結尾。

常見錯誤條件

Examples

> CREATE TABLE employees
   (name STRING, dept STRING, salary INT, age INT);
> INSERT INTO employees
   VALUES ('Lisa', 'Sales', 10000, 35),
          ('Evan', 'Sales', 32000, 38),
          ('Fred', 'Engineering', 21000, 28),
          ('Alex', 'Sales', 30000, 33),
          ('Tom', 'Engineering', 23000, 33),
          ('Jane', 'Marketing', 29000, 28),
          ('Jeff', 'Marketing', 35000, 38),
          ('Paul', 'Engineering', 29000, 23),
          ('Chloe', 'Engineering', 23000, 25);

-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative sum of salary within each department.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM employees;
  Fred Engineering  21000  21000
 Chloe Engineering  23000  44000
   Tom Engineering  23000  67000
  Paul Engineering  29000  96000
  Jane   Marketing  29000  29000
  Jeff   Marketing  35000  64000
  Lisa       Sales  10000  10000
  Alex       Sales  30000  40000
  Evan       Sales  32000  72000

-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: moving average over three adjacent rows.
> SELECT name,
         dept,
         salary,
         ROUND(AVG(salary) OVER (PARTITION BY dept ORDER BY salary
                                 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)) AS moving_avg
    FROM employees;
  Fred Engineering  21000  22000
 Chloe Engineering  23000  22333
   Tom Engineering  23000  25000
  Paul Engineering  29000  26000
  Jane   Marketing  29000  32000
  Jeff   Marketing  35000  32000
  Lisa       Sales  10000  20000
  Alex       Sales  30000  24000
  Evan       Sales  32000  31000

-- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: sum from the current row to the end of the partition.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_total
    FROM employees;
  Fred Engineering  21000  96000
 Chloe Engineering  23000  75000
   Tom Engineering  23000  52000
  Paul Engineering  29000  29000
  Jane   Marketing  29000  64000
  Jeff   Marketing  35000  35000
  Lisa       Sales  10000  72000
  Alex       Sales  30000  62000
  Evan       Sales  32000  32000

-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative sum using value-based range.
-- Unlike ROWS mode, RANGE groups rows with equal `ORDER BY` values together.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
    FROM employees;
  Fred Engineering  21000  21000
 Chloe Engineering  23000  67000
   Tom Engineering  23000  67000
  Paul Engineering  29000  96000
  Jane   Marketing  29000  29000
  Jeff   Marketing  35000  64000
  Lisa       Sales  10000  10000
  Alex       Sales  30000  40000
  Evan       Sales  32000  72000

-- RANGE BETWEEN 5000 PRECEDING AND 5000 FOLLOWING: sum of salaries within +/- 5000 of the current row's salary.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           RANGE BETWEEN 5000 PRECEDING AND 5000 FOLLOWING) AS nearby_total
    FROM employees;
  Fred Engineering  21000  67000
 Chloe Engineering  23000  67000
   Tom Engineering  23000  67000
  Paul Engineering  29000  75000
  Jane   Marketing  29000  64000
  Jeff   Marketing  35000  35000
  Lisa       Sales  10000  10000
  Alex       Sales  30000  92000
  Evan       Sales  32000  62000

-- Comparing ROWS vs RANGE: the difference is visible when there are duplicate `ORDER BY` values.
-- With ROWS, `Chloe` and `Tom` have different running totals because each row is counted individually.
-- With RANGE, `Chloe` and `Tom` have the same total because they share the same salary value.
> SELECT name,
         salary,
         SUM(salary) OVER (ORDER BY salary
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS rows_total,
         SUM(salary) OVER (ORDER BY salary
                           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
    FROM employees
    WHERE dept = 'Engineering';
  Fred  21000  21000  21000
 Chloe  23000  44000  67000
   Tom  23000  67000  67000
  Paul  29000  96000  96000

-- `RANGE` frame requires exactly one `ORDER BY` expression.
> SELECT SUM(salary) OVER (ORDER BY salary, age
                           RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING)
    FROM employees;
  Error: DATATYPE_MISMATCH

-- `RANGE` frame requires an `ORDER BY` clause.
> SELECT SUM(salary) OVER (RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING)
    FROM employees;
  Error: DATATYPE_MISMATCH