Compartilhar via


CTE (expressão de tabela comum)

Aplica-se a:marca de seleção positiva SQL do Databricks marca de seleção positiva Runtime do Databricks

Define um conjunto de resultados temporário que você pode referenciar possivelmente várias vezes dentro do escopo de uma instrução SQL. Uma CTE é usada principalmente em uma instrução SELECT.

Sintaxe

WITH [ RECURSIVE ] common_table_expression [, ...]

common_table_expression
  view_identifier [ ( column_identifier [, ...] ) ] [ AS ] ( query | recursive_query )

recursive_query
  base_case_query UNION ALL step_case_query

Parâmetros

  • RECURSIVO

    Importante

    Esse recurso está em Visualização Pública.

    Aplica-se a:marcação marcada como sim Databricks Runtime 17.0 e posterior

    Quando especificadas, as expressões de tabela comuns podem conter um recursive_query.

  • view_identifier

    Um identificador pelo qual common_table_expression pode ser referenciado

  • column_identifier

    Um identificador opcional pelo qual uma coluna de common_table_expression pode ser referenciada.

    Se column_identifiers forem especificados, seu número deverá corresponder ao número de colunas retornadas pelo query. Se nenhum nome for especificado, os nomes das colunas serão derivados do query.

  • consulta

    Uma consulta que produz um conjunto de resultados.

  • recursive_query

    Aplica-se a:marcação marcada como sim Databricks Runtime 17.0 e posterior

    Quando RECURSIVE é especificada, uma consulta de um CTE pode se referir a si mesma. Isso torna uma CTE uma CTE recursiva.

    • base_case_subquery

      Essa subconsulta fornece a semente ou a âncora para a recursão. Não deve fazer referência view_identifier.

    • step_subquery

      Essa subconsulta produz um novo conjunto de linhas com base na etapa anterior. Para ser recursivo, ele deve referenciar view_identifier.

Limitações

  • Os CTEs recursivos não têm suporte para UPDATE, DELETE e MERGE instrução.
  • step_subquery não deve incluir referências de coluna correlacionadas a view_identifier.
  • Invocar geradores de número aleatório da parte recursiva pode gerar o mesmo número aleatório em cada iteração.
  • O número de etapas recursivas é limitado a 100.
  • O tamanho total do conjunto de resultados é limitado a 1.000.000 linhas.

Exemplos

-- CTE with multiple column aliases
> WITH t(x, y) AS (SELECT 1, 2)
  SELECT * FROM t WHERE x = 1 AND y = 2;
   1   2

-- CTE in CTE definition
> WITH t AS (
    WITH t2 AS (SELECT 1)
    SELECT * FROM t2)
  SELECT * FROM t;
   1

-- CTE in subquery
> SELECT max(c) FROM (
    WITH t(c) AS (SELECT 1)
    SELECT * FROM t);
      1

-- CTE in subquery expression
> SELECT (WITH t AS (SELECT 1)
          SELECT * FROM t);
                1

-- CTE in CREATE VIEW statement
> CREATE VIEW v AS
    WITH t(a, b, c, d) AS (SELECT 1, 2, 3, 4)
    SELECT * FROM t;
> SELECT * FROM v;
   1   2   3   4

-- CTE names are scoped
> WITH t  AS (SELECT 1),
       t2 AS (
        WITH t AS (SELECT 2)
        SELECT * FROM t)
SELECT * FROM t2;
   2

Exemplos de consulta recursiva

Exemplos de grafos direcionados

O exemplo a seguir mostra como encontrar todos os destinos de Nova York para outras cidades, incluindo rotas diretas e aquelas roteados por outras cidades:

> DROP VIEW IF EXISTS routes;
> CREATE TEMPORARY VIEW routes(origin, destination) AS VALUES
  ('New York', 'Washington'),
  ('New York', 'Boston'),
  ('Boston', 'New York'),
  ('Washington', 'Boston'),
  ('Washington', 'Raleigh');

> WITH RECURSIVE destinations_from_new_york AS (
    SELECT 'New York' AS destination, ARRAY('New York') AS path, 0 AS length
    UNION ALL
    SELECT r.destination, CONCAT(d.path, ARRAY(r.destination)), d.length + 1
      FROM routes AS r
      JOIN destinations_from_new_york AS d
        ON d.destination = r.origin AND NOT ARRAY_CONTAINS(d.path, r.destination)
  )
  SELECT * FROM destinations_from_new_york;
  destination path                                length
  ----------- ----------------------------------- ------
  New York    ["New York"]                        0
  Washington  ["New York","Washington"]           1
  Boston      ["New York","Boston"]               1
  Boston      ["New York","Washington","Boston"]  2
  Raleigh     ["New York","Washington","Raleigh"] 2

O exemplo a seguir demonstra como percorrer um grafo direcionado e verificar a presença de um loop infinito na passagem:

> DROP TABLE IF EXISTS graph;
> CREATE TABLE IF NOT EXISTS graph( f int, t int, label string );
> INSERT INTO graph VALUES
    (1, 2, 'arc 1 -> 2'),
    (1, 3, 'arc 1 -> 3'),
    (2, 3, 'arc 2 -> 3'),
    (1, 4, 'arc 1 -> 4'),
    (4, 5, 'arc 4 -> 5'),
    (5, 1, 'arc 5 -> 1');
> WITH RECURSIVE search_graph(f, t, label, path, cycle) AS (
    SELECT *, array(struct(g.f, g.t)), false FROM graph g
    UNION ALL
    SELECT g.*, path || array(struct(g.f, g.t)), array_contains(path, struct(g.f, g.t))
      FROM graph g, search_graph sg
      WHERE g.f = sg.t AND NOT cycle
  )
  SELECT path FROM search_graph;

O resultado é um conjunto de caminhos de comprimento crescente, evitando ciclos e terminando com um caminho não infinito

[{"f":1,"t":4},{"f":4,"t":5},{"f":5,"t":1},{"f":1,"t":2},{"f":2,"t":3}]`

que visita todos os nós com uma parada clara no coletor "3". A busca em profundidade é realizada modificando a última linha para que seja ordenada pela coluna path.

SELECT * FROM search_graph ORDER BY path;

Os exemplos a seguir demonstram técnicas recursivas básicas:

-- Simple loop-like example.
> WITH RECURSIVE r(col) AS (
    SELECT 'a'
    UNION ALL
    SELECT col || char(ascii(substr(col, -1)) + 1) FROM r WHERE LENGTH(col) < 10
  )
  SELECT * FROM r;
  col
  ----------
  a
  ab
  abc
  abcd
  abcde
  abcdef
  abcdefg
  abcdefgh
  abcdefghi
  abcdefghij

-- Another loop-like example.
> WITH RECURSIVE t(n) AS (
    SELECT 'foo'
    UNION ALL
    SELECT n || ' bar' FROM t WHERE length(n) < 20
  )
  SELECT n AS is_text FROM t;
  is_text
  -------
  foo
  foo bar
  foo bar bar
  foo bar bar bar
  foo bar bar bar bar
  foo bar bar bar bar bar

-- Invoking a recursive CTE from another CTE.
> WITH RECURSIVE x(id) AS (VALUES (1) UNION ALL SELECT id+1 FROM x WHERE id < 5),
                 y(id) AS (VALUES (1) UNION ALL SELECT id+1 FROM x WHERE id < 10)
  SELECT y.*, x.* FROM y LEFT JOIN x USING (id);
  id  id
  --  --
   1  1
   4  4
   6  null
   3  3
   5  5
   2  2

-- An example of invoking a non-recursive CTE from a recursive CTE.
> WITH RECURSIVE
    y (id) AS (VALUES (1)),
    x (id) AS (SELECT * FROM y UNION ALL SELECT id+1 FROM x WHERE id < 5)
  SELECT * FROM x;
  id
  --
   1
   2
   3
   4
   5

-- Temporary views
> CREATE OR REPLACE TEMPORARY VIEW nums AS
    WITH RECURSIVE nums (n) AS (
      VALUES (1)
      UNION ALL
      SELECT n+1 FROM nums WHERE n < 6
    )
    SELECT * FROM nums;
> SELECT * FROM nums;
  n
  __
   1
   2
   3
   4
   5
   6

> INSERT into a table
> CREATE TABLE rt(level INT);

> WITH RECURSIVE r(level) AS (
    VALUES (0)
    UNION ALL
    SELECT level + 1 FROM r WHERE level < 9
  )
  INSERT INTO rt SELECT * FROM r;

> SELECT * from rt;
  level
  -----
      0
      1
      2
      3
      4
      5
      6
      7
      8
      9

-- Nested recursive CTE are supported (only inside anchor)
-- Returns a triangular half – total of 55 rows – of a 10x10 square of numbers 1-10
> WITH RECURSIVE t(j) AS (
    WITH RECURSIVE s(i) AS (
      VALUES (1)
      UNION ALL
      SELECT i+1 FROM s WHERE i < 10
    )
    SELECT i FROM s
    UNION ALL
    SELECT j+1 FROM t WHERE j < 10
  )
  SELECT * FROM t;
  j
  --
   1
   2
   2
   3
   3
   3
   4
  ...
   9
  10
  10
  10
  10
  10
  10
  10
  10
  10
  10

-- Honoring LIMIT
> WITH RECURSIVE t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n+1 FROM t
  )
  SELECT * FROM t LIMIT 10;
  n
  --
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10

-- Organizational tree examples
-- Extracting all departments under 'A'
> CREATE TABLE department (
    id INTEGER, -- department ID
    parent_department INTEGER, -- upper department ID
    name string -- department name
  );

> INSERT INTO department VALUES
    (0, NULL, 'ROOT'),
    (1, 0, 'A'),
    (2, 1, 'B'),
    (3, 2, 'C'),
    (4, 2, 'D'),
    (5, 0, 'E'),
    (6, 4, 'F'),
    (7, 5, 'G');

> WITH RECURSIVE subdepartment AS (
    SELECT name as root_name, * FROM department WHERE name = 'A'
    UNION ALL
    SELECT sd.root_name, d.* FROM department AS d, subdepartment AS sd
     WHERE d.parent_department = sd.id
  )
  SELECT * FROM subdepartment ORDER BY name;
  root_name id  parent_department  name
  --------- --  -----------------  ----
          A	 1                  0     A
          A	 2	                1     B
          A	 3                  2     C
          A	 4                  2     D
          A	 6	                4     F

-- The same as above but extracting the level number:
> WITH RECURSIVE subdepartment(level, id, parent_department, name) AS (
    SELECT 1, * FROM department WHERE name = 'A'
    UNION ALL
    SELECT sd.level + 1, d.* FROM department AS d, subdepartment AS sd
     WHERE d.parent_department = sd.id
  )
  SELECT * FROM subdepartment ORDER BY name;
  level  id  parent_department  name
  -----  --  -----------------  ----
      1   1                  0     A
      2   2                  1     B
      3   3                  2     C
      3   4                  2     D
      4   6                  4     F

-- The same as above but only shows level 2 or more
> WITH RECURSIVE subdepartment(level, id, parent_department, name) AS (
    SELECT 1, * FROM department WHERE name = 'A'
    UNION ALL
    SELECT sd.level + 1, d.* FROM department AS d, subdepartment AS sd
     WHERE d.parent_department = sd.id
  )
  SELECT * FROM subdepartment WHERE level >= 2 ORDER BY name;
  level id  parent_department  name
  ----- --  -----------------  ----
  2      2                  1     B
  3      3                  2     C
  3      4                  2     D
  4      6                  4     F

-- Binary tree examples
-- Another tree example is to find all paths from the second level nodes “2” and “3” to all other nodes.
> CREATE TABLE tree(id INTEGER, parent_id INTEGER);
> INSERT INTO tree
    VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2), ( 6, 2), ( 7, 3), ( 8, 3),
           (9,    4), (10,4), (11,7), (12,7), (13,7), (14, 9), (15,11), (16,11);

> WITH RECURSIVE t(id, path) AS (
    VALUES(1,cast(array() as array<Int>))
    UNION ALL
    SELECT tree.id, t.path || array(tree.id)
      FROM tree JOIN t ON (tree.parent_id = t.id)
  )
  SELECT t1.*, t2.*
    FROM t AS t1 JOIN t AS t2
      ON (t1.path[0] = t2.path[0] AND
          size(t1.path) = 1 AND
          size(t2.path) > 1)
  ORDER BY t1.id, t2.id;
  id  path  id  path
  --  ----  --  ------------
   2  [2]    4  [2,4]
   2  [2]    5  [2,5]
   2  [2]    6  [2,6]
   2  [2]    9  [2,4,9]
   2  [2]   10  [2,4,10]
   2  [2]   14  [2,4,9,14]
   3  [3]    7  [3,7]
   3  [3]    8  [3,8]
   3  [3]   11  [3,7,11]
   3  [3]   12  [3,7,12]
   3  [3]   13  [3,7,13]
   3  [3]   15  [3,7,11,15]
   3  [3]   16  [3,7,11,16]

-- Another tree example counts the leaf nodes reachable from nodes “2” and “3”
> CREATE TABLE tree(id INTEGER, parent_id INTEGER);
> INSERT INTO tree
    VALUES (1, NULL), ( 2,1), (3,1 ), ( 4,2), ( 5,2), ( 6,2), ( 7, 3), ( 8, 3),
           (9,    4), (10,4), (11,7), (12,7), (13,7), (14,9), (15,11), (16,11);

> WITH RECURSIVE t(id, path) AS (
    VALUES(1,cast(array() as array<Int>))
    UNION ALL
    SELECT tree.id, t.path || array(tree.id)
      FROM tree JOIN t ON (tree.parent_id = t.id)
  )
  SELECT t1.id, count(*)
    FROM t AS t1 JOIN t AS t2
      ON (t1.path[0] = t2.path[0] AND
          size(t1.path) = 1 AND
          size(t2.path) > 1)
  GROUP BY t1.id
  ORDER BY t1.id;
  id  count(1)
  --  --------
   2         6
   3         7

-- Another tree example that is listing the paths to leaf node:
> CREATE TABLE tree(id INTEGER, parent_id INTEGER);
> INSERT INTO tree
    VALUES (1, NULL), ( 2,1), ( 3,1), ( 4,2), ( 5,2), ( 6,2), ( 7, 3), ( 8, 3),
           (9,    4), (10,4), (11,7), (12,7), (13,7), (14,9), (15,11), (16,11);

> WITH RECURSIVE t(id, path) AS (
    VALUES(1,cast(array() as array<Int>))
    UNION ALL
    SELECT tree.id, t.path || array(tree.id)
      FROM tree JOIN t ON (tree.parent_id = t.id)
  )
  SELECT id, path FROM t
    ORDER BY id;
  id  path
  --  -----------
   1  []
   2  [2]
   3  [3]
   4  [2,4]
   5  [2,5]
   6  [2,6]
   7  [3,7]
   8  [3,8]
   9  [2,4,9]
  10  [2,4,10]
  11  [3,7,11]
  12  [3,7,12]
  13  [3,7,13]
  14  [2,4,9,14]
  15  [3,7,11,15]
  16  [3,7,11,16]

-- Sudoku solver (https://sqlite.org)
-- Highlighted text represent the row-wise scan of a 9x9 Sudoku grid where missing values are represented with spaces.
> WITH RECURSIVE generate_series(gs) AS (
    SELECT 1
    UNION ALL
    SELECT gs+1 FROM generate_series WHERE gs < 9
  ),
  x( s, ind ) AS
    ( SELECT sud, position( ' ' IN sud )
      FROM  (SELECT '4  8725  5  64 213 29   8      6 73 1 8 2 4  97  15 2 3  2  1    659   28 2 37946'::STRING
        AS sud) xx
    UNION ALL
    SELECT substr( s, 1, ind - 1 ) || z || substr( s, ind + 1 ),
           position(' ' in repeat('x',ind) || substr( s, ind + 1 ) )
    FROM x,
     (SELECT gs::STRING AS z FROM generate_series) z
    WHERE ind > 0
      AND NOT EXISTS ( SELECT *
        FROM generate_series
        WHERE z.z = substr( s, ( (ind - 1 ) DIV 9 ) * 9 + gs, 1 )
           OR    z.z = substr( s, mod( ind - 1, 9 ) - 8 + gs * 9, 1 )
           OR    z.z = substr( s, mod( ( ( ind - 1 ) DIV 3 ), 3 ) * 3
                       + ( ( ind - 1 ) DIV 27 ) * 27 + gs
                       + ( ( gs - 1 ) DIV 3 ) * 6, 1 )
     )
  )
  SELECT s FROM x WHERE ind = 0;
  431872569587649213629351874245968731168723495973415628394286157716594382852137946

-- You can underspecify the problem, for example by replacing the leading ‘4’ with space ‘ ‘ which will yield two solutions
431872569587649213629351874245968731168723495973415628394286157716594382852137946
631872594587649213429351867245968731168723459973415628394286175716594382852137946

-- Finding prime numbers (Sieve of Eratosthenes, https://wiki.postgresql.org)
-- This is an implementation of the Sieve of Erastothenes algorithm for finding the prime numbers up to 150.
> WITH RECURSIVE t1(n) AS (
    VALUES(2)
    UNION ALL
    SELECT n+1 FROM t1 WHERE n < 100
  ),
  t2 (n, i) AS (
    SELECT 2*n, 2
      FROM t1 WHERE 2*n <= 150
    UNION ALL
    (
      WITH t3(k) AS (
        SELECT max(i) OVER () + 1 FROM t2
      ),
      t4(k) AS (
        SELECT DISTINCT k FROM t3
      )
      SELECT k*n, k
        FROM t1 CROSS JOIN t4
        WHERE k*k <= 150
    )
  )
  SELECT n FROM t1 EXCEPT
    SELECT n FROM t2
  ORDER BY 1;
  2
  3
  5
  7
  11
  13
  17
  19
  ...