SELECT
Applies to: Databricks SQL Databricks Runtime
Composes a result set from one or more table references.
The SELECT
clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses.
Syntax
SELECT [ hints ] [ ALL | DISTINCT ] { named_expression | star_clause } [, ...]
FROM table_reference [, ...]
[ LATERAL VIEW clause ]
[ WHERE clause ]
[ GROUP BY clause ]
[ HAVING clause]
[ QUALIFY clause ]
named_expression
expression [ column_alias ]
star_clause
[ { table_name | view_name } . ] * [ except_clause ]
except_clause
EXCEPT ( { column_name | field_name } [, ...] )
Parameters
-
Hints help the Azure Databricks optimizer make better planning decisions. Azure Databricks supports hints that influence selection of join strategies and repartitioning of the data.
ALL
Select all matching rows from the table references. Enabled by default.
DISTINCT
Select all matching rows from the table references after removing duplicates in results.
named_expression
An expression with an optional assigned name.
-
A combination of one or more values, operators, and SQL functions that evaluates to a value.
-
An optional column identifier naming the expression result. If no
column_alias
is provided Databricks SQL derives one.
-
-
A shorthand to name all the referenceable columns in the
FROM
clause or a specific table reference’s columns or fields in theFROM
clause. -
A source of input for the
SELECT
. This input reference can be turned into a streaming reference by using theSTREAM
keyword prior to the reference. -
Used in conjunction with generator functions such as
EXPLODE
, which generates a virtual table containing one or more rows.LATERAL VIEW
applies the rows to each original output row.In Databricks SQL, and starting with Databricks Runtime 12.2 this clause is deprecated. You should invoke a table valued generator function as a table_reference.
-
Filters the result of the
FROM
clause based on the supplied predicates. -
The expressions that are used to group the rows. This is used in conjunction with aggregate functions (
MIN
,MAX
,COUNT
,SUM
,AVG
) to group rows based on the grouping expressions and aggregate values in each group. When aFILTER
clause is attached to an aggregate function, only the matching rows are passed to that function. -
The predicates by which the rows produced by
GROUP BY
are filtered. TheHAVING
clause is used to filter rows after the grouping is performed. If you specifyHAVING
withoutGROUP BY
, it indicates aGROUP BY
without grouping expressions (global aggregate). -
The predicates that are used to filter the results of window functions. To use
QUALIFY
, at least one window function is required to be present in the SELECT list or the QUALIFY clause.
Select on Delta table
In addition to the standard SELECT
options, Delta tables support the time travel options described in this section. For details, see Work with Delta Lake table history.
AS OF
syntax
table_identifier TIMESTAMP AS OF timestamp_expression
table_identifier VERSION AS OF version
timestamp_expression
can be any one of:'2018-10-18T22:15:12.013Z'
, that is, a string that can be cast to a timestampcast('2018-10-18 13:36:32 CEST' as timestamp)
'2018-10-18'
, that is, a date stringcurrent_timestamp() - interval 12 hours
date_sub(current_date(), 1)
- Any other expression that is or can be cast to a timestamp
version
is a long value that can be obtained from the output ofDESCRIBE HISTORY table_spec
.
Neither timestamp_expression
nor version
can be subqueries.
Example
> SELECT * FROM events TIMESTAMP AS OF '2018-10-18T22:15:12.013Z'
> SELECT * FROM events VERSION AS OF 123
@
syntax
Use the @
syntax to specify the timestamp or version. The timestamp must be in yyyyMMddHHmmssSSS
format. You can specify a version after @
by prepending a v
to the version. For example, to query version 123
for the table events
, specify events@v123
.
Example
> SELECT * FROM events@20190101000000000
> SELECT * FROM events@v123
Examples
-- select all referencable columns from all tables
> SELECT * FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
1 2 3 4
-- select all referencable columns from one table
> SELECT t2.* FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
3 4
-- select all referencable columns from all tables except t2.c4
> SELECT * EXCEPT(c4) FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
1 2 3
-- select all referencable columns from a table, except a nested field.
> SELECT * EXCEPT(c2.b) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
1 { "a" : 2 }
-- Removing all fields results in an empty struct
> SELECT * EXCEPT(c2.b, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
1 { }
-- Overlapping names result in an error
> SELECT * EXCEPT(c2, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
Error: EXCEPT_OVERLAPPING_COLUMNS
Related articles
- CLUSTER BY clause
- Common table expression (CTE)
- DISTRIBUTE BY clause
- GROUP BY clause
- HAVING clause
- QUALIFY clause
- Hints
- VALUES clause
- JOIN
- LATERAL VIEW clause
- LIMIT clause
- OFFSET clause
- ORDER BY clause
- PIVOT clause
- Query
- TABLESAMPLE clause
- Set operators
- SORT BY clause
- Star clause
- Table-valued function (TVF)
- table reference
- UNPIVOT clause
- WHERE clause
- WINDOW clause
- Window functions