WATERMARK clause
Applies to: Databricks SQL Databricks Runtime 12.0 and above
Adds a watermark to a relation in a select statement. The WATERMARK
clause only applies to queries on stateful streaming data, which include stream-stream joins and aggregation.
Syntax
from_item
{ table_name [ TABLESAMPLE clause ] [ watermark_clause ] [ table_alias ] |
JOIN clause |
[ LATERAL ] table_valued_function [ table_alias ] |
VALUE clause |
[ LATERAL ] ( query ) [ TABLESAMPLE clause ] [ watermark_clause ] [ table_alias ] }
watermark_clause
WATERMARK named_expression DELAY OF interval
Parameters
-
An expression that provides a value of type
timestamp
. The expression must be either a reference to the existing column, or a deterministic transformation against existing column(s). The expression adds a column of timestamp type which is used to track the watermark. The added column is available to query. -
An interval literal that defines the delay threshold of the watermark. Must be a positive value less than a month.
Examples
-- Creating a streaming table performing time window row count, with defining watermark from existing column
> CREATE OR REFRESH STREAMING TABLE window_agg_1
AS SELECT window(ts, '10 seconds') as w, count(*) as CNT
FROM
STREAM stream_source WATERMARK ts DELAY OF INTERVAL 10 SECONDS AS stream
GROUP BY window(ts, '10 seconds');
-- Creating a streaming table performing time window row count, with deriving a new timestamp column to define watermark
> CREATE OR REFRESH STREAMING TABLE window_agg_2
AS SELECT window(ts, '10 seconds') as w, count(*) as CNT
FROM
STREAM stream_source WATERMARK to_timestamp(ts_str) AS ts DELAY OF INTERVAL 10 SECONDS AS stream
GROUP BY window(ts, '10 seconds');