Generates a session window given a timestamp-specifying column.
A session window is a dynamic window: its length varies according to the given inputs. The length of a session window is defined as "the timestamp of the latest input of the session + gap duration", so when new inputs are bound to the current session window, the end time of the session window is expanded accordingly.
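As an illustrative sketch of this expansion (assuming an active SparkSession named spark, as in the Examples section), a second event arriving within the gap extends the session end, while a later event starts a new session:

from pyspark.databricks.sql import functions as dbf
# Three events: the first two fall within the 5-second gap, the third does not.
df = spark.createDataFrame(
    [('2016-03-11 09:00:07', 1),
     ('2016-03-11 09:00:10', 1),   # within 5 seconds of the previous event: same session, end moves to 09:00:15
     ('2016-03-11 09:00:30', 1)],  # more than 5 seconds later: starts a new session
    ['dt', 'v'])
# Expect two sessions: [09:00:07, 09:00:15) with sum 2 and [09:00:30, 09:00:35) with sum 1.
df.groupBy(dbf.session_window('dt', '5 seconds')).agg(dbf.sum('v')).show(truncate=False)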
Windows support microsecond precision. Windows on the order of months are not supported.
For a streaming query, you may use the function current_timestamp to generate windows on processing time.
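A minimal streaming sketch of that approach (the rate source and the 30-second gap are illustrative assumptions; current_timestamp comes from pyspark.sql.functions):

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as F
# Illustrative streaming source; any streaming DataFrame works here.
events = spark.readStream.format('rate').option('rowsPerSecond', 5).load()
# Stamp each row with its processing time and group rows into 30-second sessions.
sessions = (events
    .withColumn('proc_time', F.current_timestamp())
    .withWatermark('proc_time', '1 minute')
    .groupBy(dbf.session_window('proc_time', '30 seconds'))
    .count())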
gapDuration is provided as a string, e.g. '1 second', '1 day 12 hours', '2 minutes'. Valid interval units are 'week', 'day', 'hour', 'minute', 'second', 'millisecond', and 'microsecond'.
It can also be a Column that evaluates to the gap duration dynamically based on the input row.
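For instance, a sketch of a dynamic gap built with when/otherwise from pyspark.sql.functions (the event types and durations are made up for illustration):

from pyspark.databricks.sql import functions as dbf
from pyspark.sql import functions as F
# Hypothetical data: the event type decides how long a session may stay idle.
df = spark.createDataFrame(
    [('2016-03-11 09:00:07', 'click', 1),
     ('2016-03-11 09:00:20', 'purchase', 1)],
    ['dt', 'event_type', 'v'])
# The gap duration is evaluated per row: purchases keep the session open longer.
gap = F.when(F.col('event_type') == 'purchase', '30 seconds').otherwise('10 seconds')
df.groupBy(dbf.session_window('dt', gap)).agg(dbf.sum('v')).show(truncate=False)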
The output column will be a struct called 'session_window' by default with the nested columns
'start' and 'end', where 'start' and 'end' will be of pyspark.sql.types.TimestampType.
For the corresponding Databricks SQL function, see session_window grouping expression.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.session_window(timeColumn=<timeColumn>, gapDuration=<gapDuration>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| timeColumn | pyspark.sql.Column or str | The column name or column to use as the timestamp for windowing by time. The time column must be of TimestampType or TimestampNTZType. |
| gapDuration | pyspark.sql.Column or literal string | A Python string literal or column specifying the timeout of the session. It can be a static value, e.g. '10 minutes' or '1 second', or an expression/UDF that specifies the gap duration dynamically based on the input row. |
Returns
pyspark.sql.Column: the column for computed results.
Examples
from pyspark.databricks.sql import functions as dbf
# A single event grouped into a 5-second session window keyed on the 'dt' column.
df = spark.createDataFrame([('2016-03-11 09:00:07', 1)], ['dt', 'v'])
df2 = df.groupBy(dbf.session_window('dt', '5 seconds')).agg(dbf.sum('v'))
# Show the computed session and the schema of the 'session_window' struct column.
df2.show(truncate=False)
df2.printSchema()
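Building on the example above, the nested 'start' and 'end' fields can be selected directly from the 'session_window' struct:

# Pull the session bounds out of the 'session_window' struct column.
df2.select(
    df2.session_window.start.alias('session_start'),
    df2.session_window.end.alias('session_end'),
).show(truncate=False)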