Window function: returns the value that is offset rows after the current row, and default if there are fewer than offset rows after the current row. For example, an offset of one returns the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
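For comparison, the same result can be produced with the SQL LEAD function. The following is a minimal sketch, assuming the DataFrame used in the examples below has been registered as a temporary view named tbl (the view name is illustrative):

df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
df.createOrReplaceTempView("tbl")

# SQL LEAD with the same window specification used in the DataFrame examples below
spark.sql("""
    SELECT c1, c2,
           LEAD(c2) OVER (PARTITION BY c1 ORDER BY c2) AS next_value
    FROM tbl
""").show()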
Syntax
from pyspark.sql import functions as sf
sf.lead(col, offset=1, default=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or column name | Name of column or expression. |
| offset | int, optional | Number of rows to look ahead. Default is 1. |
| default | optional | Default value to return when there is no row at the given offset. Defaults to None (NULL). |
Returns
pyspark.sql.Column: the value offset rows after the current row, or default if no such row exists.
Examples
Example 1: Using lead to get next value
from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
[("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
df.show()
+---+---+
| c1| c2|
+---+---+
| a| 1|
| a| 2|
| a| 3|
| b| 8|
| b| 2|
+---+---+
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("next_value", sf.lead("c2").over(w)).show()
+---+---+----------+
| c1| c2|next_value|
+---+---+----------+
| a| 1| 2|
| a| 2| 3|
| a| 3| NULL|
| b| 2| 8|
| b| 8| NULL|
+---+---+----------+
Example 2: Using lead with a default value
from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
[("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("next_value", sf.lead("c2", 1, 0).over(w)).show()
+---+---+----------+
| c1| c2|next_value|
+---+---+----------+
| a| 1| 2|
| a| 2| 3|
| a| 3| 0|
| b| 2| 8|
| b| 8| 0|
+---+---+----------+
Example 3: Using lead with an offset of 2
from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
[("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("next_value", sf.lead("c2", 2, -1).over(w)).show()
+---+---+----------+
| c1| c2|next_value|
+---+---+----------+
| a| 1| 3|
| a| 2| -1|
| a| 3| -1|
| b| 2| -1|
| b| 8| -1|
+---+---+----------+
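A common use of lead is to compare each row with the row that follows it within the same partition. The following is a minimal sketch assuming the same DataFrame and window as above; the column name diff_to_next is only illustrative:

from pyspark.sql import functions as sf
from pyspark.sql import Window

df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
w = Window.partitionBy("c1").orderBy("c2")

# Subtract the current value from the next value within each partition.
# The last row of each partition has no following row, so the difference is NULL.
df.withColumn(
    "diff_to_next", sf.lead("c2").over(w) - sf.col("c2")
).show()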