Lead

Window function: returns the value that is `offset` rows after the current row, and `default` if there are fewer than `offset` rows after the current row. For example, an `offset` of 1 returns the next row at any given point in the window partition.

This is equivalent to the LEAD function in SQL.
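
To illustrate the SQL form, here is a minimal sketch that runs `LEAD` over a small table. It uses SQLite through Python's standard `sqlite3` module only so that it runs without a Spark session. It is not Spark SQL, it assumes a SQLite build with window-function support (3.25 or later), and the table name `t` is hypothetical.

```python
import sqlite3

# In-memory database, used purely for illustration (not Spark SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)],
)

# LEAD(expr, offset, default): value one row ahead within each c1 partition,
# ordered by c2, falling back to 0 when no such row exists.
sql_rows = conn.execute(
    """
    SELECT c1, c2,
           LEAD(c2, 1, 0) OVER (PARTITION BY c1 ORDER BY c2) AS next_value
    FROM t
    ORDER BY c1, c2
    """
).fetchall()
conn.close()

for sql_row in sql_rows:
    print(sql_row)
```

The three arguments of `LEAD` play the same roles as `col`, `offset`, and `default` in `sf.lead`, and the `OVER` clause corresponds to the window passed to `.over(...)`.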

Syntax

from pyspark.sql import functions as sf

sf.lead(col, offset=1, default=None)

Parameters

Parameter	Type	Description
`col`	`pyspark.sql.Column` or column name	Name of the column or expression.
`offset`	int, optional	Number of rows to look ahead. Defaults to 1.
`default`	optional	Value returned when fewer than `offset` rows follow the current row. Defaults to None.
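
How `offset` and `default` interact can be sketched in plain Python. This is a rough model of the semantics, not how Spark implements the function. The helper `lead_rows` and its parameter names are hypothetical, and it assumes a non-negative `offset` and rows given as dictionaries.

```python
from itertools import groupby
from operator import itemgetter


def lead_rows(rows, partition_key, order_key, value_key, offset=1, default=None):
    """Pair each row with the value `offset` rows ahead in its partition."""
    result = []
    # Sort by partition, then by the ordering column inside each partition.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        part = list(group)
        for i, row in enumerate(part):
            j = i + offset
            # Use the default when fewer than `offset` rows follow this one.
            value = part[j][value_key] if 0 <= j < len(part) else default
            result.append((row, value))
    return result


sample_rows = [
    {"c1": "a", "c2": 1},
    {"c1": "a", "c2": 2},
    {"c1": "a", "c2": 3},
    {"c1": "b", "c2": 8},
    {"c1": "b", "c2": 2},
]

for row, next_value in lead_rows(sample_rows, "c1", "c2", "c2", offset=2, default=-1):
    print(row["c1"], row["c2"], next_value)
```

The lookup never crosses a partition boundary, which is why every row in the two-row partition `b` gets the default when `offset` is 2.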

Returns

pyspark.sql.Column: the value that is `offset` rows after the current row.

Examples

Example 1: Using lead to get the next value

from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
df.show()

+---+---+
| c1| c2|
+---+---+
|  a|  1|
|  a|  2|
|  a|  3|
|  b|  8|
|  b|  2|
+---+---+

w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("next_value", sf.lead("c2").over(w)).show()

+---+---+----------+
| c1| c2|next_value|
+---+---+----------+
|  a|  1|         2|
|  a|  2|         3|
|  a|  3|      NULL|
|  b|  2|         8|
|  b|  8|      NULL|
+---+---+----------+

Example 2: Using lead with a default value

from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("next_value", sf.lead("c2", 1, 0).over(w)).show()

+---+---+----------+
| c1| c2|next_value|
+---+---+----------+
|  a|  1|         2|
|  a|  2|         3|
|  a|  3|         0|
|  b|  2|         8|
|  b|  8|         0|
+---+---+----------+

Example 3: Using lead with an offset of 2

from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("next_value", sf.lead("c2", 2, -1).over(w)).show()

+---+---+----------+
| c1| c2|next_value|
+---+---+----------+
|  a|  1|         3|
|  a|  2|        -1|
|  a|  3|        -1|
|  b|  2|        -1|
|  b|  8|        -1|
+---+---+----------+

Last updated on 2026-02-01