Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Returns the last value in a group. The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned. The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.
Syntax
from pyspark.sql import functions as sf
sf.last(col, ignorenulls=False)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or column name |
Column to fetch last value for. |
ignorenulls |
bool | If last value is null then look for non-null value. False by default. |
Returns
pyspark.sql.Column: last value of the group.
Examples
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", 2), ("Bob", 5), ("Alice", None)], ("name", "age"))
df = df.orderBy(df.age.desc())
df.groupby("name").agg(sf.last("age")).orderBy("name").show()
+-----+---------+
| name|last(age)|
+-----+---------+
|Alice| NULL|
| Bob| 5|
+-----+---------+
To ignore any null values, set ignorenulls to True:
df.groupby("name").agg(sf.last("age", ignorenulls=True)).orderBy("name").show()
+-----+---------+
| name|last(age)|
+-----+---------+
|Alice| 2|
| Bob| 5|
+-----+---------+