Aggregate function: returns the maximum value of the expression in a group. Null values are ignored during the computation, and NaN compares greater than any other numeric value.
Syntax
```python
from pyspark.sql import functions as sf

sf.max(col)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or column name | The target column on which the maximum value is computed. |
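Both call forms are equivalent; a minimal sketch assuming an active `SparkSession` bound to `spark`, as in the examples below:

```python
import pyspark.sql.functions as sf

df = spark.range(5)

# Pass a Column object...
df.select(sf.max(df.id)).show()
# ...or pass the column name as a string; both produce the same result.
df.select(sf.max("id")).show()
```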
Returns
`pyspark.sql.Column`: A column that contains the maximum value computed.
Examples
Example 1: Compute the maximum value of a numeric column
```python
import pyspark.sql.functions as sf

df = spark.range(10)
df.select(sf.max(df.id)).show()
```

```
+-------+
|max(id)|
+-------+
|      9|
+-------+
```
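Because `max` accepts any column expression, not just a bare column, it can aggregate a computed value directly. A minimal sketch reusing the DataFrame above; the output column name is auto-generated by Spark:

```python
import pyspark.sql.functions as sf

df = spark.range(10)
# Aggregate a derived expression; the maximum of id * 2 over 0..9 is 18.
df.select(sf.max(df.id * 2)).show()
```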
Example 2: Compute the maximum value of a string column
```python
import pyspark.sql.functions as sf

df = spark.createDataFrame([("A",), ("B",), ("C",)], ["value"])
df.select(sf.max(df.value)).show()
```

```
+----------+
|max(value)|
+----------+
|         C|
+----------+
```
Example 3: Compute the maximum value of a column in a grouped DataFrame
```python
import pyspark.sql.functions as sf

df = spark.createDataFrame([("A", 1), ("A", 2), ("B", 3), ("B", 4)], ["key", "value"])
df.groupBy("key").agg(sf.max(df.value)).show()
```

```
+---+----------+
|key|max(value)|
+---+----------+
|  A|         2|
|  B|         4|
+---+----------+
```
Example 4: Compute the maximum value of multiple columns in a grouped DataFrame
```python
import pyspark.sql.functions as sf

df = spark.createDataFrame(
    [("A", 1, 2), ("A", 2, 3), ("B", 3, 4), ("B", 4, 5)], ["key", "value1", "value2"])
df.groupBy("key").agg(sf.max("value1"), sf.max("value2")).show()
```

```
+---+-----------+-----------+
|key|max(value1)|max(value2)|
+---+-----------+-----------+
|  A|          2|          3|
|  B|          4|          5|
+---+-----------+-----------+
```
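The auto-generated column names such as `max(value1)` can be renamed with `Column.alias`; a minimal sketch reusing the DataFrame above:

```python
import pyspark.sql.functions as sf

df = spark.createDataFrame(
    [("A", 1, 2), ("A", 2, 3), ("B", 3, 4), ("B", 4, 5)], ["key", "value1", "value2"])

# alias() replaces the generated names max(value1) and max(value2).
df.groupBy("key").agg(
    sf.max("value1").alias("max_value1"),
    sf.max("value2").alias("max_value2"),
).show()
```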
Example 5: Compute the maximum value of a column with null values
```python
import pyspark.sql.functions as sf

df = spark.createDataFrame([(1,), (2,), (None,)], ["value"])
df.select(sf.max(df.value)).show()
```

```
+----------+
|max(value)|
+----------+
|         2|
+----------+
```
Example 6: Compute the maximum value of a column with "NaN" values
```python
import pyspark.sql.functions as sf

df = spark.createDataFrame([(1.1,), (float("nan"),), (3.3,)], ["value"])
df.select(sf.max(df.value)).show()
```

```
+----------+
|max(value)|
+----------+
|       NaN|
+----------+
```
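Because NaN sorts above every other numeric value, a single NaN in the column dominates the result. To compute the maximum while ignoring NaN, one option is to map NaN to null first, since `max` skips nulls; a minimal sketch assuming the same DataFrame as Example 6:

```python
import pyspark.sql.functions as sf

df = spark.createDataFrame([(1.1,), (float("nan"),), (3.3,)], ["value"])

# when() without otherwise() yields null for NaN rows,
# and max() ignores nulls, so the result here is 3.3.
df.select(
    sf.max(sf.when(~sf.isnan("value"), sf.col("value"))).alias("max_ignoring_nan")
).show()
```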