Kopīgot, izmantojot


mode

Returns the most frequent value in a group.

Syntax

from pyspark.sql import functions as sf

sf.mode(col, deterministic=False)

Parameters

Parameter Type Description
col pyspark.sql.Column or column name Target column to compute on.
deterministic bool, optional If there are multiple equally-frequent results then return the lowest (defaults to false).

Returns

pyspark.sql.Column: the most frequent value in a group.

Examples

from pyspark.sql import functions as sf
df = spark.createDataFrame([
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
    schema=("course", "year", "earnings"))
df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
|  Java|      2012|
|dotNET|      2012|
+------+----------+

When multiple values have the same greatest frequency then either any of values is returned if deterministic is false or is not defined, or the lowest value is returned if deterministic is true.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
|                                    -10|
+---------------------------------------+