Returns the most frequent value in a group.
Syntax
from pyspark.sql import functions as sf
sf.mode(col, deterministic=False)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or column name | Target column to compute on. |
| deterministic | bool, optional | If there are multiple equally-frequent results then return the lowest (defaults to false). |
Returns
pyspark.sql.Column: the most frequent value in a group.
Examples
from pyspark.sql import functions as sf
df = spark.createDataFrame([
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("dotNET", 2013, 48000), ("Java", 2013, 30000)],
schema=("course", "year", "earnings"))
df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
| Java| 2012|
|dotNET| 2012|
+------+----------+
When several values share the highest frequency, any one of them may be returned if deterministic is False or omitted. If deterministic is True, the lowest of the tied values is returned.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
| -10|
+---------------------------------------+
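The tie-breaking rule can also be illustrated without a Spark session. The following is a minimal plain-Python sketch of the behavior described above, not Spark's implementation: the helper name `mode_sketch` is hypothetical, returning `None` for empty input is a choice made for this sketch, and picking the first tied value seen in the non-deterministic case is only one of the outcomes that "any of the values" permits.

```python
from collections import Counter


def mode_sketch(values, deterministic=False):
    """Hypothetical helper that mimics the documented tie-breaking rule."""
    counts = Counter(values)
    if not counts:
        # Assumption of this sketch: empty input yields None.
        return None
    top = max(counts.values())
    # All values that share the highest frequency, in first-seen order.
    tied = [value for value, count in counts.items() if count == top]
    if deterministic:
        # deterministic=True: the lowest of the tied values is returned.
        return min(tied)
    # deterministic=False: any tied value is acceptable; this sketch
    # simply takes the first one encountered.
    return tied[0]


# Every value occurs once, so all three are tied; the lowest wins.
print(mode_sketch([-10, 0, 10], deterministic=True))

# No tie here: 2012 occurs twice, 2013 once.
print(mode_sketch([2012, 2012, 2013]))
```

With `deterministic=False`, callers should not rely on which of the tied values comes back. In Spark the choice is left unspecified, which is why the flag exists.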