Piezīmes
Lai piekļūtu šai lapai, ir nepieciešama autorizācija. Varat mēģināt pierakstīties vai mainīt direktorijus.
Lai piekļūtu šai lapai, ir nepieciešama autorizācija. Varat mēģināt mainīt direktorijus.
Returns the most frequent value in a group.
Syntax
from pyspark.sql import functions as sf
sf.mode(col, deterministic=False)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or column name |
Target column to compute on. |
deterministic |
bool, optional | If there are multiple equally-frequent results then return the lowest (defaults to false). |
Returns
pyspark.sql.Column: the most frequent value in a group.
Examples
from pyspark.sql import functions as sf
df = spark.createDataFrame([
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("dotNET", 2013, 48000), ("Java", 2013, 30000)],
schema=("course", "year", "earnings"))
df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
| Java| 2012|
|dotNET| 2012|
+------+----------+
When multiple values have the same greatest frequency then either any of values is returned if deterministic is false or is not defined, or the lowest value is returned if deterministic is true.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
| -10|
+---------------------------------------+