opomba,
Dostop do te strani zahteva pooblastilo. Poskusite se vpisati alispremeniti imenike.
Dostop do te strani zahteva pooblastilo. Poskusite lahko spremeniti imenike.
Returns the most frequent value in a group.
Syntax
from pyspark.sql import functions as sf
sf.mode(col, deterministic=False)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or column name |
Target column to compute on. |
deterministic |
bool, optional | If there are multiple equally-frequent results then return the lowest (defaults to false). |
Returns
pyspark.sql.Column: the most frequent value in a group.
Examples
from pyspark.sql import functions as sf
df = spark.createDataFrame([
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("dotNET", 2013, 48000), ("Java", 2013, 30000)],
schema=("course", "year", "earnings"))
df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
| Java| 2012|
|dotNET| 2012|
+------+----------+
When multiple values have the same greatest frequency then either any of values is returned if deterministic is false or is not defined, or the lowest value is returned if deterministic is true.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
| -10|
+---------------------------------------+