Returns the exact percentile(s) of numeric column `col` at the given percentage(s); each percentage must lie in the range [0.0, 1.0].
Syntax
from pyspark.sql import functions as sf
sf.percentile(col, percentage, frequency=1)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or str | The numeric column to compute percentiles over. |
| `percentage` | `pyspark.sql.Column`, float, list of floats, or tuple of floats | Percentage(s) in decimal form; each must be between 0.0 and 1.0. |
| `frequency` | `pyspark.sql.Column` or int | A positive integer literal that weights each row as if it occurred that many times (default: 1). |
Returns
pyspark.sql.Column: the exact percentile(s) of the numeric column. When `percentage` is a list or tuple, the result is an array column with one percentile per requested percentage.
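The "exact" percentile here follows the usual sort-and-interpolate definition: with the values sorted, the percentile at fraction p sits at fractional rank p·(n−1) and is linearly interpolated between the two neighboring values, while frequency acts as a per-row repeat count. A minimal pure-Python sketch of these semantics (the helper `exact_percentile` is hypothetical, for illustration only, and not part of PySpark):

```python
def exact_percentile(values, p, frequency=1):
    """Exact percentile with linear interpolation, mirroring the
    sort-and-interpolate semantics described above.

    `frequency` repeats every value that many times, matching the
    effect of a constant frequency column (hypothetical helper,
    not a PySpark API).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    data = sorted(values)
    # A constant frequency simply repeats each value in place.
    data = [v for v in data for _ in range(frequency)]
    n = len(data)
    pos = p * (n - 1)   # fractional rank of the percentile
    lo = int(pos)       # index of the lower neighbor
    frac = pos - lo     # interpolation weight toward the upper neighbor
    if lo + 1 < n:
        return data[lo] + frac * (data[lo + 1] - data[lo])
    return data[lo]

# Median of 1..4 interpolates halfway between 2 and 3:
print(exact_percentile([1, 2, 3, 4], 0.5))  # 2.5
```

Spark computes the same quantity distributively over the column, but the interpolation rule is the part that distinguishes `percentile` (exact) from sketch-based approximations such as `percentile_approx`.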
Examples
Example 1: Calculate multiple percentiles
from pyspark.sql import functions as sf
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
df.select(
sf.percentile("value", [0.25, 0.5, 0.75], sf.lit(1))
).show(truncate=False)
+--------------------------------------------------------+
|percentile(value, array(0.25, 0.5, 0.75), 1) |
+--------------------------------------------------------+
|[0.7441991494121..., 9.9900713756..., 19.33740203080...]|
+--------------------------------------------------------+
Example 2: Calculate percentile by group
from pyspark.sql import functions as sf
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
df.groupBy("key").agg(
sf.percentile("value", sf.lit(0.5), sf.lit(1))
).sort("key").show()
+---+-------------------------+
|key|percentile(value, 0.5, 1)|
+---+-------------------------+
| 0| -0.03449962216667...|
| 1| 9.990389751837...|
| 2| 19.967859769284...|
+---+-------------------------+