Returns the exact percentile(s) of the numeric column `expr` at the given percentage(s); each percentage must be in the range [0.0, 1.0].
Syntax
from pyspark.sql import functions as sf
sf.percentile(col, percentage, frequency=1)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or str | The numeric column. |
| `percentage` | `pyspark.sql.Column`, float, list of floats, or tuple of floats | Percentage in decimal (must be between 0.0 and 1.0). |
| `frequency` | `pyspark.sql.Column` or int | A positive numeric literal that controls frequency (default: 1). |
Returns
pyspark.sql.Column: the exact percentile of the numeric column.
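For intuition, "exact percentile" here means the value at the interpolated rank position rather than an approximation. A minimal pure-Python sketch of that interpolation (not Spark code; a simplified model of the semantics, assuming linear interpolation between the two neighboring sorted values):

```python
def exact_percentile(values, p):
    """Sketch of exact-percentile semantics: sort the values, locate
    the position p * (n - 1), and linearly interpolate between the
    two neighboring values when the position is fractional."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)        # index at or below the target position
    frac = pos - lo      # fractional distance toward the next index
    if frac == 0:
        return float(xs[lo])
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

exact_percentile([1, 2, 3, 4], 0.5)  # 2.5: midpoint between 2 and 3
```

With an odd number of values the median is an element of the data itself; with an even number it falls between the two middle elements, as above.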
Examples
Example 1: Calculate multiple percentiles
from pyspark.sql import functions as sf
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
df.select(
sf.percentile("value", [0.25, 0.5, 0.75], sf.lit(1))
).show(truncate=False)
+--------------------------------------------------------+
|percentile(value, array(0.25, 0.5, 0.75), 1) |
+--------------------------------------------------------+
|[0.7441991494121..., 9.9900713756..., 19.33740203080...]|
+--------------------------------------------------------+
Example 2: Calculate percentile by group
from pyspark.sql import functions as sf
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
df.groupBy("key").agg(
sf.percentile("value", sf.lit(0.5), sf.lit(1))
).sort("key").show()
+---+-------------------------+
|key|percentile(value, 0.5, 1)|
+---+-------------------------+
| 0| -0.03449962216667...|
| 1| 9.990389751837...|
| 2| 19.967859769284...|
+---+-------------------------+
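The examples above always pass `sf.lit(1)` for `frequency`. Conceptually, a frequency of `f` makes each row's value count `f` times toward the percentile. A hedged pure-Python sketch of that weighting (again a simplified model, not Spark code, assuming frequency behaves as a per-row repetition count):

```python
def percentile_with_frequency(pairs, p):
    """Sketch of frequency-weighted exact percentile: each (value,
    frequency) pair behaves as if the value appeared `frequency`
    times, then the usual interpolated percentile is taken."""
    expanded = sorted(v for v, f in pairs for _ in range(f))
    pos = p * (len(expanded) - 1)
    lo = int(pos)
    frac = pos - lo
    if frac == 0:
        return float(expanded[lo])
    return expanded[lo] + frac * (expanded[lo + 1] - expanded[lo])

# One row with value 1 and three rows' worth of weight on value 2:
percentile_with_frequency([(1, 1), (2, 3)], 0.5)
```

With frequency 1 for every row this reduces to the plain exact percentile, matching `sf.percentile(col, p, sf.lit(1))`.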