approx_percentile

Returns the approximate percentile of the numeric column col. This is the smallest value in the ordered col values (sorted from least to greatest) such that at least percentage of the col values are less than or equal to it.
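
For reference, the exact value that approx_percentile estimates can be sketched in plain Python. This is an illustration under an assumption, not Spark's implementation. It takes the target to be the element at 1-based rank ceil(percentage * n) in the sorted data, which is the smallest value with at least percentage of the values at or below it. exact_percentile is a hypothetical helper. Like approx_percentile, it returns one of the input values rather than interpolating between two of them. With a large accuracy on small inputs, approx_percentile would typically be expected to return this same element.

```python
import math


def exact_percentile(values, percentage):
    """Hypothetical helper (not part of PySpark): exact analogue of approx_percentile.

    Returns the smallest value v in sorted(values) such that at least
    `percentage` of the values are less than or equal to v.
    """
    if not values:
        return None
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    # 1-based rank ceil(percentage * n), clamped to 1 so that 0.0 returns the minimum.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


data = [0, 1, 2, 10]
print([exact_percentile(data, p) for p in (0.5, 0.4, 0.1)])  # [1, 1, 0]
```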

Syntax

from pyspark.sql import functions as sf

sf.approx_percentile(col, percentage, accuracy=10000)

Parameters

col (pyspark.sql.Column or str): Input column.
percentage (pyspark.sql.Column, float, list of floats, or tuple of floats): Percentage as a decimal, between 0.0 and 1.0. When percentage is an array, each value must be between 0.0 and 1.0.
accuracy (pyspark.sql.Column or int): A positive numeric literal that controls approximation accuracy at the cost of memory. A higher value yields better accuracy; 1.0/accuracy is the relative error of the approximation. Default: 10000.
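
The relative error 1.0/accuracy can be turned into a rough row count. Assuming the error is measured in rank (position in the sorted data), as in the Greenwald-Khanna style quantile summaries Spark uses, the returned value may sit up to about row_count / accuracy positions away from the exact percentile. The arithmetic below is a sketch, and max_rank_error is a hypothetical helper.

```python
def max_rank_error(row_count, accuracy=10000):
    """Hypothetical helper: worst-case rank error implied by `accuracy`.

    The relative error is 1.0 / accuracy, so the result may be up to
    row_count / accuracy positions away from the exact one in sorted order.
    """
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return row_count / accuracy


# Default accuracy on one million rows: within about 100 positions.
print(max_rank_error(1_000_000))          # 100.0
# 1000 rows with accuracy 1000000 (as in the examples): well under one position.
print(max_rank_error(1_000, 1_000_000))   # 0.001
```

Raising accuracy shrinks this bound proportionally, which is why the examples can use a very large value on a small dataset.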

Returns

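pyspark.sql.Column: the approximate percentile of the numeric column. When percentage is an array, the column holds an array of approximate percentiles, one per requested percentage.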

Examples

Example 1: Calculate approximate percentiles

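from pyspark.sql import SparkSession
from pyspark.sql import functions as sf
# Reuse the active SparkSession if one exists, otherwise create one.
spark = SparkSession.builder.getOrCreate()
# key cycles through 0, 1, 2; value is seeded standard-normal noise shifted by key * 10.
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
# 1000 rows in a single partition, so the seeded values do not depend on partitioning.
df = spark.range(0, 1000, 1, 1).select(key, value)
# Request three percentiles at once with accuracy 1000000.
df.select(
    sf.approx_percentile("value", [0.25, 0.5, 0.75], 1000000)
).show(truncate=False)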
+----------------------------------------------------------+
|approx_percentile(value, array(0.25, 0.5, 0.75), 1000000) |
+----------------------------------------------------------+
|[0.7264430125286..., 9.98975299938..., 19.335304783039...]|
+----------------------------------------------------------+
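
The shape of this output can be sanity-checked without Spark. The 1000 ids split across keys 0, 1, and 2 as 334, 333, and 333 rows, with values clustered near 0, 10, and 20. So the 25th percentile lands in the upper part of the key-0 cluster, the median near the middle of the key-1 cluster, and the 75th percentile in the lower part of the key-2 cluster. The sketch below simulates similar data with Python's random module, which stands in for Spark's randn(42); its numbers will not match the output above digit for digit, only in rough position. exact_percentile is a hypothetical helper using the ceil(percentage * n) rank rule.

```python
import math
import random


def exact_percentile(values, percentage):
    # Hypothetical helper: value at 1-based rank ceil(percentage * n) in sorted order.
    ordered = sorted(values)
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


# Mimic the example's data: key = id % 3, value = standard-normal noise + key * 10.
# Python's generator is an assumed stand-in for Spark's randn(42).
rng = random.Random(42)
values = [rng.gauss(0.0, 1.0) + (i % 3) * 10 for i in range(1000)]

quartiles = [exact_percentile(values, p) for p in (0.25, 0.5, 0.75)]
# Expect one value a little above 0, one near 10, and one a little below 20.
print([round(q, 3) for q in quartiles])
```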

Example 2: Calculate approximate percentile by group

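from pyspark.sql import SparkSession
from pyspark.sql import functions as sf
# Reuse the active SparkSession if one exists, otherwise create one.
spark = SparkSession.builder.getOrCreate()
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
# Median (percentage 0.5) of value within each key; percentage and accuracy are passed as Column literals.
df.groupBy("key").agg(
    sf.approx_percentile("value", sf.lit(0.5), sf.lit(1000000))
).sort("key").show()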
+---+--------------------------------------+
|key|approx_percentile(value, 0.5, 1000000)|
+---+--------------------------------------+
|  0|                  -0.03519435193070...|
|  1|                     9.990389751837...|
|  2|                    19.967859769284...|
+---+--------------------------------------+
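
What the grouped aggregation computes can also be sketched in plain Python: bucket the values by key, then take the median (percentage 0.5) of each bucket. Python's random module stands in for Spark's randn(42) here, and exact_percentile is a hypothetical helper using the ceil(percentage * n) rank rule, so only the approximate centers 0, 10, and 20 are expected to line up with the output above.

```python
import math
import random
from collections import defaultdict


def exact_percentile(values, percentage):
    # Hypothetical helper: value at 1-based rank ceil(percentage * n) in sorted order.
    ordered = sorted(values)
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


# Same data shape as the example: key = id % 3, value = normal noise + key * 10.
rng = random.Random(42)
groups = defaultdict(list)
for i in range(1000):
    key = i % 3
    groups[key].append(rng.gauss(0.0, 1.0) + key * 10)

# Exact counterpart of groupBy("key").agg(approx_percentile(value, 0.5, ...)).sort("key").
medians = {key: exact_percentile(vals, 0.5) for key, vals in sorted(groups.items())}
for key, median in medians.items():
    print(key, round(median, 3))
```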