Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows whose value is at or below the current row's value.
Syntax
from pyspark.sql import functions as sf
sf.cume_dist()
Parameters
This function does not take any parameters.
Returns
pyspark.sql.Column: the column for calculating cumulative distribution.
Examples
from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame([1, 2, 3, 3, 4], "int")
w = Window.orderBy("value")
df.withColumn("cd", sf.cume_dist().over(w)).show()
+-----+---+
|value| cd|
+-----+---+
| 1|0.2|
| 2|0.4|
| 3|0.8|
| 3|0.8|
| 4|1.0|
+-----+---+
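As a cross-check of the table above, cume_dist for a row equals the number of rows with a value less than or equal to that row's value, divided by the total number of rows in the partition; tied values therefore share the same result. A minimal plain-Python sketch of that formula (no Spark required; the helper name cume_dist_py is hypothetical, for illustration only):

```python
def cume_dist_py(values):
    """Fraction of rows with value <= each row's value (ties share a result)."""
    n = len(values)
    ordered = sorted(values)
    # For each row, count rows at or below it and divide by the partition size.
    return [sum(1 for v in ordered if v <= x) / n for x in ordered]

print(cume_dist_py([1, 2, 3, 3, 4]))  # → [0.2, 0.4, 0.8, 0.8, 1.0]
```

This reproduces the cd column shown above: the two tied rows with value 3 both get 4/5 = 0.8, since four of the five rows are at or below them.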