Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function. Supports Spark Connect.
For the corresponding Databricks SQL function, see aggregate function.
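The behavior mirrors a left fold over the array. A minimal pure-Python sketch of the semantics (plain Python functions and tuples stand in for Spark `Column` expressions here; this is an illustration, not the actual Spark API):

```python
from functools import reduce

def aggregate_sketch(values, initial_value, merge, finish=lambda acc: acc):
    # Fold `merge` over the array starting from `initial_value`,
    # then apply the optional `finish` function to the final state.
    return finish(reduce(merge, values, initial_value))

# Mirrors Example 2 below: accumulate a (count, sum) state, finish with the mean.
mean = aggregate_sketch(
    [20.0, 4.0, 2.0, 6.0, 10.0],
    (0, 0.0),
    lambda acc, x: (acc[0] + 1, acc[1] + x),
    lambda acc: acc[1] / acc[0],
)
# mean == 8.4
```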
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.aggregate(col=<col>, initialValue=<initialValue>, merge=<merge>, finish=<finish>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or `str` | Name of column or expression. |
| `initialValue` | `pyspark.sql.Column` or `str` | Initial value. Name of column or expression. |
| `merge` | function | A binary function that returns an expression of the same type as `initialValue`. |
| `finish` | function, optional | An optional unary function used to convert the accumulated value into the final result. |
Returns
A `pyspark.sql.Column` containing the final value after the aggregate function is applied.
Examples
Example 1: Simple aggregation with sum
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1, [20.0, 4.0, 2.0, 6.0, 10.0])], ("id", "values"))
df.select(dbf.aggregate("values", dbf.lit(0.0), lambda acc, x: acc + x).alias("sum")).show()
+----+
| sum|
+----+
|42.0|
+----+
Example 2: Aggregation with finish function
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1, [20.0, 4.0, 2.0, 6.0, 10.0])], ("id", "values"))
def merge(acc, x):
    count = acc.count + 1
    sum = acc.sum + x
    return dbf.struct(count.alias("count"), sum.alias("sum"))

df.select(
    dbf.aggregate(
        "values",
        dbf.struct(dbf.lit(0).alias("count"), dbf.lit(0.0).alias("sum")),
        merge,
        lambda acc: acc.sum / acc.count,
    ).alias("mean")
).show()
+----+
|mean|
+----+
| 8.4|
+----+