A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.
Supports Spark Connect
Syntax
DataFrame.groupBy(*cols)
Methods
| Method | Description |
|---|---|
| agg(*exprs) | Computes aggregates and returns the result as a DataFrame. Accepts a dictionary mapping column names to aggregate function names, or a list of aggregate Column expressions. |
| avg(*cols) | Computes average values for each numeric column for each group. mean is an alias. |
| count() | Counts the number of records for each group. |
| max(*cols) | Computes the max value for each numeric column for each group. |
| mean(*cols) | Computes average values for each numeric column for each group. avg is an alias. |
| min(*cols) | Computes the min value for each numeric column for each group. |
| pivot(pivot_col, values) | Pivots a column of the current DataFrame and performs the specified aggregation. |
| sum(*cols) | Computes the sum for each numeric column for each group. |
Examples
df = spark.createDataFrame(
    [(2, "Alice"), (3, "Alice"), (5, "Bob"), (10, "Bob")], ["age", "name"])
df.groupBy("name").count().sort("name").show()
+-----+-----+
| name|count|
+-----+-----+
|Alice|    2|
|  Bob|    2|
+-----+-----+
from pyspark.sql import functions as sf
df.groupBy("name").agg(sf.min("age")).sort("name").show()
+-----+--------+
| name|min(age)|
+-----+--------+
|Alice|       2|
|  Bob|       5|
+-----+--------+
df.groupBy("name").avg("age").sort("name").show()
+-----+--------+
| name|avg(age)|
+-----+--------+
|Alice|     2.5|
|  Bob|     7.5|
+-----+--------+
from pyspark.sql import Row
df1 = spark.createDataFrame([
    Row(course="dotNET", year=2012, earnings=10000),
    Row(course="Java", year=2012, earnings=20000),
    Row(course="dotNET", year=2013, earnings=48000),
    Row(course="Java", year=2013, earnings=30000),
])
df1.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings").sort("year").show()
+----+------+-----+
|year|dotNET| Java|
+----+------+-----+
|2012| 10000|20000|
|2013| 48000|30000|
+----+------+-----+