Groups the DataFrame by the specified columns so that aggregation can be performed on them. See GroupedData for all the available aggregate functions.
Syntax
groupBy(*cols: "ColumnOrNameOrOrdinal")
Parameters
| Parameter | Type | Description |
|---|---|---|
| `cols` | list, str, int or Column | The columns to group by. Each element can be a column name (string), an expression (Column), a column ordinal (int, 1-based), or a list of them. |
Returns
GroupedData: A GroupedData object representing the data grouped by the specified columns.
Notes
A column ordinal starts from 1, unlike Python's 0-based `__getitem__` indexing.
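To make the off-by-one explicit, here is a tiny sketch of how a 1-based ordinal maps onto a 0-based column list. The helper `resolve_ordinal` is hypothetical, for illustration only, and is not part of the PySpark API:

```python
def resolve_ordinal(columns, ordinal):
    """Map a 1-based column ordinal (as accepted by groupBy) to a column name.

    Hypothetical helper for illustration; not a Spark API.
    """
    if not 1 <= ordinal <= len(columns):
        raise IndexError(
            f"ordinal {ordinal} out of range for {len(columns)} columns")
    return columns[ordinal - 1]  # 1-based ordinal -> 0-based list index

print(resolve_ordinal(["name", "age"], 2))  # age
```

So `df.groupBy(2)` on a DataFrame with columns `["name", "age"]` groups by `age`, while `df["age"]` uses the 0-based index 1.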
Examples
# Assumes an active SparkSession available as `spark`.
df = spark.createDataFrame(
    [("Alice", 2), ("Bob", 2), ("Bob", 2), ("Bob", 5)], schema=["name", "age"])
df.groupBy().avg().show()
# +--------+
# |avg(age)|
# +--------+
# | 2.75|
# +--------+
df.groupBy("name").agg({"age": "sum"}).sort("name").show()
# +-----+--------+
# | name|sum(age)|
# +-----+--------+
# |Alice| 2|
# | Bob| 9|
# +-----+--------+
df.groupBy(df.name).max().sort("name").show()
# +-----+--------+
# | name|max(age)|
# +-----+--------+
# |Alice| 2|
# | Bob| 5|
# +-----+--------+
df.groupBy(["name", df.age]).count().sort("name", "age").show()
# +-----+---+-----+
# | name|age|count|
# +-----+---+-----+
# |Alice| 2| 1|
# | Bob| 2| 2|
# | Bob| 5| 1|
# +-----+---+-----+
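For intuition, what `df.groupBy("name").agg({"age": "sum"})` computes can be sketched in pure Python: partition the rows by the key column, then fold an aggregate over each partition. This is illustrative only; Spark performs the same logical operation lazily and distributed across partitions:

```python
from collections import defaultdict

def group_by_sum(rows, key_idx, val_idx):
    """Group tuples by the column at key_idx and sum the column at val_idx.

    Pure-Python sketch of groupBy(key).agg({val: "sum"}); not how Spark
    executes it, only what it computes.
    """
    sums = defaultdict(int)
    for row in rows:
        sums[row[key_idx]] += row[val_idx]
    return dict(sums)

rows = [("Alice", 2), ("Bob", 2), ("Bob", 2), ("Bob", 5)]
print(group_by_sum(rows, key_idx=0, val_idx=1))  # {'Alice': 2, 'Bob': 9}
```

The result matches the `sum(age)` example above: Alice's ages sum to 2 and Bob's to 9.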