grouping_id

Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn), where n is the number of grouping columns. Example 2 below shows how the individual grouping bits combine into this value.

Syntax

from pyspark.sql import functions as sf

sf.grouping_id(*cols)

Parameters

cols : pyspark.sql.Column or str
    Columns to check for. The list of columns should match the grouping columns exactly, or be empty (meaning all the grouping columns).

Returns

pyspark.sql.Column: the level of grouping that the row relates to.

Examples

Example 1: Get the grouping ID in a cube operation

from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [(1, "a", "a"), (3, "a", "a"), (4, "b", "c")], ["c1", "c2", "c3"])
df.cube("c2", "c3").agg(sf.grouping_id(), sf.sum("c1")).orderBy("c2", "c3").show()
+----+----+-------------+-------+
|  c2|  c3|grouping_id()|sum(c1)|
+----+----+-------------+-------+
|NULL|NULL|            3|      8|
|NULL|   a|            2|      4|
|NULL|   c|            2|      4|
|   a|NULL|            1|      4|
|   a|   a|            0|      4|
|   b|NULL|            1|      4|
|   b|   c|            0|      4|
+----+----+-------------+-------+
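
Example 2: Relate grouping_id to individual grouping bits

A minimal sketch of the formula above: each grouping() call yields one bit, and with two grouping columns grouping_id() combines them as grouping("c2") * 2 + grouping("c3"). It reuses the DataFrame from Example 1; the output shown follows from that definition.

from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [(1, "a", "a"), (3, "a", "a"), (4, "b", "c")], ["c1", "c2", "c3"])
# grouping("c2") supplies the high bit and grouping("c3") the low bit.
df.cube("c2", "c3").agg(
    sf.grouping("c2"), sf.grouping("c3"), sf.grouping_id()
).orderBy("c2", "c3").show()
+----+----+------------+------------+-------------+
|  c2|  c3|grouping(c2)|grouping(c3)|grouping_id()|
+----+----+------------+------------+-------------+
|NULL|NULL|           1|           1|            3|
|NULL|   a|           1|           0|            2|
|NULL|   c|           1|           0|            2|
|   a|NULL|           0|           1|            1|
|   a|   a|           0|           0|            0|
|   b|NULL|           0|           1|            1|
|   b|   c|           0|           0|            0|
+----+----+------------+------------+-------------+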
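
Example 3: Pass the grouping columns explicitly

Per the parameter description, grouping_id may also be given the grouping columns themselves, provided the list matches them exactly. This sketch assumes the same DataFrame as Example 1 and yields the same IDs as calling grouping_id() with no arguments.

from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [(1, "a", "a"), (3, "a", "a"), (4, "b", "c")], ["c1", "c2", "c3"])
# The column list must match the cube columns exactly,
# so this is equivalent to grouping_id() with no arguments.
df.cube("c2", "c3").agg(
    sf.grouping_id("c2", "c3"), sf.sum("c1")
).orderBy("c2", "c3").show()
+----+----+-------------------+-------+
|  c2|  c3|grouping_id(c2, c3)|sum(c1)|
+----+----+-------------------+-------+
|NULL|NULL|                  3|      8|
|NULL|   a|                  2|      4|
|NULL|   c|                  2|      4|
|   a|NULL|                  1|      4|
|   a|   a|                  0|      4|
|   b|NULL|                  1|      4|
|   b|   c|                  0|      4|
+----+----+-------------------+-------+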