countDistinct

Returns a new Column for the distinct count of col or cols. Supports Spark Connect.

countDistinct is an alias of count_distinct; using count_distinct directly is encouraged.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.countDistinct(<col>, *cols)

Parameters

Parameter  Type                               Description
col        pyspark.sql.Column or column name  First column to compute on.
cols       pyspark.sql.Column or column name  Other columns to compute on.

Examples

# Using count_distinct (preferred):
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
df.select(dbf.count_distinct(df.value)).show()
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+

# The countDistinct alias returns the same result:
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
df.select(dbf.countDistinct(df.value)).show()
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+
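
As a minimal sketch of the *cols parameter (the two-column DataFrame and its column names below are illustrative, not from this page; the import follows the syntax block above), passing additional columns counts distinct combinations of values across all of them:

from pyspark.databricks.sql import functions as dbf

# Illustrative DataFrame with one repeated (value, name) pair.
df = spark.createDataFrame([(1, "a"), (1, "a"), (3, "b")], ["value", "name"])

# count_distinct(df.value, df.name) counts distinct (value, name) pairs;
# for this data the count is 2, since (1, "a") appears twice.
df.select(dbf.count_distinct(df.value, df.name)).show()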