Returns a new Column for the distinct count of col or cols. Supports Spark Connect.
This function is an alias of count_distinct; calling count_distinct directly is encouraged.
Syntax
from pyspark.databricks.sql import functions as dbf
dbf.countDistinct(col=<col>, *cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or column name | First column to compute on. |
| cols | pyspark.sql.Column or column name | Other columns to compute on. |
Examples
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
df.select(dbf.count_distinct(df.value)).show()
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
df.select(dbf.countDistinct(df.value)).show()
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+