Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Returns the number of TRUE values for the column.
Syntax
from pyspark.sql import functions as sf
sf.count_if(col)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or column name |
Target column to work on. |
Returns
pyspark.sql.Column: the number of TRUE values for the col.
Examples
Example 1: Counting the number of even numbers in a numeric column
from pyspark.sql import functions as sf
df = spark.createDataFrame([("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
df.select(sf.count_if(sf.col('c2') % 2 == 0)).show()
+------------------------+
|count_if(((c2 % 2) = 0))|
+------------------------+
| 3|
+------------------------+
Example 2: Counting the number of rows where a string column starts with a certain letter
from pyspark.sql import functions as sf
df = spark.createDataFrame(
[("apple",), ("banana",), ("cherry",), ("apple",), ("banana",)], ["fruit"])
df.select(sf.count_if(sf.col('fruit').startswith('a'))).show()
+------------------------------+
|count_if(startswith(fruit, a))|
+------------------------------+
| 2|
+------------------------------+
Example 3: Counting the number of rows where a numeric column is greater than a certain value
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,), (2,), (3,), (4,), (5,)], ["num"])
df.select(sf.count_if(sf.col('num') > 3)).show()
+-------------------+
|count_if((num > 3))|
+-------------------+
| 2|
+-------------------+
Example 4: Counting the number of rows where a boolean column is True
from pyspark.sql import functions as sf
df = spark.createDataFrame([(True,), (False,), (True,), (False,), (True,)], ["b"])
df.select(sf.count('b'), sf.count_if('b')).show()
+--------+-----------+
|count(b)|count_if(b)|
+--------+-----------+
| 5| 3|
+--------+-----------+