Returns a boolean indicating whether the array contains the given value: null if the array is null, true if the array contains the value, and false otherwise.
Syntax
from pyspark.sql import functions as sf
sf.array_contains(col, value)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or `str` | The target column containing the arrays. |
| `value` | Any | The value or column to check for in the array. |
Returns
pyspark.sql.Column: A new Column of Boolean type, where each value indicates whether the corresponding array from the input column contains the specified value.
Examples
Example 1: Basic usage of array_contains function.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(sf.array_contains(df.data, "a")).show()
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   true|
|                  false|
+-----------------------+
Example 2: Usage of array_contains function with a column.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b", "c"], "c"),
                            (["c", "d", "e"], "d"),
                            (["e", "a", "c"], "b")], ["data", "item"])
df.select(sf.array_contains(df.data, sf.col("item"))).show()
+--------------------------+
|array_contains(data, item)|
+--------------------------+
|                      true|
|                      true|
|                     false|
+--------------------------+
Example 3: Usage of array_contains function with a null array.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(None,), (["a", "b", "c"],)], ['data'])
df.select(sf.array_contains(df.data, "a")).show()
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   NULL|
|                   true|
+-----------------------+
Example 4: Usage of array_contains with an array column containing null values.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
df.select(sf.array_contains(df.data, "a")).show()
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   true|
+-----------------------+
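The null handling shown in Examples 3 and 4 can be easier to follow as a per-row rule. The sketch below is a plain-Python model of that rule, not PySpark code. `array_contains_model` is a hypothetical helper name introduced only for illustration. Two of its branches are not stated on this page and are assumptions based on the behavior of open-source Spark: a null search value yields null, and an array that holds a null element but no match yields null rather than false. Verify both on your own runtime before relying on them.

```python
from typing import Any, Optional, Sequence


def array_contains_model(arr: Optional[Sequence[Any]], value: Any) -> Optional[bool]:
    """Plain-Python model of array_contains for one row (hypothetical helper, not a PySpark API)."""
    # A null array gives NULL (documented above).
    # Assumption: a null search value also gives NULL.
    if arr is None or value is None:
        return None

    saw_null = False
    for element in arr:
        if element is None:
            saw_null = True  # remember the null element and keep scanning
        elif element == value:
            return True  # a match wins even if the array also holds nulls

    # Assumption: with no match, a null element makes the answer unknown (NULL).
    return None if saw_null else False


print(array_contains_model(["a", "b", "c"], "a"))   # True  (Example 1, first row)
print(array_contains_model([], "a"))                # False (Example 1, second row)
print(array_contains_model(None, "a"))              # None  (Example 3, first row)
print(array_contains_model(["a", None, "c"], "a"))  # True  (Example 4)
print(array_contains_model(["a", None, "c"], "z"))  # None  (assumed: no match, null element present)
```

If the last case holds on your runtime, a filter on `array_contains` drops such rows, because a filter keeps only rows where the condition is true.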