array_contains

Returns a boolean indicating whether an array contains a given value: null if the array is null, true if the array contains the value, and false otherwise. When the value is not found but the array contains null elements, the result is null rather than false.

Syntax

from pyspark.sql import functions as sf

sf.array_contains(col, value)

Parameters

Parameter  Type                       Description
col        pyspark.sql.Column or str  The target column containing the arrays.
value      Any                        The value or column to check for in the array.

Returns

pyspark.sql.Column: A new Column of Boolean type, where each value indicates whether the corresponding array from the input column contains the specified value.

Examples

Example 1: Basic usage of the array_contains function.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
df.select(sf.array_contains(df.data, "a")).show()
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   true|
|                  false|
+-----------------------+

Example 2: Using array_contains with a column as the value to look for.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b", "c"], "c"),
                           (["c", "d", "e"], "d"),
                           (["e", "a", "c"], "b")], ["data", "item"])
df.select(sf.array_contains(df.data, sf.col("item"))).show()
+--------------------------+
|array_contains(data, item)|
+--------------------------+
|                      true|
|                      true|
|                     false|
+--------------------------+

Example 3: Using array_contains on a null array.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(None,), (["a", "b", "c"],)], ['data'])
df.select(sf.array_contains(df.data, "a")).show()
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   NULL|
|                   true|
+-----------------------+

Example 4: Using array_contains on an array column that contains null values.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
df.select(sf.array_contains(df.data, "a")).show()
+-----------------------+
|array_contains(data, a)|
+-----------------------+
|                   true|
+-----------------------+