arrays_overlap

Returns a Boolean column indicating whether the input arrays have any common non-null elements. The result is true if they do; null if they share no non-null element, both arrays are non-empty, and at least one of them contains a null element; and false otherwise. If either input array is itself null, the result is null.
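The null-handling rules above are easiest to see in a small model. The sketch below is a hypothetical plain-Python helper, `arrays_overlap_model`, which is not part of PySpark. It assumes `None` represents SQL NULL (for a whole array or for a single element) and that elements are hashable, and it restates the documented rules rather than reproducing Spark's actual implementation.

```python
from typing import Any, Optional, Sequence


def arrays_overlap_model(
    a1: Optional[Sequence[Any]], a2: Optional[Sequence[Any]]
) -> Optional[bool]:
    """Hypothetical model of arrays_overlap; None stands for SQL NULL."""
    # A null input array produces a null result.
    if a1 is None or a2 is None:
        return None
    # Collect the non-null elements of the first array (assumes hashable elements).
    left = {e for e in a1 if e is not None}
    # Any shared non-null element makes the result true.
    if any(e is not None and e in left for e in a2):
        return True
    # No common non-null element: the result is null only when both arrays
    # are non-empty and at least one of them contains a null element.
    if a1 and a2 and (None in a1 or None in a2):
        return None
    return False
```

For example, `arrays_overlap_model([], [None])` returns `False` because one array is empty, while `arrays_overlap_model([None], [None])` returns `None` because both arrays are non-empty and contain only null elements.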

Syntax

from pyspark.sql import functions as sf

sf.arrays_overlap(a1, a2)

Parameters

| Parameter | Type | Description |
|---|---|---|
| a1 | pyspark.sql.Column or str | The column containing the first array, given as a Column or as a column name. |
| a2 | pyspark.sql.Column or str | The column containing the second array, given as a Column or as a column name. |

Returns

pyspark.sql.Column: A new Column of Boolean type in which each value indicates whether the corresponding arrays from the input columns share a non-null element. A value can also be null, under the rules described at the top of this page.

Examples

The examples assume an active SparkSession bound to the name spark, as in the PySpark shell.

Example 1: Basic usage of the arrays_overlap function.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b"], ["b", "c"]), (["a"], ["b", "c"])], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|               false|
+--------------------+

Example 2: Usage of the arrays_overlap function with arrays containing null elements.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None], ["b", None]), (["a"], ["b", "c"])], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|               false|
+--------------------+

Example 3: Usage of the arrays_overlap function with arrays that are null.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(None, ["b", "c"]), (["a"], None)], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|                NULL|
+--------------------+

Example 4: Usage of the arrays_overlap function on arrays with identical elements.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b"], ["a", "b"]), (["a"], ["a"])], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|                true|
+--------------------+