Removes all elements equal to `element` from the given array.
Syntax
```python
from pyspark.sql import functions as sf

sf.array_remove(col, element)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or str | Column containing the array, or the name of that column |
| `element` | Any | Element, or a Column expression, to be removed from the array |
Returns
pyspark.sql.Column: A new column that is an array excluding the given value from the input column.
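To reason about results without a running cluster, the following is a plain-Python model of the behavior described above. It is a sketch, not PySpark code: `array_remove_model` is a hypothetical helper, Python lists stand in for array columns, and `None` stands in for SQL NULL. Two behaviors are assumptions drawn from Spark SQL semantics rather than from this page: `null` entries inside the array are kept (the Spark SQL function reference shows `array_remove(array(1, 2, 3, null, 3), 3)` returning `[1,2,null]`), and a null array or null `element` produces a null result.

```python
from typing import Any, List, Optional


def array_remove_model(arr: Optional[List[Any]], element: Any) -> Optional[List[Any]]:
    """Hypothetical plain-Python model of array_remove; not part of PySpark."""
    # Assumption: a null array or a null element yields a null result.
    if arr is None or element is None:
        return None
    # Keep every entry that is not equal to `element`. None entries never
    # compare equal, so they are kept (assumed to mirror Spark's handling).
    return [v for v in arr if v is None or v != element]


# These calls mirror the inputs of Examples 1 to 5 on this page.
print(array_remove_model([1, 2, 3, 1, 1], 1))   # [2, 3]
print(array_remove_model([4, 5, 5, 4], 5))      # [4, 4]
print(array_remove_model([1, 2, 3], 4))         # [1, 2, 3]
print(array_remove_model([1, 1, 1], 1))         # []
print(array_remove_model([], 1))                # []

# Example 6 analogue: the element comes from another field of the same row.
rows = [([1, 2, 3, 1, 1], 1)]
print([array_remove_model(data, col) for data, col in rows])  # [[2, 3]]
```

Note that every occurrence of the value is removed, not just the first one, which is why `[1, 2, 3, 1, 1]` becomes `[2, 3]`.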
Examples
Example 1: Removing a specific value from a simple array
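```python
from pyspark.sql import functions as sf

# Assumes an active SparkSession named `spark`, as in all examples on this page.
# Every occurrence of 1 is removed, not only the first.
df = spark.createDataFrame([([1, 2, 3, 1, 1],)], ['data'])
df.select(sf.array_remove(df.data, 1)).show()
```

```
+---------------------+
|array_remove(data, 1)|
+---------------------+
|               [2, 3]|
+---------------------+
```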
Example 2: Removing a specific value from multiple arrays
```python
from pyspark.sql import functions as sf

df = spark.createDataFrame([([1, 2, 3, 1, 1],), ([4, 5, 5, 4],)], ['data'])
df.select(sf.array_remove(df.data, 5)).show()
```

```
+---------------------+
|array_remove(data, 5)|
+---------------------+
|      [1, 2, 3, 1, 1]|
|               [4, 4]|
+---------------------+
```
Example 3: Removing a value that does not exist in the array
```python
from pyspark.sql import functions as sf

df = spark.createDataFrame([([1, 2, 3],)], ['data'])
df.select(sf.array_remove(df.data, 4)).show()
```

```
+---------------------+
|array_remove(data, 4)|
+---------------------+
|            [1, 2, 3]|
+---------------------+
```
Example 4: Removing a value from an array with all identical values
```python
from pyspark.sql import functions as sf

df = spark.createDataFrame([([1, 1, 1],)], ['data'])
df.select(sf.array_remove(df.data, 1)).show()
```

```
+---------------------+
|array_remove(data, 1)|
+---------------------+
|                   []|
+---------------------+
```
Example 5: Removing a value from an empty array
```python
from pyspark.sql import functions as sf
from pyspark.sql.types import ArrayType, IntegerType, StructType, StructField

# An explicit schema is needed because Spark cannot infer
# the element type of an empty list.
schema = StructType([
    StructField("data", ArrayType(IntegerType()), True)
])
df = spark.createDataFrame([([],)], schema)
df.select(sf.array_remove(df.data, 1)).show()
```

```
+---------------------+
|array_remove(data, 1)|
+---------------------+
|                   []|
+---------------------+
```
Example 6: Removing a value taken from another column
```python
from pyspark.sql import functions as sf

df = spark.createDataFrame([([1, 2, 3, 1, 1], 1)], ['data', 'col'])
df.select(sf.array_remove(df.data, df.col)).show()
```

```
+-----------------------+
|array_remove(data, col)|
+-----------------------+
|                 [2, 3]|
+-----------------------+
```