array_join

Returns a string column by concatenating the elements of the input array column using the delimiter. Null values within the array can be replaced with a specified string through the null_replacement argument. If null_replacement is not set, null values are ignored.

Syntax

from pyspark.sql import functions as sf

sf.array_join(col, delimiter, null_replacement=None)

Parameters

col (pyspark.sql.Column or str)
    The input column containing the arrays to be joined. The column can be passed either as a Column object or by name, as shown in the sketch below.
delimiter (str)
    The string used as the delimiter when joining the array elements.
null_replacement (str, optional)
    The string that replaces null values within the array. If not set, null values are ignored.
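
The two calls in the following sketch are therefore equivalent (a minimal sketch assuming, as in the examples below, an active SparkSession named spark):

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["x", "y", "z"],)], ["data"])

# Both calls are equivalent: the column is passed as a Column object or by name.
df.select(sf.array_join(df.data, "-")).show()
df.select(sf.array_join("data", "-")).show()
# Each should show a single column named array_join(data, -) with the value x-y-z.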

Returns

pyspark.sql.Column: A new column of string type, where each value is the result of joining the corresponding array from the input column.

Examples

Example 1: Basic usage of the array_join function.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b", "c"],), (["a", "b"],)], ['data'])
df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|              a,b,c|
|                a,b|
+-------------------+

Example 2: Usage of the array_join function with the null_replacement argument.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                 a,NULL,c|
+-------------------------+

Example 3: Usage of the array_join function without the null_replacement argument.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|                a,c|
+-------------------+

Example 4: Usage of the array_join function with an array that is null.

from pyspark.sql import functions as sf
from pyspark.sql.types import StructType, StructField, ArrayType, StringType
schema = StructType([StructField("data", ArrayType(StringType()), True)])
df = spark.createDataFrame([(None,)], schema)
df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|               NULL|
+-------------------+

Example 5: Usage of the array_join function with an array containing only null values.

from pyspark.sql import functions as sf
from pyspark.sql.types import StructType, StructField, ArrayType, StringType
schema = StructType([StructField("data", ArrayType(StringType()), True)])
df = spark.createDataFrame([([None, None],)], schema)
df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                NULL,NULL|
+-------------------------+
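
The same function is also available as the Spark SQL expression array_join(array, delimiter[, nullReplacement]). Below is a minimal sketch of the equivalent calls, again assuming an active SparkSession named spark:

from pyspark.sql import functions as sf

# Plain Spark SQL form of Example 2.
spark.sql("SELECT array_join(array('a', NULL, 'c'), ',', 'NULL') AS joined").show()

# The same expression applied to a DataFrame column via expr.
df = spark.createDataFrame([(["a", None, "c"],)], ["data"])
df.select(sf.expr("array_join(data, ',', 'NULL')")).show()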