flatten

Creates a single array from an array of arrays. If the nested structure is more than two levels deep, only one level of nesting is removed per call. If the outer array contains a null inner array, the result is null.

Syntax

from pyspark.sql import functions as sf

sf.flatten(col)

Parameters

Parameter   Type                        Description
col         pyspark.sql.Column or str   The column (or column name) holding the array of arrays to flatten.

Returns

pyspark.sql.Column: A new column that contains the flattened array.

Examples

Example 1: Flattening a simple nested array

from pyspark.sql import functions as sf
df = spark.createDataFrame([([[1, 2, 3], [4, 5], [6]],)], ['data'])
df.select(sf.flatten(df.data)).show()
+------------------+
|     flatten(data)|
+------------------+
|[1, 2, 3, 4, 5, 6]|
+------------------+

Example 2: Flattening an array with null values

from pyspark.sql import functions as sf
df = spark.createDataFrame([([None, [4, 5]],)], ['data'])
df.select(sf.flatten(df.data)).show()
+-------------+
|flatten(data)|
+-------------+
|         NULL|
+-------------+

Example 3: Flattening an array with more than two levels of nesting

from pyspark.sql import functions as sf
df = spark.createDataFrame([([[[1, 2], [3, 4]], [[5, 6], [7, 8]]],)], ['data'])
df.select(sf.flatten(df.data)).show(truncate=False)
+--------------------------------+
|flatten(data)                   |
+--------------------------------+
|[[1, 2], [3, 4], [5, 6], [7, 8]]|
+--------------------------------+

Example 4: Flattening an array with mixed types

from pyspark.sql import functions as sf
df = spark.createDataFrame([([['a', 'b', 'c'], [1, 2, 3]],)], ['data'])
df.select(sf.flatten(df.data)).show()
+------------------+
|     flatten(data)|
+------------------+
|[a, b, c, 1, 2, 3]|
+------------------+