explode

Returns a new row for each element in the given array or map. Unless specified otherwise, the output column is named col for array elements, and key and value for map entries.

Note

Only one explode is allowed per SELECT clause.

Syntax

from pyspark.sql import functions as sf

sf.explode(col)

Parameters

col : pyspark.sql.Column or column name
    Target column to work on.

Returns

pyspark.sql.Column: One new row per array element or per map key-value pair.

Examples

Example 1: Exploding an array column

from pyspark.sql import functions as sf
df = spark.sql('SELECT * FROM VALUES (1,ARRAY(1,2,3,NULL)), (2,ARRAY()), (3,NULL) AS t(i,a)')
df.show()
+---+---------------+
|  i|              a|
+---+---------------+
|  1|[1, 2, 3, NULL]|
|  2|             []|
|  3|           NULL|
+---+---------------+
df.select('*', sf.explode('a')).show()
+---+---------------+----+
|  i|              a| col|
+---+---------------+----+
|  1|[1, 2, 3, NULL]|   1|
|  1|[1, 2, 3, NULL]|   2|
|  1|[1, 2, 3, NULL]|   3|
|  1|[1, 2, 3, NULL]|NULL|
+---+---------------+----+

Example 2: Exploding a map column

from pyspark.sql import functions as sf
df = spark.sql('SELECT * FROM VALUES (1,MAP(1,2,3,4,5,NULL)), (2,MAP()), (3,NULL) AS t(i,m)')
df.show(truncate=False)
+---+---------------------------+
|i  |m                          |
+---+---------------------------+
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|
|2  |{}                         |
|3  |NULL                       |
+---+---------------------------+
df.select('*', sf.explode('m')).show(truncate=False)
+---+---------------------------+---+-----+
|i  |m                          |key|value|
+---+---------------------------+---+-----+
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|1  |2    |
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|3  |4    |
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|5  |NULL |
+---+---------------------------+---+-----+

Example 3: Exploding multiple array columns

import pyspark.sql.functions as sf
df = spark.sql('SELECT ARRAY(1,2) AS a1, ARRAY(3,4,5) AS a2')
df.select(
    '*', sf.explode('a1').alias('v1')
).select('*', sf.explode('a2').alias('v2')).show()
+------+---------+---+---+
|    a1|       a2| v1| v2|
+------+---------+---+---+
|[1, 2]|[3, 4, 5]|  1|  3|
|[1, 2]|[3, 4, 5]|  1|  4|
|[1, 2]|[3, 4, 5]|  1|  5|
|[1, 2]|[3, 4, 5]|  2|  3|
|[1, 2]|[3, 4, 5]|  2|  4|
|[1, 2]|[3, 4, 5]|  2|  5|
+------+---------+---+---+

Example 4: Exploding an array-of-structs column

import pyspark.sql.functions as sf
df = spark.sql('SELECT ARRAY(NAMED_STRUCT("a",1,"b",2), NAMED_STRUCT("a",3,"b",4)) AS a')
df.select(sf.explode('a').alias("s")).select("s.*").show()
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  3|  4|
+---+---+