Separates col1, ..., colk into n rows, filling each row left to right; when the number of input values is not a multiple of n, the remaining cells are NULL. Uses column names col0, col1, etc. by default unless specified otherwise.
Syntax
from pyspark.sql import functions as sf
sf.stack(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
| cols | pyspark.sql.Column or column name | The first element should be a literal int for the number of rows to be separated, and the remaining are input elements to be separated. |
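Because the first element must be a literal, it is passed as a Column via sf.lit rather than as a plain Python int. The same call can also be written as a SQL expression string with the standard expr function; the snippet below is a minimal sketch showing both forms, not part of the original examples:
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
# Column-based form: the row count is wrapped in sf.lit.
df.select(sf.stack(sf.lit(2), 'a', 'b', 'c')).show()
# Equivalent SQL-expression form of the same call.
df.select(sf.expr('stack(2, a, b, c)')).show()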
Examples
Example 1: Stack with 2 rows
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c')).show()
+---+---+---+----+----+
|  a|  b|  c|col0|col1|
+---+---+---+----+----+
|  1|  2|  3|   1|   2|
|  1|  2|  3|   3|NULL|
+---+---+---+----+----+
Example 2: Stack with alias
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c').alias('x', 'y')).show()
+---+---+---+---+----+
|  a|  b|  c|  x|   y|
+---+---+---+---+----+
|  1|  2|  3|  1|   2|
|  1|  2|  3|  3|NULL|
+---+---+---+---+----+
Example 3: Stack with 3 rows
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(3), df.a, df.b, 'c')).show()
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
+---+---+---+----+
Example 4: Stack with 4 rows
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(4), df.a, df.b, 'c')).show()
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
|  1|  2|  3|NULL|
+---+---+---+----+
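A common application of stack is unpivoting wide data into long form by pairing literal labels with value columns. The following sketch is not part of the original examples; the DataFrame and the column names (q1, q2, quarter, sales) are illustrative assumptions:
from pyspark.sql import functions as sf
df = spark.createDataFrame([('A', 10, 20)], ['id', 'q1', 'q2'])
# Each (label, value) pair becomes one of the two output rows.
df.select('id', sf.stack(sf.lit(2), sf.lit('q1'), 'q1', sf.lit('q2'), 'q2').alias('quarter', 'sales')).show()
+---+-------+-----+
| id|quarter|sales|
+---+-------+-----+
|  A|     q1|   10|
|  A|     q2|   20|
+---+-------+-----+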