Returns a merged array of structs in which the N-th struct contains the N-th values of all input arrays. If one of the arrays is shorter than the others, the missing elements in the resulting structs are null.
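The padding rule can be illustrated without a Spark session. The following is a minimal plain-Python sketch of the semantics described above, not Spark's implementation: the helper name `arrays_zip_sketch` is hypothetical, Python dicts stand in for structs, and `None` stands in for null.

```python
from itertools import zip_longest


def arrays_zip_sketch(names, *arrays):
    """Model arrays_zip on plain lists (hypothetical helper, not a PySpark API).

    names  -- one field name per input array (stand-ins for column names)
    arrays -- the lists to merge position by position
    """
    # zip_longest pads the shorter lists with None, which mirrors the
    # null fill for missing elements described above.
    return [
        dict(zip(names, values))
        for values in zip_longest(*arrays, fillvalue=None)
    ]


zipped = arrays_zip_sketch(["nums", "letters"], [1, 2], ["a", "b", "c"])
print(zipped)
# [{'nums': 1, 'letters': 'a'}, {'nums': 2, 'letters': 'b'}, {'nums': None, 'letters': 'c'}]
```

The result always has as many entries as the longest input, so positions from longer arrays are never dropped.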
Syntax
from pyspark.sql import functions as sf
sf.arrays_zip(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
| cols | pyspark.sql.Column or str | Columns of arrays to be merged. |
Returns
pyspark.sql.Column: Merged array of entries.
Examples
Example 1: Zipping two arrays of the same length
from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3], ['a', 'b', 'c'])], ['nums', 'letters'])
df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+-------------------------+
|arrays_zip(nums, letters)|
+-------------------------+
|[{1, a}, {2, b}, {3, c}] |
+-------------------------+
Example 2: Zipping arrays of different lengths
from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2], ['a', 'b', 'c'])], ['nums', 'letters'])
df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+---------------------------+
|arrays_zip(nums, letters)  |
+---------------------------+
|[{1, a}, {2, b}, {NULL, c}]|
+---------------------------+
Example 3: Zipping more than two arrays
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [([1, 2], ['a', 'b'], [True, False])], ['nums', 'letters', 'bools'])
df.select(sf.arrays_zip(df.nums, df.letters, df.bools)).show(truncate=False)
+--------------------------------+
|arrays_zip(nums, letters, bools)|
+--------------------------------+
|[{1, a, true}, {2, b, false}]   |
+--------------------------------+
Example 4: Zipping arrays with null values
from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, None], ['a', None, 'c'])], ['nums', 'letters'])
df.select(sf.arrays_zip(df.nums, df.letters)).show(truncate=False)
+------------------------------+
|arrays_zip(nums, letters)     |
+------------------------------+
|[{1, a}, {2, NULL}, {NULL, c}]|
+------------------------------+