Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Creates a new array column from the input columns or column names.
Syntax
from pyspark.sql import functions as sf
sf.array(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
cols |
pyspark.sql.Column or str |
Column names or Column objects that have the same data type. |
Returns
pyspark.sql.Column: A new Column of array type, where each value is an array containing the corresponding values from the input columns.
Examples
Example 1: Basic usage of array function with column names.
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array('name', 'occupation')).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, doctor]|
| [Bob, engineer]|
+-----------------------+
Example 2: Usage of array function with Column objects.
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array(df.name, df.occupation)).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, doctor]|
| [Bob, engineer]|
+-----------------------+
Example 3: Single argument as list of column names.
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array(['name', 'occupation'])).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, doctor]|
| [Bob, engineer]|
+-----------------------+
Example 4: Usage of array function with columns of different types.
from pyspark.sql import functions as sf
df = spark.createDataFrame(
[("Alice", 2, 22.2), ("Bob", 5, 36.1)],
("name", "age", "weight"))
df.select(sf.array(['age', 'weight'])).show()
+------------------+
|array(age, weight)|
+------------------+
| [2.0, 22.2]|
| [5.0, 36.1]|
+------------------+
Example 5: array function with a column containing null values.
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", None), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array('name', 'occupation')).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, NULL]|
| [Bob, engineer]|
+-----------------------+