从输入列或列名称创建一个新的数组列。
Syntax
from pyspark.sql import functions as sf
sf.array(*cols)
参数
| 参数 | 类型 | Description |
|---|---|---|
cols |
pyspark.sql.Column 或 str |
具有相同数据类型的列名或 Column 对象。 |
退货
pyspark.sql.Column:数组类型的新列,其中每个值都是包含输入列中相应值的数组。
例子
示例 1:数组函数与列名称的基本用法。
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array('name', 'occupation')).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, doctor]|
| [Bob, engineer]|
+-----------------------+
示例 2:数组函数与 Column 对象的用法。
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array(df.name, df.occupation)).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, doctor]|
| [Bob, engineer]|
+-----------------------+
示例 3:单参数作为列名列表。
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array(['name', 'occupation'])).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, doctor]|
| [Bob, engineer]|
+-----------------------+
示例 4:数组函数与不同类型的列的用法。
from pyspark.sql import functions as sf
df = spark.createDataFrame(
[("Alice", 2, 22.2), ("Bob", 5, 36.1)],
("name", "age", "weight"))
df.select(sf.array(['age', 'weight'])).show()
+------------------+
|array(age, weight)|
+------------------+
| [2.0, 22.2]|
| [5.0, 36.1]|
+------------------+
示例 5:包含 null 值的列的数组函数。
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", None), ("Bob", "engineer")],
("name", "occupation"))
df.select(sf.array('name', 'occupation')).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
| [Alice, NULL]|
| [Bob, engineer]|
+-----------------------+