Collects the values from a column into a list, keeping duplicates, and returns this list of objects. The function is non-deterministic because the order of the collected values depends on the order of the rows, which may itself be non-deterministic after a shuffle operation.
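The dependence on row order can be illustrated without Spark. The following is a minimal plain-Python sketch of the semantics described above, not Spark's implementation; `collect_list_model` is a hypothetical helper that appends each group's values in arrival order. The same rows arriving in two different orders give differently ordered lists, which agree once sorted.

```python
from collections import defaultdict


def collect_list_model(rows):
    """Hypothetical model: group (key, value) rows, keeping duplicates
    and preserving the order in which the rows arrive."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


# The same three rows in two arrival orders, as might happen after a shuffle.
first_order = [("John", 1), ("Ana", 3), ("John", 2)]
second_order = [("John", 2), ("Ana", 3), ("John", 1)]

a = collect_list_model(first_order)
b = collect_list_model(second_order)

# The raw lists differ for "John", but sorting gives a stable result.
same_raw = a["John"] == b["John"]
same_sorted = sorted(a["John"]) == sorted(b["John"])
```

This is why the examples on this page wrap `collect_list` in `sort_array` before showing the result.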
Syntax
from pyspark.sql import functions as sf
sf.collect_list(col)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or column name | The target column on which the function is computed. |
Returns
pyspark.sql.Column: A new Column object representing a list of collected values, with duplicate values included.
Examples
Example 1: Collect values from a DataFrame and sort the result in ascending order
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,), (2,), (2,)], ('value',))
df.select(sf.sort_array(sf.collect_list('value')).alias('sorted_list')).show()
+-----------+
|sorted_list|
+-----------+
|  [1, 2, 2]|
+-----------+
Example 2: Collect values from a DataFrame and sort the result in descending order
from pyspark.sql import functions as sf
df = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
df.select(sf.sort_array(sf.collect_list('age'), asc=False).alias('sorted_list')).show()
+-----------+
|sorted_list|
+-----------+
|  [5, 5, 2]|
+-----------+
Example 3: Collect values from a DataFrame with multiple columns and sort the result
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, "John"), (2, "John"), (3, "Ana")], ("id", "name"))
df = df.groupBy("name").agg(sf.sort_array(sf.collect_list('id')).alias('sorted_list'))
df.orderBy(sf.desc("name")).show()
+----+-----------+
|name|sorted_list|
+----+-----------+
|John|     [1, 2]|
| Ana|        [3]|
+----+-----------+