string_agg

Aggregate function: returns the concatenation of non-null input values, separated by the delimiter. An alias of listagg.
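The null-skipping and delimiter behaviour described above can be modelled in plain Python. This is only an illustrative sketch: `string_agg_model` is a hypothetical helper, not part of PySpark, and the all-null case returning `None` is an assumption that this page does not state.

```python
from typing import Iterable, Optional


def string_agg_model(
    values: Iterable[Optional[str]], delimiter: Optional[str] = None
) -> Optional[str]:
    """Hypothetical pure-Python model of string_agg; not a PySpark API."""
    # Null inputs are skipped, as the description says.
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Assumption: with no non-null input the result is NULL (None).
        return None
    # A delimiter of None means the values are joined with no separator.
    sep = "" if delimiter is None else delimiter
    return sep.join(non_null)


print(string_agg_model(["a", "b", None, "c"]))        # abc
print(string_agg_model(["a", "b", None, "c"], ", "))  # a, b, c
```

The two printed values match the results shown in Example 1 and Example 2.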

Syntax

from pyspark.sql import functions as sf

sf.string_agg(col, delimiter=None)

Parameters

Parameter | Type | Description
col | pyspark.sql.Column or str | Target column to compute on.
delimiter | pyspark.sql.Column, str or bytes, optional | The delimiter placed between the values. Defaults to None, in which case the values are concatenated with no separator (see Example 1).

Returns

pyspark.sql.Column: a column holding the aggregated result, which is a string for string input and binary for binary input (see Example 3).

Examples

Example 1: Using string_agg function

from pyspark.sql import functions as sf
df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
df.select(sf.string_agg('strings')).show()
+-------------------------+
|string_agg(strings, NULL)|
+-------------------------+
|                      abc|
+-------------------------+

Example 2: Using string_agg function with a delimiter

from pyspark.sql import functions as sf
df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
df.select(sf.string_agg('strings', ', ')).show()
+-----------------------+
|string_agg(strings, , )|
+-----------------------+
|                a, b, c|
+-----------------------+

Example 3: Using string_agg function with a binary column and delimiter

from pyspark.sql import functions as sf
df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
df.select(sf.string_agg('bytes', b'\x42')).show()
+------------------------+
|string_agg(bytes, X'42')|
+------------------------+
|        [01 42 02 42 03]|
+------------------------+