

array_sort

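Collection function: Sorts the input array in ascending order, or in the order defined by comparator when one is given. The elements of the input array must be orderable. When no comparator is given, null elements are placed at the end of the returned array; with a comparator, their position depends on the comparator (see Example 2). Supports Spark Connect.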

For the corresponding Databricks SQL function, see array_sort function.
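The default ordering (ascending, nulls last) can be modeled in plain Python to make the rule concrete. This is a local sketch, not Spark's implementation. It assumes the elements are mutually comparable Python values, and `sort_nulls_last` is a hypothetical helper name.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sort_nulls_last(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Hypothetical plain-Python model of array_sort's default ordering:
    non-null elements in ascending order, None values moved to the end."""
    # The key (is_none, value) puts every non-null value first (False < True).
    # Non-null values are then compared by value; two None keys are equal
    # tuples, so None itself is never compared with '<'.
    return sorted(values, key=lambda v: (v is None, v))


print(sort_nulls_last([2, 1, None, 3]))  # [1, 2, 3, None]
print(sort_nulls_last([1]))              # [1]
print(sort_nulls_last([]))               # []
```

The three inputs mirror Example 1 below, so the printed lists can be compared with the Row values shown there.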

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.array_sort(col=<col>, comparator=<comparator>)

Parameters

Parameter   Type                        Description
col         pyspark.sql.Column or str   Name of the column or expression.
comparator  callable, optional          A binary function (Column, Column) -> Column. It takes two elements of the array and returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second. If the comparator returns null, the function fails and raises an error.
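The comparator contract follows the same negative/zero/positive convention as Python's `functools.cmp_to_key`, so it can be sketched locally with plain values instead of Columns. `sort_with_comparator` is a hypothetical helper; it raises `ValueError` where Spark would raise its own error for a null comparator result, and Python's sort algorithm is not claimed to match the one Spark uses.

```python
from functools import cmp_to_key
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sort_with_comparator(values: Sequence[T],
                         comparator: Callable[[T, T], Optional[int]]) -> List[T]:
    """Hypothetical model of the comparator contract: the comparator returns a
    negative int, 0, or a positive int, and a None result is treated as an error."""
    def checked(a: T, b: T) -> int:
        result = comparator(a, b)
        if result is None:
            # Stand-in for the documented behavior: a null result fails the sort.
            raise ValueError(f"comparator returned None for ({a!r}, {b!r})")
        return result
    return sorted(values, key=cmp_to_key(checked))


# Positive when the second argument is longer, so longer strings come first.
by_length_desc = lambda x, y: len(y) - len(x)
print(sort_with_comparator(["foo", "foobar", "bar"], by_length_desc))
# ['foobar', 'foo', 'bar']

try:
    sort_with_comparator(["a", "b"], lambda x, y: None)
except ValueError as exc:
    print("error:", exc)
```

Because Python's sort is stable, "foo" and "bar" (equal length, so the comparator returns 0) keep their input order.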

Returns

pyspark.sql.Column: sorted array.

Examples

Example 1: Sorting an array in default ascending order

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([([2, 1, None, 3],),([1],),([],)], ['data'])
df.select(dbf.array_sort(df.data).alias('r')).collect()
# [Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])]

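Example 2: Sorting an array by descending string length with a custom comparator

The comparator returns 0 whenever either element is null, because a null comparator result would make the function fail.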

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(["foo", "foobar", None, "bar"],),(["foo"],),([],)], ['data'])
df.select(dbf.array_sort(
    "data",
    lambda x, y: dbf.when(x.isNull() | y.isNull(), dbf.lit(0)).otherwise(dbf.length(y) - dbf.length(x))
).alias("r")).collect()
# [Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])]
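Returning 0 for nulls, as in Example 2, avoids the error but treats null as "equal" to every string, so the comparator does not by itself define where null belongs; its final position depends on which pairs the sort happens to compare. A comparator that orders nulls explicitly gives a deterministic placement. The plain-Python sketch below illustrates that idea with a hypothetical comparator, `nulls_last_by_length_desc`, applied to the same data; it is not Spark code.

```python
from functools import cmp_to_key
from typing import Optional


def nulls_last_by_length_desc(x: Optional[str], y: Optional[str]) -> int:
    """Hypothetical comparator: None sorts after every string,
    and strings sort by descending length."""
    if x is None and y is None:
        return 0
    if x is None:
        return 1   # x counts as greater, so it goes after y
    if y is None:
        return -1  # y counts as greater, so x goes first
    return len(y) - len(x)


data = ["foo", "foobar", None, "bar"]
print(sorted(data, key=cmp_to_key(nulls_last_by_length_desc)))
# ['foobar', 'foo', 'bar', None]
```

In a Spark comparator, the same branching could presumably be expressed by chaining `when(...)` calls before `otherwise(...)`; that translation is an untested suggestion.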