sortWithinPartitions

Returns a new DataFrame with each partition sorted by the specified column(s).

Syntax

sortWithinPartitions(*cols: Union[int, str, Column, List[Union[int, str, Column]]], **kwargs: Any)

Parameters

Parameter	Type	Description
`cols`	int, str, list or Column, optional	Columns to sort by, given as Column objects, column names, or 1-based column ordinals (or a list of these).
`ascending`	bool or list, optional, default True	Boolean or list of booleans. True sorts ascending, False sorts descending. Pass a list to set a separate sort order per column; the list must have the same length as `cols`.

Returns

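DataFrame: A new DataFrame whose rows are sorted within each partition. Rows are not moved between partitions, so the result is not necessarily sorted globally.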
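
The difference between a per-partition sort and a global sort can be sketched without Spark. The following is a minimal pure-Python model, not Spark's implementation: the `partitions` list and the `sort_within_partitions` helper are hypothetical names introduced only for illustration.

```python
from operator import itemgetter

# Hypothetical stand-in for a DataFrame split into two partitions;
# each inner list holds the (age, name) rows of one partition.
partitions = [
    [(5, "Bob"), (2, "Alice")],
    [(9, "Dan"), (1, "Carol")],
]


def sort_within_partitions(parts, key, reverse=False):
    """Sort the rows of each partition independently; no rows change partition."""
    return [sorted(rows, key=key, reverse=reverse) for rows in parts]


# Order every partition by age (the first field of each row).
locally_sorted = sort_within_partitions(partitions, key=itemgetter(0))

# Reading the partitions back to back shows that the result is ordered
# inside each partition but not across the whole data set.
flattened_ages = [row[0] for part in locally_sorted for row in part]
```

In this model each inner list comes back ordered by age, while `flattened_ages` is not monotonic. This is why the examples on this page call `coalesce(1)` before sorting: with a single partition, the per-partition order is also the order that `show()` prints.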

Notes

Column ordinals start at 1, unlike the 0-based `__getitem__`. A negative ordinal sorts by the corresponding column in descending order; for example, `-1` sorts by the first column, descending.
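
The ordinal convention can be illustrated with a small pure-Python sketch. The `resolve_ordinal` helper is hypothetical and is not part of PySpark; in particular, raising `IndexError` for `0` or out-of-range ordinals is an assumption of this sketch, not a statement about the exception Spark raises.

```python
def resolve_ordinal(ordinal, columns):
    """Map a 1-based, possibly negative ordinal to (column_name, ascending)."""
    # Assumption of this sketch: 0 and out-of-range ordinals are rejected.
    if ordinal == 0 or abs(ordinal) > len(columns):
        raise IndexError(f"column ordinal {ordinal} is out of range")
    # The magnitude selects the column (1-based); the sign selects the direction.
    return columns[abs(ordinal) - 1], ordinal > 0


columns = ["age", "name"]
first_ascending = resolve_ordinal(1, columns)     # first column, ascending
first_descending = resolve_ordinal(-1, columns)   # first column, descending
second_ascending = resolve_ordinal(2, columns)    # second column, ascending
```

Under this reading, `sortWithinPartitions(-1)` in the examples below sorts by `age` in descending order, which matches the output shown there.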

Examples

from pyspark.sql import functions as sf
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
df.sortWithinPartitions("age", ascending=False)
# DataFrame[age: bigint, name: string]

df.coalesce(1).sortWithinPartitions(1).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  2|Alice|
# |  5|  Bob|
# +---+-----+

df.coalesce(1).sortWithinPartitions(-1).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  5|  Bob|
# |  2|Alice|
# +---+-----+

Last updated on 2026-04-17