Nota
Capaian ke halaman ini memerlukan kebenaran. Anda boleh cuba mendaftar masuk atau menukar direktori.
Capaian ke halaman ini memerlukan kebenaran. Anda boleh cuba menukar direktori.
Sorts the output in each bucket by the given columns on the file system.
Syntax
sortBy(col, *cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
str, tuple, or list | A column name, or a list of names. |
*cols |
str, optional | Additional column names. Must be empty if col is a list. |
Returns
DataFrameWriter
Examples
Write a DataFrame into a sorted-bucketed table, and read it back.
spark.sql("DROP TABLE IF EXISTS sorted_bucketed_table")
spark.createDataFrame([
(100, "Alice"), (120, "Alice"), (140, "Bob")],
schema=["age", "name"]
).write.bucketBy(1, "name").sortBy("age").mode(
"overwrite").saveAsTable("sorted_bucketed_table")
spark.read.table("sorted_bucketed_table").sort("age").show()
# +---+------------+
# |age| name|
# +---+------------+
# |100|Alice|
# |120|Alice|
# |140| Bob|
# +---+------------+
spark.sql("DROP TABLE sorted_bucketed_table")