Nota
Capaian ke halaman ini memerlukan kebenaran. Anda boleh cuba mendaftar masuk atau menukar direktori.
Capaian ke halaman ini memerlukan kebenaran. Anda boleh cuba menukar direktori.
Clusters the data by the given columns to optimize query performance.
Syntax
clusterBy(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
*cols |
str or list | Names of the columns to cluster by. |
Returns
DataFrameWriter
Examples
Write a DataFrame into a Parquet file with clustering.
import tempfile
with tempfile.TemporaryDirectory(prefix="clusterBy") as d:
spark.createDataFrame(
[{"age": 100, "name": "Alice"}, {"age": 120, "name": "Ruifeng Zheng"}]
).write.clusterBy("name").mode("overwrite").format("parquet").save(d)