Note
Kailangan ng pahintulot para ma-access ang page na ito. Maaari mong subukang mag-sign in o magpalit ng mga direktoryo.
Ang pag-access sa pahinang ito ay nangangailangan ng pahintulot. Maaari mong subukang baguhin ang mga direktoryo.
Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme.
Syntax
partitionBy(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
*cols |
str or list | Names of the columns to partition by. |
Returns
DataFrameWriter
Examples
Write a DataFrame into a Parquet file in a partitioned manner, and read it back.
import tempfile, os
with tempfile.TemporaryDirectory(prefix="partitionBy") as d:
spark.createDataFrame(
[{"age": 100, "name": "Alice"}, {"age": 120, "name": "Ruifeng Zheng"}]
).write.partitionBy("name").mode("overwrite").format("parquet").save(d)
spark.read.parquet(d).sort("age").show()
# +---+-------------+
# |age| name|
# +---+-------------+
# |100| Alice|
# |120|Ruifeng Zheng|
# +---+-------------+
# Read one partition as a DataFrame.
spark.read.parquet(f"{d}{os.path.sep}name=Alice").show()
# +---+
# |age|
# +---+
# |100|
# +---+