partitionBy (DataFrameWriter)

Partitions the output by the given columns on the file system. If specified, the output is laid out in a directory structure similar to Hive's partitioning scheme, with one `column=value` subdirectory for each distinct value of a partition column.

Syntax

partitionBy(*cols)

Parameters

Parameter	Type	Description
`*cols`	str or list	Names of the columns to partition by.

Returns

DataFrameWriter

Examples

Write a DataFrame into a Parquet file in a partitioned manner, and read it back.

import os
import tempfile

# Assumes an active SparkSession bound to the name `spark`.
with tempfile.TemporaryDirectory(prefix="partitionBy") as d:
    spark.createDataFrame(
        [{"age": 100, "name": "Alice"}, {"age": 120, "name": "Ruifeng Zheng"}]
    ).write.partitionBy("name").mode("overwrite").format("parquet").save(d)

    spark.read.parquet(d).sort("age").show()
    # +---+-------------+
    # |age|         name|
    # +---+-------------+
    # |100|        Alice|
    # |120|Ruifeng Zheng|
    # +---+-------------+

    # Read one partition as a DataFrame.
    spark.read.parquet(os.path.join(d, "name=Alice")).show()
    # +---+
    # |age|
    # +---+
    # |100|
    # +---+
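
The `name=Alice` directory read above follows the Hive-style `column=value` naming convention. The following is a minimal sketch of how such a path is composed, not Spark's actual implementation. The helper `hive_partition_path` and the `/data/...` paths are hypothetical names used only for illustration, and the sketch does not handle null values (which Spark writes under `__HIVE_DEFAULT_PARTITION__`) or the escaping of special characters.

```python
import posixpath


def hive_partition_path(base, partition_values):
    """Build a Hive-style directory path: base/col1=val1/col2=val2/...

    Hypothetical helper for illustration only. `partition_values` is an
    ordered list of (column, value) pairs, in the same order the columns
    would be passed to partitionBy.
    """
    segments = [f"{col}={val}" for col, val in partition_values]
    return posixpath.join(base, *segments)


# One partition column, as in the example above.
print(hive_partition_path("/data/people", [("name", "Alice")]))
# /data/people/name=Alice

# Two partition columns produce nested directories, in the order given.
print(hive_partition_path("/data/events", [("year", 2024), ("month", 1)]))
# /data/events/year=2024/month=1
```

The order of the columns determines the nesting of the directories, so the first column passed becomes the top-level directory.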

Last updated on 2026-04-17