repartition

Returns a new DataFrame partitioned by the given partitioning expressions. The resulting DataFrame is hash partitioned.
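To make "hash partitioned" concrete, here is a minimal plain-Python sketch of the general idea. It is not Spark's implementation: `hash_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen only because it is deterministic, so the partition ids it produces will not match what Spark assigns.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Group rows into buckets by hashing rows[key] modulo num_partitions.

    Hypothetical helper for illustration only. CRC32 is a stand-in hash,
    not the function Spark uses internally.
    """
    partitions = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return dict(partitions)


rows = [{"id": i, "age": i % 5} for i in range(20)]
parts = hash_partition(rows, "age", 3)

# Record which partition(s) each distinct age value ended up in.
age_to_partitions = defaultdict(set)
for pid, bucket in parts.items():
    for row in bucket:
        age_to_partitions[row["age"]].add(pid)
```

The property worth noticing is that rows with equal key values always land in the same partition, while some partitions may receive more keys than others or none at all.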

Syntax

repartition(numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName")

Parameters

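Parameter	Type	Description
`numPartitions`	int, str, or Column	Either an integer giving the target number of partitions, or a column. If it is a column, it is used as the first partitioning column. If not specified, the default number of partitions is used.
`cols`	str or Column	The partitioning columns.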

Returns

DataFrame: the repartitioned DataFrame.

Examples

from pyspark.sql import functions as sf
df = spark.range(0, 64, 1, 9).withColumn(
    "name", sf.concat(sf.lit("name_"), sf.col("id").cast("string"))
).withColumn(
    "age", sf.col("id") - 32
)
# Repartition the data into 10 partitions and list the partition ids in use.
df.repartition(10).select(
    sf.spark_partition_id().alias("partition")
).distinct().sort("partition").show()
# +---------+
# |partition|
# +---------+
# |        0|
# ...
# |        9|
# +---------+

# Repartition the data into 7 partitions by the "age" column.
df.repartition(7, "age").select(
    sf.spark_partition_id().alias("partition")
).distinct().sort("partition").show()
# +---------+
# |partition|
# +---------+
# |        0|
# ...
# |        6|
# +---------+


Last updated on 2026-04-19
