partitioning.bucket

A transform for any type that partitions by a hash of the input column.
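The following plain-Python sketch (no Spark required) illustrates what "partitions by a hash" means: each value is hashed, and the hash modulo the number of buckets selects the bucket. The helper names `bucket_id` and `assign_buckets` are hypothetical. CRC32 is an assumed stand-in hash. The real hash function is determined by the catalog or table format that implements the transform (Apache Iceberg, for example, defines bucket hashing in its table spec), so the bucket numbers produced here will not match those of a real table.

```python
import zlib
from collections import defaultdict


def bucket_id(value, num_buckets):
    """Map a value to a bucket in [0, num_buckets) using a deterministic hash.

    Hypothetical helper: CRC32 is an illustrative stand-in, not the hash
    used by any particular catalog or table format.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Hash the value's repr so values of any type can be bucketed.
    digest = zlib.crc32(repr(value).encode("utf-8"))
    return digest % num_buckets


def assign_buckets(values, num_buckets):
    """Group values by bucket id (hypothetical helper)."""
    buckets = defaultdict(list)
    for v in values:
        buckets[bucket_id(v, num_buckets)].append(v)
    return dict(buckets)


if __name__ == "__main__":
    rows = ["2024-01-01 10:00", "2024-01-02 11:30", 17, 3.5, "2024-01-01 10:00"]
    for b, vals in sorted(assign_buckets(rows, 4).items()):
        print(b, vals)
```

Identical values always hash to the same bucket. This is what makes bucketing useful for co-locating rows that share a key.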

Note

This function can be used only in combination with the DataFrameWriterV2.partitionedBy method.

Syntax

from pyspark.sql.functions import partitioning

partitioning.bucket(numBuckets, col)

Parameter	Type	Description
`numBuckets`	`pyspark.sql.Column` or int	The number of buckets.
`col`	`pyspark.sql.Column` or str	Target column whose values are hashed into buckets.

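Examples

# Assumes `df` is an existing DataFrame with a "ts" column and that
# "catalog.db.table" is in a catalog that supports partition transforms.
from pyspark.sql.functions import partitioning

# Hash "ts" into 42 buckets when creating (or replacing) the table.
df.writeTo("catalog.db.table").partitionedBy(
    partitioning.bucket(42, "ts")
).createOrReplace()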
