bucket

Partition transform function: A transform for any type that partitions by a hash of the input column. Supports Spark Connect.

Warning

Deprecated in 4.0.0. Use partitioning.bucket instead.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.bucket(numBuckets=<numBuckets>, col=<col>)

Parameters

Parameter	Type	Description
`numBuckets`	`pyspark.sql.Column` or `int`	The number of buckets.
`col`	`pyspark.sql.Column` or `str`	Target column to work on. Any type is supported.

Returns

pyspark.sql.Column: Data partitioned by given columns.

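Examples

from pyspark.databricks.sql import functions as dbf

# df is an existing DataFrame; hash the "ts" column into 42 buckets
df.writeTo("catalog.db.table").partitionedBy(
    dbf.bucket(42, "ts")
).createOrReplace()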

Note

This function can be used only in combination with the partitionedBy method of the DataFrameWriterV2.
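
To illustrate what "partitions by a hash of the input column" means, the following is a minimal pure-Python sketch of hash bucketing. It is not how Spark or any data source implements the transform: the helper name `bucket_id` is hypothetical, and `zlib.crc32` is a stand-in hash chosen only because it is deterministic. The hash actually used is decided by the data source that receives the write.

```python
import zlib


def bucket_id(value, num_buckets):
    """Map a value to a bucket index in the range [0, num_buckets).

    Hypothetical helper for illustration only. zlib.crc32 is a stand-in
    hash; the real hash function is chosen by the data source.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Hash the string form of the value to an unsigned 32-bit integer.
    digest = zlib.crc32(str(value).encode("utf-8"))
    # Reduce the hash to a bucket index.
    return digest % num_buckets


# Sample timestamp strings standing in for values of a "ts" column.
rows = ["2024-01-01 00:00:00", "2024-01-01 00:00:01", "2024-01-02 12:30:00"]
assignments = {ts: bucket_id(ts, 42) for ts in rows}
```

Equal input values always land in the same bucket, and the number of buckets caps the number of partitions the column can produce, regardless of how many distinct values it holds.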
