Bemærk
Adgang til denne side kræver godkendelse. Du kan prøve at logge på eller ændre mapper.
Adgang til denne side kræver godkendelse. Du kan prøve at ændre mapper.
Partitions the output by the given columns on the file system. The output is laid out similar to Hive's partitioning scheme.
Syntax
partitionBy(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
*cols |
str or list | Names of the columns to partition by. |
Returns
DataStreamWriter
Examples
df = spark.readStream.format("rate").load()
df.writeStream.partitionBy("value")
# <...streaming.readwriter.DataStreamWriter object ...>
Partition a Rate source stream by timestamp and write to Parquet:
import tempfile
import time
with tempfile.TemporaryDirectory(prefix="partitionBy1") as d:
with tempfile.TemporaryDirectory(prefix="partitionBy2") as cp:
df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
q = df.writeStream.partitionBy(
"timestamp").format("parquet").option("checkpointLocation", cp).start(d)
time.sleep(5)
q.stop()
spark.read.schema(df.schema).parquet(d).show()