Returns the value of input compressed with Zstandard at the specified compression level. The default level is 3. Compresses in single-pass mode by default.
Syntax
from pyspark.sql import functions as dbf
dbf.zstd_compress(input=<input>, level=<level>, streaming_mode=<streaming_mode>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| input | pyspark.sql.Column or str | The binary value to compress. |
| level | pyspark.sql.Column or int, optional | The compression level, which controls the trade-off between compression speed and compression ratio. Valid values are 1 through 22 inclusive: 1 is fastest with the lowest ratio, 22 is slowest with the highest ratio. Defaults to 3. |
| streaming_mode | pyspark.sql.Column or bool, optional | Whether to compress in streaming mode. Defaults to false. |
Returns
pyspark.sql.Column: A new column containing the compressed binary value.
Examples
Example 1: Compress data using Zstandard
from pyspark.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input)).alias("result")).show(truncate=False)
+----------------------------------------+
|result |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+
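As a quick sanity check outside Spark, the Base64 result above can be decoded and inspected: every Zstandard frame begins with the little-endian magic number 0xFD2FB528 (RFC 8878), i.e. the bytes 28 B5 2F FD. A minimal stdlib-only sketch using the output shown above:

```python
import base64

# Base64 result copied verbatim from the example output above.
raw = base64.b64decode("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")

# The first four bytes are the Zstandard frame magic number.
print(raw[:4].hex())                   # 28b52ffd
print(raw[:4] == b"\x28\xb5\x2f\xfd")  # True
```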
Example 2: Compress data using Zstandard with a given compression level
from pyspark.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(5))).alias("result")).show(truncate=False)
+----------------------------------------+
|result |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+
Example 3: Compress data using Zstandard in streaming mode
from pyspark.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(3), dbf.lit(True))).alias("result")).show(truncate=False)
+--------------------------------------------+
|result |
+--------------------------------------------+
|KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=|
+--------------------------------------------+
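The difference between the two modes is visible in the frame header. Decoding the outputs of Example 1 (single-pass) and Example 3 (streaming) shows that single-pass mode sets the Single_Segment_Flag (bit 5 of the Frame_Header_Descriptor byte, per the Zstandard spec, RFC 8878) and embeds the original content size, while streaming mode leaves the flag clear. A stdlib-only sketch based on the outputs shown above:

```python
import base64

# Base64 results copied from Example 1 (single-pass) and Example 3 (streaming).
single_pass = base64.b64decode("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")
streaming = base64.b64decode("KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=")

# Byte 4 is the Frame_Header_Descriptor; bit 5 is the Single_Segment_Flag.
print(bool(single_pass[4] & 0x20))  # True  (single-pass: single-segment frame)
print(bool(streaming[4] & 0x20))    # False (streaming: no single-segment flag)

# With the flag set and a Frame_Content_Size_flag of 0, byte 5 holds the
# uncompressed content size: 130 bytes, i.e. len("Apache Spark " * 10).
print(single_pass[5])  # 130
```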