zstd_compress

Returns the Zstandard-compressed value of input at the specified compression level. The default level is 3. By default, compression uses single-pass (non-streaming) mode.

Syntax

from pyspark.databricks.sql import functions as dbf

dbf.zstd_compress(input=<input>, level=<level>, streaming_mode=<streaming_mode>)

Parameters

input: pyspark.sql.Column or str. The binary value to compress.
level: pyspark.sql.Column or int, optional. The compression level, which controls the trade-off between compression speed and compression ratio. Valid values are 1 through 22 inclusive: 1 is fastest with the lowest compression ratio, and 22 is slowest with the highest compression ratio. Defaults to 3.
streaming_mode: pyspark.sql.Column or bool, optional. Whether to compress in streaming mode. Defaults to false.

Returns

pyspark.sql.Column: A new column that contains the compressed value.

Examples

Example 1: Compress data using Zstandard

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input)).alias("result")).show(truncate=False)
+----------------------------------------+
|result                                  |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+
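The result column holds a standard Zstandard frame, so it can be decompressed outside Spark after base64 decoding. As a quick sanity check (plain Python, no Spark required), the example output above decodes to bytes beginning with the Zstandard magic number:

```python
import base64

# The base64 result from the example above. Every Zstandard frame begins
# with the little-endian magic number 0xFD2FB528 (bytes 28 B5 2F FD).
result = "KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU="
raw = base64.b64decode(result)

assert raw[:4] == b"\x28\xb5\x2f\xfd"  # valid Zstandard frame header
print(len(raw))  # size of the compressed frame in bytes
```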

Example 2: Compress data using Zstandard with a given compression level

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(5))).alias("result")).show(truncate=False)
+----------------------------------------+
|result                                  |
+----------------------------------------+
|KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=|
+----------------------------------------+

Example 3: Compress data using Zstandard in streaming mode

from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("Apache Spark " * 10,)], ["input"])
df.select(dbf.base64(dbf.zstd_compress(df.input, dbf.lit(3), dbf.lit(True))).alias("result")).show(truncate=False)
+--------------------------------------------+
|result                                      |
+--------------------------------------------+
|KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=|
+--------------------------------------------+
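Comparing the Example 1 and Example 3 outputs (again plain Python, no Spark needed) shows that both modes emit valid Zstandard frames with the same magic number, but the streaming frame is a few bytes longer: when compressing incrementally, the frame header cannot record the decompressed content size up front.

```python
import base64

# Example outputs from above: single-pass (Example 1) vs streaming (Example 3).
single_pass = base64.b64decode("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")
streaming = base64.b64decode("KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=")

# Both start with the 4-byte Zstandard magic number ...
assert single_pass[:4] == streaming[:4] == b"\x28\xb5\x2f\xfd"

# ... but the streaming frame is slightly longer, since its header omits
# the decompressed content size (unknown up front in streaming mode).
print(len(single_pass), len(streaming))
```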