hll_union_agg

Aggregate function: returns the updatable binary representation of the Datasketches HllSketch produced by merging previously created HllSketch instances through a Datasketches Union instance. Throws an exception if the sketches have different lgConfigK values and allowDifferentLgConfigK is unset or set to false.

Syntax

from pyspark.sql import functions as sf

sf.hll_union_agg(col, allowDifferentLgConfigK=None)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or str | The column containing HLL sketches to merge. |
| allowDifferentLgConfigK | pyspark.sql.Column or bool, optional | Allow sketches with different lgConfigK values to be merged (defaults to false). |

Returns

pyspark.sql.Column: The binary representation of the merged HllSketch.

Examples

Example 1: Merge HLL sketches with default settings

from pyspark.sql import functions as sf
df1 = spark.createDataFrame([1,2,2,3], "INT")
df1 = df1.agg(sf.hll_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4,5,5,6], "INT")
df2 = df2.agg(sf.hll_sketch_agg("value").alias("sketch"))
df3 = df1.union(df2)
df3.agg(sf.hll_sketch_estimate(sf.hll_union_agg("sketch"))).show()
+-------------------------------------------------+
|hll_sketch_estimate(hll_union_agg(sketch, false))|
+-------------------------------------------------+
|                                                6|
+-------------------------------------------------+

Example 2: Merge HLL sketches with explicit allowDifferentLgConfigK

from pyspark.sql import functions as sf
df1 = spark.createDataFrame([1,2,2,3], "INT")
df1 = df1.agg(sf.hll_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4,5,5,6], "INT")
df2 = df2.agg(sf.hll_sketch_agg("value").alias("sketch"))
df3 = df1.union(df2)
df3.agg(sf.hll_sketch_estimate(sf.hll_union_agg("sketch", False))).show()
+-------------------------------------------------+
|hll_sketch_estimate(hll_union_agg(sketch, false))|
+-------------------------------------------------+
|                                                6|
+-------------------------------------------------+