Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is unset or set to false.
Syntax
from pyspark.sql import functions as sf
sf.hll_union_agg(col, allowDifferentLgConfigK=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or str |
The column containing HLL sketches to merge. |
allowDifferentLgConfigK |
pyspark.sql.Column or bool, optional |
Allow sketches with different lgConfigK values to be merged (defaults to false). |
Returns
pyspark.sql.Column: The binary representation of the merged HllSketch.
Examples
Example 1: Merge HLL sketches with default settings
from pyspark.sql import functions as sf
df1 = spark.createDataFrame([1,2,2,3], "INT")
df1 = df1.agg(sf.hll_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4,5,5,6], "INT")
df2 = df2.agg(sf.hll_sketch_agg("value").alias("sketch"))
df3 = df1.union(df2)
df3.agg(sf.hll_sketch_estimate(sf.hll_union_agg("sketch"))).show()
+-------------------------------------------------+
|hll_sketch_estimate(hll_union_agg(sketch, false))|
+-------------------------------------------------+
| 6|
+-------------------------------------------------+
Example 2: Merge HLL sketches with explicit allowDifferentLgConfigK
from pyspark.sql import functions as sf
df1 = spark.createDataFrame([1,2,2,3], "INT")
df1 = df1.agg(sf.hll_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4,5,5,6], "INT")
df2 = df2.agg(sf.hll_sketch_agg("value").alias("sketch"))
df3 = df1.union(df2)
df3.agg(sf.hll_sketch_estimate(sf.hll_union_agg("sketch", False))).show()
+-------------------------------------------------+
|hll_sketch_estimate(hll_union_agg(sketch, false))|
+-------------------------------------------------+
| 6|
+-------------------------------------------------+