Merges two binary representations of DataSketches HllSketch objects, using a DataSketches Union object. Throws an exception if the sketches have different lgConfigK values and allowDifferentLgConfigK is unset or set to false.
Syntax
from pyspark.sql import functions as sf
sf.hll_union(col1, col2, allowDifferentLgConfigK=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col1` | `pyspark.sql.Column` or `str` | The first HLL sketch. |
| `col2` | `pyspark.sql.Column` or `str` | The second HLL sketch. |
| `allowDifferentLgConfigK` | `bool`, optional | Allow sketches with different lgConfigK values to be merged (defaults to false). |
Returns
pyspark.sql.Column: The binary representation of the merged HllSketch.
Examples
Example 1: Union two HLL sketches
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,4),(2,5),(2,5),(3,6)], "struct<v1:int,v2:int>")
df = df.agg(
    sf.hll_sketch_agg("v1").alias("sketch1"),
    sf.hll_sketch_agg("v2").alias("sketch2")
)
df.select(sf.hll_sketch_estimate(sf.hll_union(df.sketch1, "sketch2"))).show()
+-------------------------------------------------------+
|hll_sketch_estimate(hll_union(sketch1, sketch2, false))|
+-------------------------------------------------------+
|                                                      6|
+-------------------------------------------------------+