Merges the binary representations of two DataSketches ThetaSketch objects, using a DataSketches Union object.
Syntax
from pyspark.sql import functions as sf
sf.theta_union(col1, col2, lgNomEntries=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col1 | pyspark.sql.Column or str | The first Theta sketch. |
| col2 | pyspark.sql.Column or str | The second Theta sketch. |
| lgNomEntries | pyspark.sql.Column or int, optional | The log-base-2 of nominal entries for the union operation (must be between 4 and 26, defaults to 12). |
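Because lgNomEntries is a base-2 logarithm, the default of 12 corresponds to 2^12 = 4096 nominal entries. The following is a minimal sketch of that arithmetic and of the documented 4 to 26 range. The helper `nominal_entries` is hypothetical and is not part of PySpark; it only mirrors the constraints stated in the table above.

```python
def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Hypothetical helper (not a PySpark API): convert an lgNomEntries
    value into the nominal entry count it represents, 2 ** lgNomEntries."""
    # Range taken from the parameter description: between 4 and 26 inclusive.
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError(
            f"lgNomEntries must be between 4 and 26, got {lg_nom_entries}"
        )
    return 1 << lg_nom_entries


# Smallest allowed value, the default, and the largest allowed value.
for lg in (4, 12, 26):
    print(lg, nominal_entries(lg))
```

Checking the value this way before passing it to `sf.theta_union` is optional; it is shown only to make the meaning of the parameter concrete.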
Returns
pyspark.sql.Column: The binary representation of the merged ThetaSketch.
Examples
Example 1: Union two Theta sketches
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,4),(2,5),(2,5),(3,6)], "struct<v1:int,v2:int>")
df = df.agg(
    sf.theta_sketch_agg("v1").alias("sketch1"),
    sf.theta_sketch_agg("v2").alias("sketch2")
)
df.select(sf.theta_sketch_estimate(sf.theta_union(df.sketch1, "sketch2"))).show()
+--------------------------------------------------------+
|theta_sketch_estimate(theta_union(sketch1, sketch2, 12))|
+--------------------------------------------------------+
|                                                       6|
+--------------------------------------------------------+