Merges two binary representations of Datasketches Theta Sketch objects, using a Datasketches Union object.
Syntax
from pyspark.sql import functions as sf
sf.theta_union(col1, col2, lgNomEntries=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col1` | `pyspark.sql.Column` or `str` | The first Theta sketch. |
| `col2` | `pyspark.sql.Column` or `str` | The second Theta sketch. |
| `lgNomEntries` | `pyspark.sql.Column` or `int`, optional | The log-base-2 of nominal entries for the union operation (must be between 4 and 26, defaults to 12). |
Returns
pyspark.sql.Column: The binary representation of the merged Theta Sketch.
Examples
Example 1: Union two Theta sketches
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,4),(2,5),(2,5),(3,6)], "struct<v1:int,v2:int>")
df = df.agg(
    sf.theta_sketch_agg("v1").alias("sketch1"),
    sf.theta_sketch_agg("v2").alias("sketch2")
)
df.select(sf.theta_sketch_estimate(sf.theta_union(df.sketch1, "sketch2"))).show()
+--------------------------------------------------------+
|theta_sketch_estimate(theta_union(sketch1, sketch2, 12))|
+--------------------------------------------------------+
|                                                       6|
+--------------------------------------------------------+
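To give some intuition for what the union and the `lgNomEntries` parameter do, here is a simplified pure-Python model of a Theta sketch. It is an illustration only and not the DataSketches implementation: the helper names (`make_sketch`, `union`, `estimate`), the SHA-256 based hash, and the tuple layout `(theta, hashes)` are all assumptions made for this sketch. The idea it shows is that each sketch keeps at most `2**lgNomEntries` of the smallest hash values below a threshold `theta`, that a union keeps the smaller threshold and merges the retained hashes, and that the distinct count is estimated as the number of retained hashes divided by `theta`.

```python
import hashlib


def _hash01(value):
    """Map a value to a pseudo-random float in [0, 1) (hypothetical hash choice)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    # Keep 53 bits so the result is exactly representable and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def make_sketch(values, lg_nom_entries=12):
    """Build a toy sketch: (theta, set of retained hashes below theta)."""
    k = 1 << lg_nom_entries
    hashes = sorted({_hash01(v) for v in values})
    if len(hashes) > k:
        # Too many entries: keep the k smallest, theta becomes the next hash.
        return hashes[k], set(hashes[:k])
    return 1.0, set(hashes)


def union(a, b, lg_nom_entries=12):
    """Merge two toy sketches into one."""
    k = 1 << lg_nom_entries
    theta = min(a[0], b[0])
    # Only hashes below the smaller threshold are comparable across sketches.
    merged = sorted(h for h in a[1] | b[1] if h < theta)
    if len(merged) > k:
        theta = merged[k]
        merged = merged[:k]
    return theta, set(merged)


def estimate(sketch):
    """Estimate the number of distinct values summarized by a toy sketch."""
    theta, hashes = sketch
    return len(hashes) / theta


sketch1 = make_sketch([1, 2, 2, 3])
sketch2 = make_sketch([4, 5, 5, 6])
print(estimate(union(sketch1, sketch2)))
```

With inputs this small, nothing is discarded and `theta` stays at 1.0, so the estimate is exact, which matches the result of 6 in Example 1. Lowering `lg_nom_entries` in this model caps the number of retained hashes, trading accuracy for a smaller sketch.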