Napomena
Za pristup ovoj stranici potrebna je autorizacija. Možete se pokušati prijaviti ili promijeniti direktorije.
Za pristup ovoj stranici potrebna je autorizacija. Možete pokušati promijeniti direktorije.
Returns the set difference of two binary representations of Datasketches Theta Sketch objects (elements in first sketch but not in second), using a Datasketches ANotB object.
Syntax
from pyspark.sql import functions as sf
sf.theta_difference(col1, col2)
Parameters
| Parameter | Type | Description |
|---|---|---|
col1 |
pyspark.sql.Column or str |
The first Theta sketch. |
col2 |
pyspark.sql.Column or str |
The second Theta sketch. |
Returns
pyspark.sql.Column: The binary representation of the difference Theta Sketch.
Examples
Example 1: Get difference of two Theta sketches
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,4),(2,4),(3,5),(4,5)], "struct<v1:int,v2:int>")
df = df.agg(
sf.theta_sketch_agg("v1").alias("sketch1"),
sf.theta_sketch_agg("v2").alias("sketch2")
)
df.select(sf.theta_sketch_estimate(sf.theta_difference(df.sketch1, "sketch2"))).show()
+---------------------------------------------------------+
|theta_sketch_estimate(theta_difference(sketch1, sketch2))|
+---------------------------------------------------------+
| 3|
+---------------------------------------------------------+