opomba,
Dostop do te strani zahteva pooblastilo. Poskusite se vpisati alispremeniti imenike.
Dostop do te strani zahteva pooblastilo. Poskusite lahko spremeniti imenike.
Returns the set difference of two binary representations of Datasketches ThetaSketch objects (elements in first sketch but not in second), using a Datasketches ANotB object.
Syntax
from pyspark.sql import functions as sf
sf.theta_difference(col1, col2)
Parameters
| Parameter | Type | Description |
|---|---|---|
col1 |
pyspark.sql.Column or str |
The first Theta sketch. |
col2 |
pyspark.sql.Column or str |
The second Theta sketch. |
Returns
pyspark.sql.Column: The binary representation of the difference ThetaSketch.
Examples
Example 1: Get difference of two Theta sketches
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,4),(2,4),(3,5),(4,5)], "struct<v1:int,v2:int>")
df = df.agg(
sf.theta_sketch_agg("v1").alias("sketch1"),
sf.theta_sketch_agg("v2").alias("sketch2")
)
df.select(sf.theta_sketch_estimate(sf.theta_difference(df.sketch1, "sketch2"))).show()
+---------------------------------------------------------+
|theta_sketch_estimate(theta_difference(sketch1, sketch2))|
+---------------------------------------------------------+
| 3|
+---------------------------------------------------------+