

theta_difference function

Applies to: Databricks SQL, Databricks Runtime 18.0 and above

Computes the set difference (A minus B) of two Theta Sketch binary representations. The returned sketch contains only values that appear in the first sketch but not in the second.
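Theta Sketch set operations work on samples of hashed values, not on the raw values. The following Python code is a simplified model, not the Databricks or Apache DataSketches implementation. It assumes a KMV-style sketch that keeps the `k` smallest SHA-256-derived hashes, where `k` is a hypothetical sample size. The names `build_sketch`, `difference`, and `estimate` are hypothetical. The model illustrates the idea behind A minus B: both sketches are cut to the smaller threshold theta, the hashes of A that are absent from B are kept, and the distinct count is estimated as retained entries divided by theta.

```python
import hashlib
from dataclasses import dataclass


def unit_hash(value) -> float:
    """Map a value to a deterministic pseudo-random float in [0, 1)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


@dataclass(frozen=True)
class ThetaSketch:
    theta: float        # sampling threshold: only hashes < theta are retained
    entries: frozenset  # retained hash values


def build_sketch(values, k: int = 4096) -> ThetaSketch:
    """Simplified KMV-style sketch: keep the k smallest distinct hashes."""
    hashes = sorted({unit_hash(v) for v in values})
    if len(hashes) <= k:
        # Few distinct values: keep everything, so the sketch is exact.
        return ThetaSketch(theta=1.0, entries=frozenset(hashes))
    # The (k+1)-th smallest hash becomes the threshold; the k below it are kept.
    return ThetaSketch(theta=hashes[k], entries=frozenset(hashes[:k]))


def difference(a: ThetaSketch, b: ThetaSketch) -> ThetaSketch:
    """A minus B: hashes of A below the common threshold that B does not contain."""
    theta = min(a.theta, b.theta)
    kept = {h for h in a.entries if h < theta and h not in b.entries}
    return ThetaSketch(theta=theta, entries=frozenset(kept))


def estimate(s: ThetaSketch) -> float:
    """Estimated number of distinct values represented by the sketch."""
    return len(s.entries) / s.theta


col1 = [5, 1, 2, 2, 3]
col2 = [4, 4, 5, 5, 1]
print(estimate(difference(build_sketch(col1), build_sketch(col2))))  # 2.0
```

Both sketches must be compared at the same threshold. Using the smaller theta guarantees that every hash of B below that threshold is present in B's retained entries, so "not in B" can be checked reliably.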

Syntax

theta_difference ( first, second )

Arguments

  • first: A Theta Sketch in binary format (set A).
  • second: A Theta Sketch in binary format (set B).

Returns

A BINARY value containing the serialized Theta Sketch representing the set difference (A - B).

Notes

  • The operation is not commutative: theta_difference(A, B) ≠ theta_difference(B, A).
  • Theta Sketches are probabilistic, so the number of distinct values in the result is an estimate. Pass the result to theta_sketch_estimate to obtain it, as in the example below.
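For a quick check on why argument order matters, the following Python snippet computes the exact distinct-value difference on the same data as the example below. This is the quantity that the sketch-based estimate approximates. The helper name `distinct_difference_count` is hypothetical and is not part of Databricks SQL.

```python
def distinct_difference_count(first, second) -> int:
    """Exact count of distinct values in `first` that never occur in `second`."""
    return len(set(first) - set(second))


col1 = [5, 1, 2, 2, 3]  # distinct values: {1, 2, 3, 5}
col2 = [4, 4, 5, 5, 1]  # distinct values: {1, 4, 5}

print(distinct_difference_count(col1, col2))  # {2, 3} -> 2
print(distinct_difference_count(col2, col1))  # {4}    -> 1
```

Swapping the arguments changes the result from 2 to 1, which matches the non-commutativity note above.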

Error messages

Examples

-- Find values in first sketch but not in second
> SELECT theta_sketch_estimate(theta_difference(theta_sketch_agg(col1), theta_sketch_agg(col2)))
  FROM VALUES (5, 4), (1, 4), (2, 5), (2, 5), (3, 1) tab(col1, col2);
2