Applies to:
Databricks SQL
Databricks Runtime 18.0 and above
Consumes multiple Theta Sketch buffers and merges them using set union into one result buffer. Use this function to combine sketches from different partitions or time periods.
Syntax
theta_union_agg ( sketch [, lgNomEntries ] )
Arguments
- sketch: A Theta Sketch in binary format (such as the output of the theta_sketch_agg aggregate function).
- lgNomEntries: An optional INTEGER literal specifying the log-base-2 of the nominal entries for the union buffer. Must be between 4 and 26, inclusive. The default is 12 (2^12 = 4,096 nominal entries). Higher values provide better accuracy but use more memory.
Returns
A BINARY value containing the merged serialized Theta Sketch representing the union of all input sketches.
Notes
- The union operation handles input sketches with different lgNomEntries values.
- NULL values are ignored during aggregation.
- To merge exactly two sketches, use the scalar theta_union function instead.
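The notes above can be illustrated with a short sketch. The first query adds a NULL sketch to the input to show that it does not affect the result. The second query uses the scalar theta_union function to merge exactly two sketches; its two-argument form is assumed here from the note above. The aliases a, b, t, and tab are illustrative only.

```sql
-- A NULL sketch in the input is ignored by the aggregate
> SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (
    SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2) AS tab(col)
    UNION ALL
    SELECT CAST(NULL AS BINARY) AS sketch
  ) t;
 2

-- Merging exactly two sketches with the scalar function (two-argument form assumed)
> SELECT theta_sketch_estimate(theta_union(a.sketch, b.sketch))
  FROM (SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2) AS tab(col)) a
  CROSS JOIN (SELECT theta_sketch_agg(col) AS sketch FROM VALUES (2), (3) AS tab(col)) b;
 3
```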
Error messages
Examples
-- Merge sketches from different groups
> SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (
SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2), (3) AS tab(col)
UNION ALL
SELECT theta_sketch_agg(col) AS sketch FROM VALUES (3), (4), (5) AS tab(col)
) t;
5
-- Merge sketches with custom lgNomEntries
> SELECT theta_sketch_estimate(theta_union_agg(sketch, 15)) FROM (
SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2) AS tab(col)
UNION ALL
SELECT theta_sketch_agg(col, 20) AS sketch FROM VALUES (2), (3) AS tab(col)
) t;
3
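The following is a minimal sketch of the time-period use case mentioned in the description: build one sketch per day, then merge the daily sketches to estimate distinct values across all days. The names daily, events, event_date, and user_id are hypothetical and are used only for illustration.

```sql
-- Roll up per-day sketches into one estimate across all days
> WITH daily AS (
    SELECT event_date, theta_sketch_agg(user_id) AS sketch
    FROM VALUES
      ('2024-01-01', 1), ('2024-01-01', 2),
      ('2024-01-02', 2), ('2024-01-02', 3) AS events(event_date, user_id)
    GROUP BY event_date
  )
  SELECT theta_sketch_estimate(theta_union_agg(sketch)) AS distinct_users FROM daily;
 3
```

User 2 appears on both days but is counted once, because the merge is a set union. In practice, the daily sketches would typically be stored in a table so that they can be merged again later without rescanning the raw data.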