Jaa


tuple_sketch_agg_integer aggregate function

Applies to: check marked yes Databricks Runtime 18.1 and above

Creates a Datasketches TupleSketch from key-value pairs where keys are used for distinct counting and integer summary values are aggregated according to the specified mode.

Syntax

tuple_sketch_agg_integer ( key, summary [, lgNomEntries [, mode ]] )

Arguments

  • key: The expression for unique value counting. Accepted types are INTEGER, LONG, FLOAT, DOUBLE, STRING, BINARY, ARRAY<INTEGER>, and ARRAY<LONG>.
  • summary: An INTEGER value to be associated with and aggregated for each key.
  • lgNomEntries: An optional INTEGER literal specifying the log-base-2 of nominal entries. Must be between 4 and 26, inclusive. The default is 12 (4,096 buckets). Higher values provide better accuracy but use more memory.
  • mode: An optional STRING literal specifying the aggregation mode for summaries. Valid values: 'sum', 'min', 'max', 'alwaysone'. The default is 'sum'.

Returns

A BINARY value containing the serialized compact TupleSketch with integer summaries.

Notes

  • NULL key or summary values are ignored during aggregation.
  • Empty strings, empty byte arrays, and empty arrays are ignored for keys.
  • The lgNomEntries and mode parameters must be constant values.
  • Use tuple_sketch_estimate_integer to obtain the distinct count estimate.
  • Use tuple_sketch_summary_integer to obtain the aggregated summary value.

Error messages

Examples

-- Create sketch and get distinct count estimate
> SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary);
3.0

-- Get aggregated summary (sum mode by default)
> SELECT tuple_sketch_summary_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary);
6