tuple_sketch_agg_double aggregate function

Applies to: Databricks Runtime 18.1 and above

Creates a DataSketches TupleSketch from key-value pairs. The keys drive the distinct-count estimate, and the DOUBLE summary values attached to each key are aggregated according to the specified mode.

Syntax

tuple_sketch_agg_double ( key, summary [, lgNomEntries [, mode ]] )

Arguments

  • key: The expression for unique value counting. Accepted types are INTEGER, LONG, FLOAT, DOUBLE, STRING, BINARY, ARRAY<INTEGER>, and ARRAY<LONG>.
  • summary: A DOUBLE value to be associated with and aggregated for each key.
  • lgNomEntries: An optional INTEGER literal specifying the log-base-2 of the nominal number of entries. Must be between 4 and 26, inclusive. The default is 12 (2^12 = 4,096 nominal entries). Higher values provide better accuracy but use more memory.
  • mode: An optional STRING literal specifying the aggregation mode for summaries. Valid values: 'sum', 'min', 'max', 'alwaysone'. The default is 'sum'.
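
As a rough illustration of how the optional arguments combine, the query below builds two sketches over the same rows, one per mode, and reads back the summary of each. The inline data and the column aliases are assumptions made for this sketch. Only the function names and argument order come from this page. No output is shown, because the result depends on how tuple_sketch_summary_double combines the per-key summaries.

```sql
-- Same input rows, two aggregation modes.
-- lgNomEntries (12) and mode are passed as literals, as the function requires.
SELECT
  tuple_sketch_summary_double(
    tuple_sketch_agg_double(key, summary, 12, 'sum')) AS summary_sum_mode,
  tuple_sketch_summary_double(
    tuple_sketch_agg_double(key, summary, 12, 'max')) AS summary_max_mode
FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D) tab(key, summary);
```

In 'sum' mode the summaries seen for a repeated key are added together, while in 'max' mode only the largest one is retained for that key.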

Returns

A BINARY value containing the serialized compact TupleSketch with double summaries.
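
Because the result is an ordinary BINARY value, it can be produced once per group and passed on to the companion functions later. The following is a minimal sketch of that pattern. The grouping column grp, the inline data, and the output aliases are hypothetical.

```sql
-- Build one serialized sketch per group, then query each sketch.
WITH sketches AS (
  SELECT grp, tuple_sketch_agg_double(key, summary) AS sketch
  FROM VALUES ('a', 1, 1.0D), ('a', 2, 2.0D), ('b', 1, 3.0D) tab(grp, key, summary)
  GROUP BY grp
)
SELECT
  grp,
  tuple_sketch_estimate_double(sketch) AS distinct_keys,  -- distinct count estimate
  tuple_sketch_summary_double(sketch)  AS summary_total   -- aggregated summary value
FROM sketches;
```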

Notes

  • Rows with a NULL key or a NULL summary are ignored during aggregation.
  • Empty strings, empty byte arrays, and empty arrays are ignored for keys.
  • The lgNomEntries and mode parameters must be constant values.
  • Use tuple_sketch_estimate_double to obtain the distinct count estimate.
  • Use tuple_sketch_summary_double to obtain the aggregated summary value.
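
The rules on ignored input can be checked with a small query such as the one below. The inline rows are illustrative assumptions. Under the notes above, only the keys 'a' and 'b' should contribute to the sketch, since the other rows carry an empty-string key, a NULL key, or a NULL summary.

```sql
-- Rows expected to be skipped: ('', 3.0) empty key, (NULL, 4.0) NULL key, ('c', NULL) NULL summary.
SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary)) AS distinct_keys
FROM VALUES ('a', 1.0D), ('b', 2.0D), ('', 3.0D), (NULL, 4.0D), ('c', NULL) tab(key, summary);
```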

Error messages

Examples

-- Create sketch with sum mode (default)
> SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary);
3.0

-- Get aggregated summary
> SELECT tuple_sketch_summary_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary);
6.0