Applies to:
Databricks Runtime 18.1 and above
Creates an Apache DataSketches TupleSketch from key-value pairs, where keys are used for distinct counting and integer summary values are aggregated according to the specified mode.
Syntax
tuple_sketch_agg_integer ( key, summary [, lgNomEntries [, mode ]] )
Arguments
- key: The expression for unique value counting. Accepted types are INTEGER, LONG, FLOAT, DOUBLE, STRING, BINARY, ARRAY<INTEGER>, and ARRAY<LONG>.
- summary: An INTEGER value to be associated with and aggregated for each key.
- lgNomEntries: An optional INTEGER literal specifying the log-base-2 of nominal entries. Must be between 4 and 26, inclusive. The default is 12 (4,096 buckets). Higher values provide better accuracy but use more memory.
- mode: An optional STRING literal specifying the aggregation mode for summaries. Valid values: 'sum', 'min', 'max', 'alwaysone'. The default is 'sum'.
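The optional arguments are positional in the syntax above, so setting mode also means supplying lgNomEntries. The following is a minimal sketch of that call shape, reusing the inline data from the Examples section; the values 14 and 'min' and the column alias distinct_keys are illustrative choices, not recommendations.

```sql
-- Illustrative only: 2^14 nominal entries, keeping the minimum summary per key.
-- lgNomEntries and mode are passed as literals.
SELECT tuple_sketch_estimate_integer(
         tuple_sketch_agg_integer(key, summary, 14, 'min')
       ) AS distinct_keys
FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) AS tab(key, summary);
```

The mode only affects how summaries for the same key are combined, so the distinct count estimate is expected to match what the default mode would give on the same input.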
Returns
A BINARY value containing the serialized compact TupleSketch with integer summaries.
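Because the result is an ordinary BINARY value and the function is an aggregate, one sketch can be built per group and then passed to the estimate function in a later step. A minimal sketch of that pattern follows; the relation events and its columns region, user_id, and clicks are hypothetical names used only for illustration.

```sql
-- Hypothetical inline data: (region, user_id, clicks).
WITH per_region AS (
  SELECT region,
         -- One serialized TupleSketch (BINARY) per region
         tuple_sketch_agg_integer(user_id, clicks) AS sketch
  FROM VALUES ('eu', 1, 5), ('eu', 2, 3), ('us', 1, 4) AS events(region, user_id, clicks)
  GROUP BY region
)
SELECT region,
       tuple_sketch_estimate_integer(sketch) AS distinct_users
FROM per_region;
```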
Notes
- NULL key or summary values are ignored during aggregation.
- Empty strings, empty byte arrays, and empty arrays are ignored for keys.
- The lgNomEntries and mode parameters must be constant values.
- Use tuple_sketch_estimate_integer to obtain the distinct count estimate.
- Use tuple_sketch_summary_integer to obtain the aggregated summary value.
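The NULL-handling rule above can be checked with a small query. This is a sketch using made-up inline data: according to the notes, the rows with a NULL key or a NULL summary should be skipped, leaving only keys 1 and 3 to contribute to the sketch.

```sql
-- Rows (NULL, 7) and (2, NULL) are expected to be ignored per the notes above.
SELECT tuple_sketch_estimate_integer(
         tuple_sketch_agg_integer(key, summary)
       ) AS distinct_keys
FROM VALUES (1, 5), (NULL, 7), (2, NULL), (3, 2) AS tab(key, summary);
```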
Error messages
Examples
-- Create sketch and get distinct count estimate
> SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary);
3.0
-- Get aggregated summary (sum mode by default)
> SELECT tuple_sketch_summary_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary);
6