Applies to:
Databricks Runtime 18.1 and above
Creates a DataSketches TupleSketch from key-value pairs. The keys are counted as distinct values, and the DOUBLE summary values associated with each key are aggregated according to the specified mode.
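To make the key/summary split concrete, here is a minimal Python sketch of the per-key aggregation. It is an exact, in-memory model with a hypothetical helper name (`aggregate_exact`). It is not the Databricks or Apache DataSketches implementation. A real TupleSketch keeps only a sample of keys once the input grows past the nominal entry count, so its counts become estimates. The `'alwaysone'` behavior (every key's value is 1.0) is assumed from the DataSketches `DoubleSummary` AlwaysOne mode.

```python
from typing import Dict, Hashable, Iterable, Tuple

MODES = ("sum", "min", "max", "alwaysone")


def aggregate_exact(
    rows: Iterable[Tuple[Hashable, float]], mode: str = "sum"
) -> Dict[Hashable, float]:
    """Hypothetical exact model: combine summary values per distinct key."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    combined: Dict[Hashable, float] = {}
    for key, summary in rows:
        if mode == "alwaysone":
            combined[key] = 1.0  # assumed: value is 1.0 regardless of the input
        elif key not in combined:
            combined[key] = summary  # first value seen for this key
        elif mode == "sum":
            combined[key] += summary
        elif mode == "min":
            combined[key] = min(combined[key], summary)
        else:  # mode == "max"
            combined[key] = max(combined[key], summary)
    return combined


rows = [(1, 5.0), (1, 1.0), (2, 2.0), (2, 3.0), (3, 2.2)]  # input of the first example
print(len(aggregate_exact(rows)))         # distinct keys: 3
print(aggregate_exact(rows))              # {1: 6.0, 2: 5.0, 3: 2.2}
print(aggregate_exact(rows, mode="max"))  # {1: 5.0, 2: 3.0, 3: 2.2}
```

The number of distinct keys (3) matches the estimate in the first example. The mode only changes the value stored per key, not which keys are counted.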
Syntax
tuple_sketch_agg_double ( key, summary [, lgNomEntries [, mode ]] )
Arguments
- key: The expression whose distinct values are counted. Accepted types are INTEGER, LONG, FLOAT, DOUBLE, STRING, BINARY, ARRAY<INTEGER>, and ARRAY<LONG>.
- summary: A DOUBLE value that is associated with each key and aggregated per key.
- lgNomEntries: An optional INTEGER literal specifying the base-2 logarithm of the nominal number of entries. Must be between 4 and 26, inclusive. The default is 12 (4,096 nominal entries). Higher values provide better accuracy but use more memory.
- mode: An optional STRING literal specifying how summary values are aggregated for each key. Valid values are 'sum', 'min', 'max', and 'alwaysone'. The default is 'sum'.
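Because lgNomEntries is an exponent, each increment doubles the nominal entry count. The small Python helper below is hypothetical and not part of Databricks. It applies the documented 4 to 26 range and converts the exponent to the nominal entry count, 2^lgNomEntries:

```python
def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Hypothetical helper: return 2 ** lgNomEntries within the documented range."""
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26, inclusive")
    return 1 << lg_nom_entries  # same as 2 ** lg_nom_entries


for lg in (4, 12, 26):
    print(lg, nominal_entries(lg))
# 4 16
# 12 4096
# 26 67108864
```

The default of 12 gives the 4,096 entries mentioned above. The documented range therefore spans 16 to 67,108,864 nominal entries.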
Returns
A BINARY value containing the serialized compact TupleSketch with double summaries.
Notes
- NULL key or summary values are ignored during aggregation.
- Empty strings, empty byte arrays, and empty arrays are ignored for keys.
- The lgNomEntries and mode parameters must be constant values.
- Use tuple_sketch_estimate_double to obtain the distinct count estimate.
- Use tuple_sketch_summary_double to obtain the aggregated summary value.
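The rules for ignored input can be expressed as a single predicate. The following is a minimal Python sketch of a hypothetical helper, `is_ignored`. It assumes SQL NULL maps to `None`, STRING to `str`, BINARY to `bytes`, and ARRAY to `list` or `tuple`. Zero-valued keys and summaries are not empty, so they are kept:

```python
from typing import Any, Optional


def is_ignored(key: Any, summary: Optional[float]) -> bool:
    """Hypothetical model of the notes: True if the row is skipped."""
    if key is None or summary is None:
        return True  # NULL key or NULL summary
    if isinstance(key, (str, bytes, bytearray, list, tuple)) and len(key) == 0:
        return True  # empty string, empty byte array, or empty array key
    return False


rows = [(None, 1.0), (1, None), ("", 2.0), (b"", 2.0), ([], 2.0), ("a", 0.0), (0, 1.0)]
kept = [row for row in rows if not is_ignored(*row)]
print(kept)  # [('a', 0.0), (0, 1.0)]
```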
Error messages
Examples
-- Estimate the number of distinct keys from a sketch built in 'sum' mode (the default)
> SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary);
3.0
-- Get aggregated summary
> SELECT tuple_sketch_summary_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary);
6.0