Applies to: Databricks Runtime 18.1 and above
Creates a DataSketches TupleSketch from key-value pairs, where keys are used for distinct counting and DOUBLE summary values are aggregated for each key according to the specified mode.
Syntax
tuple_sketch_agg_double ( key, summary [, lgNomEntries [, mode ]] )
Arguments
- key: The expression for unique value counting. Accepted types are INTEGER, LONG, FLOAT, DOUBLE, STRING, BINARY, ARRAY<INTEGER>, and ARRAY<LONG>.
- summary: A DOUBLE value to be associated with and aggregated for each key.
- lgNomEntries: An optional INTEGER literal specifying the log-base-2 of nominal entries. Must be between 4 and 26, inclusive. The default is 12 (4,096 buckets). Higher values provide better accuracy but use more memory.
- mode: An optional STRING literal specifying the aggregation mode for summaries. Valid values: 'sum', 'min', 'max', 'alwaysone'. The default is 'sum'.
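The arguments above can be illustrated with a small exact model in Python. This is only a sketch of the documented semantics, not the TupleSketch implementation: it keeps every key in a dictionary instead of a bounded probabilistic sketch. The helper names nominal_entries and aggregate_exact are hypothetical, and the behavior shown for 'alwaysone' (each key's summary fixed at 1.0) is an assumption based on the mode's name.

```python
from typing import Dict, Hashable, Iterable, Tuple


def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Convert lgNomEntries (log-base-2) into the nominal entry count."""
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26, inclusive")
    return 1 << lg_nom_entries


def aggregate_exact(
    pairs: Iterable[Tuple[Hashable, float]], mode: str = "sum"
) -> Dict[Hashable, float]:
    """Exact, non-probabilistic model of per-key summary aggregation."""
    combiners = {
        "sum": lambda old, new: old + new,
        "min": min,
        "max": max,
        # Assumption: 'alwaysone' pins each key's summary at 1.0.
        "alwaysone": lambda old, new: 1.0,
    }
    if mode not in combiners:
        raise ValueError(f"unsupported mode: {mode!r}")
    combine = combiners[mode]
    result: Dict[Hashable, float] = {}
    for key, summary in pairs:
        # The first value seen for a key is stored; later ones are combined.
        value = 1.0 if mode == "alwaysone" else float(summary)
        result[key] = combine(result[key], value) if key in result else value
    return result


# Same rows as the first query under Examples.
rows = [(1, 5.0), (1, 1.0), (2, 2.0), (2, 3.0), (3, 2.2)]
by_sum = aggregate_exact(rows, "sum")
by_max = aggregate_exact(rows, "max")
```

In this model, len(by_sum) is the exact distinct-key count that the sketch estimates, and nominal_entries(12) gives the 4,096 entries of the default configuration.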
Returns
A BINARY value containing the serialized compact TupleSketch with double summaries.
Notes
- NULL key or summary values are ignored during aggregation.
- Empty strings, empty byte arrays, and empty arrays are ignored for keys.
- The lgNomEntries and mode parameters must be constant values.
- Use tuple_sketch_estimate_double to obtain the distinct count estimate.
- Use tuple_sketch_summary_double to obtain the aggregated summary value.
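The input-filtering rules in these notes can be modeled in Python as a simple predicate. This is a hedged illustration of which rows would be skipped, not the runtime's actual code: is_ignored is a hypothetical helper, None stands in for SQL NULL, and Python str, bytes, and list stand in for STRING, BINARY, and ARRAY keys. The sample rows mix key types only for demonstration; a real SQL column has a single type.

```python
from typing import Any, List, Optional, Tuple

Row = Tuple[Any, Optional[float]]


def is_ignored(key: Any, summary: Optional[float]) -> bool:
    """Return True when a (key, summary) row would be skipped."""
    # NULL key or NULL summary: the row is ignored.
    if key is None or summary is None:
        return True
    # Empty strings, empty byte arrays, and empty arrays are ignored as keys.
    if isinstance(key, (str, bytes, bytearray, list)) and len(key) == 0:
        return True
    return False


rows: List[Row] = [
    (1, 1.0),
    (None, 9.0),  # NULL key
    ("", 4.0),    # empty string key
    (b"", 4.0),   # empty byte array key
    ([], 4.0),    # empty array key
    (2, None),    # NULL summary
    (0, 3.0),     # zero is a valid key, not an empty value
    ("a", 2.0),
]
kept = [row for row in rows if not is_ignored(*row)]
```

Only the rows with keys 1, 0, and "a" remain in kept. Numeric zero is retained because the notes exclude only NULLs and empty strings, byte arrays, and arrays.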
Common error conditions
Examples
-- Create sketch with sum mode (default)
> SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary);
3.0
-- Get aggregated summary
> SELECT tuple_sketch_summary_double(tuple_sketch_agg_double(key, summary)) FROM VALUES (1, 1.0D), (1, 2.0D), (2, 3.0D) tab(key, summary);
6.0