Applies to:
Databricks Runtime 18.1 and above
Aggregates key-value pairs into an Apache DataSketches TupleSketch. The keys drive an approximate distinct count, and the integer summary values for each key are combined according to the specified mode.
Syntax
tuple_sketch_agg_integer ( key, summary [, lgNomEntries [, mode ]] )
Arguments
- key: The expression whose unique values are counted. Accepted types are INTEGER, LONG, FLOAT, DOUBLE, STRING, BINARY, ARRAY<INTEGER>, and ARRAY<LONG>.
- summary: An INTEGER value to associate with each key and aggregate.
- lgNomEntries: An optional INTEGER literal specifying the base-2 logarithm of the nominal number of entries. Must be between 4 and 26, inclusive. The default is 12 (4,096 nominal entries). Higher values provide better accuracy but use more memory.
- mode: An optional STRING literal specifying how the summary values for the same key are aggregated. Valid values are 'sum', 'min', 'max', and 'alwaysone'. The default is 'sum'.
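The following Python sketch is a rough, exact model of how the lgNomEntries and mode arguments behave. It is not the DataSketches algorithm. It keeps every key and does no sampling, so it matches the sketch only while the number of distinct keys stays below the nominal entry count. The helper names (`nominal_entries`, `combine`, `aggregate`) are hypothetical. The behavior shown for 'alwaysone', where every key's summary is stored as 1, is an assumption based on the DataSketches integer summary modes.

```python
# Hypothetical, exact model of tuple_sketch_agg_integer's arguments.
# Not the DataSketches algorithm: every key is kept, nothing is sampled.

VALID_MODES = ("sum", "min", "max", "alwaysone")


def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return 2**lg_nom_entries after checking the documented 4..26 range."""
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26, inclusive")
    return 2 ** lg_nom_entries


def combine(current: int, value: int, mode: str) -> int:
    """Fold a new summary value into a key's existing summary."""
    if mode == "sum":
        return current + value
    if mode == "min":
        return min(current, value)
    if mode == "max":
        return max(current, value)
    if mode == "alwaysone":
        # Assumption: 'alwaysone' stores 1 for every key, ignoring the input.
        return 1
    raise ValueError(f"mode must be one of {VALID_MODES}")


def aggregate(pairs, mode: str = "sum") -> dict:
    """Aggregate (key, summary) pairs into one summary per distinct key."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    per_key = {}
    for key, value in pairs:
        if key in per_key:
            per_key[key] = combine(per_key[key], value, mode)
        else:
            per_key[key] = 1 if mode == "alwaysone" else value
    return per_key


pairs = [(1, 5), (1, 1), (2, 2), (2, 3), (3, 2)]
for m in VALID_MODES:
    print(m, aggregate(pairs, m))
print(nominal_entries())  # 4096
```

With the same input rows, only the per-key summaries change between modes. The number of distinct keys (3 here) is the same in every mode.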
Returns
A BINARY value containing the serialized compact TupleSketch with integer summaries.
Notes
- NULL key or summary values are ignored during aggregation.
- Keys that are empty strings, empty byte arrays, or empty arrays are ignored.
- The lgNomEntries and mode parameters must be constant values.
- Use tuple_sketch_estimate_integer to obtain the distinct count estimate.
- Use tuple_sketch_summary_integer to obtain the aggregated summary value.
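The input-filtering rules in these notes can be modeled as a row predicate. In this sketch, Python `None` stands in for SQL NULL, `str` for STRING, `bytes` for BINARY, and `list` for ARRAY. This type mapping and the helper name `usable` are assumptions made for illustration.

```python
def usable(key, summary) -> bool:
    """Return True if a (key, summary) row would contribute to the sketch."""
    if key is None or summary is None:
        return False  # NULL key or NULL summary: the row is ignored
    if isinstance(key, (str, bytes, list)) and len(key) == 0:
        return False  # empty string, empty byte array, or empty array key
    return True


rows = [("a", 1), (None, 2), ("b", None), ("", 3), (b"", 4), ([], 5), ([1, 2], 6)]
kept = [row for row in rows if usable(*row)]
print(kept)  # [('a', 1), ([1, 2], 6)]
```

A numeric key of 0 is not "empty" under these rules, so a row such as `(0, 1)` is kept.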
Examples
-- Create sketch and get distinct count estimate
> SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary);
3.0
-- Get aggregated summary (sum mode by default)
> SELECT tuple_sketch_summary_integer(tuple_sketch_agg_integer(key, summary)) FROM VALUES (1, 1), (1, 2), (2, 3) tab(key, summary);
6
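Both example inputs contain only a few distinct keys, far fewer than the default 4,096 nominal entries. In that regime a theta-family sketch retains every key, so the distinct count estimate equals the exact count. The Python sketch below reproduces the two results above under that assumption. The helper names are hypothetical. It also assumes that, in 'sum' mode, the overall summary is the total of the per-key sums, which is consistent with the second example. It does not cover how tuple_sketch_summary_integer combines summaries under other modes.

```python
from collections import defaultdict


def exact_estimate(pairs) -> float:
    """Distinct-key count, which the estimate equals while the sketch is exact."""
    return float(len({key for key, _ in pairs}))


def exact_sum_summary(pairs) -> int:
    """Sum each key's summaries, then add up the per-key totals ('sum' mode)."""
    per_key = defaultdict(int)
    for key, value in pairs:
        per_key[key] += value
    return sum(per_key.values())


first = [(1, 5), (1, 1), (2, 2), (2, 3), (3, 2)]
second = [(1, 1), (1, 2), (2, 3)]
print(exact_estimate(first))      # 3.0
print(exact_sum_summary(second))  # 6
```

Once the number of distinct keys grows well beyond 2^lgNomEntries, the real sketch samples keys, and its estimates are approximate rather than exact.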