Training
Module
Gain insights from your data by using Kusto Query Language - Training
Learn how to write advanced queries in Kusto Query Language (KQL) by using the aggregation functions, the render operator, and variables.
This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
approx_top_k
aggregate functionApplies to: Databricks SQL
Databricks Runtime 10.4 LTS and above
Returns the top k
most frequently occurring item values in an expr
along with their approximate counts.
approx_top_k(expr[, k[, maxItemsTracked]]) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER
clause.
expr
: An expression of STRING, BOOLEAN, DATE, TIMESTAMP, or numeric type.k
: An optional INTEGER literal greater than 0. If k
is not specified, it defaults to 5
.maxItemsTracked
: An optional INTEGER literal greater than or equal to k
. If maxItemsTracked
is not specified, it defaults to 10000
.cond
: An optional boolean expression filtering the rows used for aggregation.Results are returned as an ARRAY of type STRUCT, where each STRUCT contains an item
field for the value (with its
original input type) and a count
field (of type LONG) with the approximate number of occurrences. The array is sorted
by count
descending.
The aggregate function returns the top k
most frequently occurring item values in an expression expr
along with
their approximate counts. The error in each count may be up to 2.0 * numRows / maxItemsTracked
where numRows
is the
total number of rows. Higher values of maxItemsTracked
provide better accuracy at the cost of increased memory usage.
Expressions that have fewer than maxItemsTracked
distinct items will yield exact item counts. Results include NULL
values as their own item in the results.
> SELECT approx_top_k(expr) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
[{'item':4,'count':2},{'item':1,'count':2},{'item':0,'count':2},{'item':3,'count':1},{'item':2,'count':1}]
> SELECT approx_top_k(expr, 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr);
[{'item':'c','count',4},{'item':'d','count':2}]
> SELECT approx_top_k(expr, 10, 100) FROM VALUES (0), (1), (1), (2), (2), (2) AS tab(expr);
[{'item':2,'count':3},{'item':1,'count':2},{'item':0,'count':1}]
Training
Module
Gain insights from your data by using Kusto Query Language - Training
Learn how to write advanced queries in Kusto Query Language (KQL) by using the aggregation functions, the render operator, and variables.