Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to:
Databricks SQL
Databricks Runtime
Returns the value of an expression associated with the largest value of a second expression in a group. With the optional third argument, returns an array of up to limit values corresponding to the largest values of the ordering expression.
Syntax
max_by(expr, ordExpr) [FILTER ( WHERE cond ) ]
max_by(expr, ordExpr, limit) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER clause.
Arguments
- expr: The expression whose value is to be returned. Can be any data type.
- ordExpr: The expression used to determine the maximum value (the sorting key). Must be an orderable type.
- limit: (Optional) An
INTvalue representing the maximum number of values to return. Must be greater than 0 and less than or equal to 100,000. When provided, the function returns an array of up tolimitvalues. - cond: An optional boolean expression filtering the rows used for aggregation.
Returns
Without limit: the result type matches the type of expr. If multiple rows have the same maximum value of ordExpr, the result is non-deterministic.
With limit: an ARRAY of values whose element type matches the type of expr. The array contains up to limit elements. Values corresponding to NULL orderings are not included. If all ordering values are NULL, the result is NULL. If multiple rows share the same largest values, the function is non-deterministic.
Note
For certain STRING collations, such as UTF8_LCASE, the result may be non-deterministic as well.
Examples
> SELECT max_by(x, y, 2) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y);
[b, c]
-- Return top 2 values by ordering expression
> SELECT year, max_by(course, earnings, 2) FROM data GROUP BY year;
year: 2012, max_by(course, earnings, 2): [Java, c]
> SELECT max_by(x, y COLLATE UTF8_LCASE) FROM VALUES (('a', 'X')), (('b', 'x')), (('c', 'v')) AS tab(x, y);
a (or b)