avg
aggregate function
Applies to: Databricks SQL Databricks Runtime
Returns the mean calculated from values of a group. This function is a synonym for mean aggregate function.
Syntax
avg( [ALL | DISTINCT] expr) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER
clause.
Arguments
expr
: An expression that evaluates to a numeric or an interval.cond
: An optional boolean expression filtering the rows used for aggregation.
Returns
The result type is computed as for the arguments:
DECIMAL(p, s)
: The result type is aDECIMAL(p + 4, s + 4)
. If the maximum precision for DECIMAL is reached the increase in scale will be limited to avoid loss of significant digits.- year-month interval: The result is an
INTERVAL YEAR TO MONTH
. - day-time interval: The result is an
INTERVAL DAY TO SECOND
. - In all other cases the result is a DOUBLE.
Nulls within the group are ignored. If a group is empty or consists only of nulls, the result is NULL.
If DISTINCT
is specified the average is computed after duplicates have been removed.
If the result overflows the result type, Azure Databricks raises an ARITHMETIC_OVERFLOW error. To return a NULL instead use try_avg.
Warning
In Databricks Runtime, if spark.sql.ansi.enabled is false
, an overflow returns NULL
instead of an error.
Examples
> SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT avg(DISTINCT col) FROM VALUES (1), (1), (2) AS tab(col);
1.5
> SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
> SELECT avg(col) FROM VALUES (INTERVAL '1' YEAR), (INTERVAL '2' YEAR) AS tab(col);
1-6
-- Overflow results in NULL for try_avg()
> SELECT try_avg(col) FROM VALUES (5e37::DECIMAL(38, 0)), (5e37::DECIMAL(38, 0)) AS tab(col);
NULL
-- Overflow causes error for avg() in ANSI mode.
> SELECT avg(col) FROM VALUES (5e37::DECIMAL(38, 0)), (5e37::DECIMAL(38, 0)) AS tab(col);
Error: CANNOT_CHANGE_DECIMAL_PRECISION