sum aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Returns the sum calculated from the values of a group.

Syntax

sum ( [ALL | DISTINCT] expr ) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: An expression that evaluates to a numeric or interval.
  • cond: An optional Boolean expression filtering the rows used for aggregation.

Returns

If expr is an integral number type, a BIGINT. If expr is DECIMAL(p, s) the result is DECIMAL(p + min(10, 31-p), s). If expr is an interval the result type matches expr.

Otherwise, a DOUBLE.

If DISTINCT is specified only unique values are summed up.

If the result overflows the result type Databricks SQL raises an ARITHMETIC_OVERFLOW error. To return a NULL instead use try_sum.

Warning

In Databricks Runtime, if spark.sql.ansi.enabled is false, an overflow of BIGINT does not cause an error but “wraps” the result instead.

Examples

> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
 30

> SELECT sum(col) FILTER(WHERE col <15)
    FROM VALUES (5), (10), (15) AS tab(col);
 15

> SELECT sum(DISTINCT col) FROM VALUES (5), (10), (10), (15) AS tab(col);
 30

> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
 25

> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

-- try_sum overflows a BIGINT
> SELECT try_sum(c1) FROM VALUES(5E18::BIGINT), (5E18::BIGINT) AS tab(c1);
 NULL

-- In ANSI mode sum returns an error if it overflows BIGINT
> SELECT sum(c1) FROM VALUES(5E18::BIGINT), (5E18::BIGINT) AS tab(c1);
 Error: ARITHMETIC_OVERFLOW

-- try_sum overflows an INTERVAL
> SELECT try_sum(c1) FROM VALUES(INTERVAL '100000000' YEARS), (INTERVAL '100000000' YEARS) AS tab(c1);
  NULL

-- sum returns an error on INTERVAL overflow
> SELECT sum(c1) FROM VALUES(INTERVAL '100000000' YEARS), (INTERVAL '100000000' YEARS) AS tab(c1);
 Error: ARITHMETIC_OVERFLOW