Use aggregate functions

Completed

T-SQL provides aggregate functions such as SUM, MAX, and AVG to perform calculations that take multiple values and return a single result.

Working with aggregate functions

Most of the queries we have looked at operate on a row at a time, using a WHERE clause to filter rows. Each row returned corresponds to one row in the original data set.

Many aggregate functions are provided in SQL Server. In this section, we’ll look at the most common functions such as SUM, MIN, MAX, AVG, and COUNT.

When working with aggregate functions, you need to consider the following points:

  • Aggregate functions return a single (scalar) value and can be used in SELECT statements almost anywhere a single value can be used. For example, these functions can be used in the SELECT, HAVING, and ORDER BY clauses. However, they cannot be used in the WHERE clause.
  • Aggregate functions ignore NULLs, except when using COUNT(*).
  • Aggregate functions in a SELECT list don't have a column header unless you provide an alias using AS.
  • Aggregate functions in a SELECT list operate on all rows passed to the SELECT operation. If there is no GROUP BY clause, all rows satisfying any filter in the WHERE clause will be summarized. You will learn more about GROUP BY in the next topic.
  • Unless you're using GROUP BY, you shouldn't combine aggregate functions with columns not included in functions in the same SELECT list.

To extend beyond the built-in functions, SQL Server provides a mechanism for user-defined aggregate functions via the .NET Common Language Runtime (CLR). That topic is beyond the scope of this module.

Built-in aggregate functions

As mentioned, Transact-SQL provides many built-in aggregate functions. Commonly used functions include:

Function Name

Syntax

Description

SUM

SUM(expression)

Totals all the non-NULL numeric values in a column.

AVG

AVG(expression)

Averages all the non-NULL numeric values in a column (sum/count).

MIN

MIN(expression)

Returns the smallest number, earliest date/time, or first-occurring string (according to collation sort rules).

MAX

MAX(expression)

Returns the largest number, latest date/time, or last-occurring string (according to collation sort rules).

COUNT or COUNT_BIG

COUNT(*) or COUNT(expression)

With (*), counts all rows, including rows with NULL values. When a column is specified as expression, returns the count of non-NULL rows for that column. COUNT returns an int; COUNT_BIG returns a big_int.

To use a built-in aggregate in a SELECT clause, consider the following example in the MyStore sample database:

SELECT AVG(ListPrice) AS AveragePrice,
       MIN(ListPrice) AS MinimumPrice,
       MAX(ListPrice) AS MaximumPrice
FROM Production.Product;

The results of this query look something like this:

AveragePrice

MinimumPrice

MaximumPrice

744.5952

2.2900

3578.2700

Note that the above example summarizes all rows from the Production.Product table. We could easily modify the query to return the average, minimum, and maximum prices for products in a specific category by adding a WHERE clause, like this:

SELECT AVG(ListPrice) AS AveragePrice,
       MIN(ListPrice) AS MinimumPrice,
       MAX(ListPrice) AS MaximumPrice
FROM Production.Product
WHERE ProductCategoryID = 15;

When using aggregates in a SELECT clause, all columns referenced in the SELECT list must be used as inputs for an aggregate function, or be referenced in a GROUP BY clause.

Consider the following query, which attempts to include the ProductCategoryID field in the aggregated results:

SELECT ProductCategoryID, AVG(ListPrice) AS AveragePrice,
MIN(ListPrice) AS MinimumPrice,
MAX(ListPrice) AS MaximumPrice
FROM Production.Product;

Running this query results in the following error

Msg 8120, Level 16, State 1, Line 1

Column 'Production.ProductCategoryID' is invalid in the select list because it isn't contained in either an aggregate function or the GROUP BY clause.

The query treats all rows as a single aggregated group. Therefore, all columns must be used as inputs to aggregate functions.

In the previous examples, we aggregated numeric data such as the price and quantities in the previous example. Some of the aggregate functions can also be used to summarize date, time, and character data. The following examples show the use of aggregates with dates and characters:

This query returns first and last company by name, using MIN and MAX:

SELECT MIN(CompanyName) AS MinCustomer, 
       MAX(CompanyName) AS MaxCustomer
FROM SalesLT.Customer;

This query will return the first and last values for CompanyName in the database's collation sequence, which in this case is alphabetical order:

MinCustomer

MaxCustomer

A Bike Store

Yellow Bicycle Company

Other functions may be nested with aggregate functions.

For example, the YEAR scalar function is used in the following example to return only the year portion of the order date, before MIN and MAX are evaluated:

SELECT MIN(YEAR(OrderDate)) AS Earliest,
       MAX(YEAR(OrderDate)) AS Latest
FROM Sales.SalesOrderHeader;

Earliest

Latest

2008

2021

The MIN and MAX functions can also be used with date data, to return the earliest and latest chronological values. However, AVG and SUM can only be used for numeric data, which includes integers, money, float and decimal datatypes.

Using DISTINCT with aggregate functions

You should be aware of the use of DISTINCT in a SELECT clause to remove duplicate rows. When used with an aggregate function, DISTINCT removes duplicate values from the input column before computing the summary value. DISTINCT is useful when summarizing unique occurrences of values, such as customers in the orders table.

The following example returns the number of customers who have placed orders, no matter how many orders they placed:

SELECT COUNT(DISTINCT CustomerID) AS UniqueCustomers
FROM Sales.SalesOrderHeader;

COUNT(<some_column>) merely counts how many rows have some value in the column. If there are no NULL values, COUNT(<some_column>) will be the same as COUNT(*). COUNT (DISTINCT <some_column>) counts how many different values there are in the column.

Using aggregate functions with NULL

It is important to be aware of the possible presence of NULLs in your data, and of how NULL interacts with T-SQL query components, including aggregate function. There are a few considerations to be aware of:

  • With the exception of COUNT used with the (*) option, T-SQL aggregate functions ignore NULLs. For example, a SUM function will add only non-NULL values. NULLs don't evaluate to zero. COUNT(*) counts all rows, regardless of value or non-value in any column.
  • The presence of NULLs in a column may lead to inaccurate computations for AVG, which will sum only populated rows and divide that sum by the number of non-NULL rows. There may be a difference in results between AVG(<column>) and (SUM(<column>)/COUNT(*)).

For example, consider the following table named t1:

C1

C2

1

NULL

2

10

3

20

4

30

5

40

6

50

This query illustrates the difference between how AVG handles NULL and how you might calculate an average with a SUM/COUNT(*) computed column:

SELECT SUM(c2) AS sum_nonnulls, 
    COUNT(*) AS count_all_rows, 
    COUNT(c2) AS count_nonnulls, 
    AVG(c2) AS average, 
    (SUM(c2)/COUNT(*)) AS arith_average
FROM t1;

The result would be:

sum_nonnulls

count_all_rows

count_nonnulls

average

arith_average

150

6

5

30

25

In this resultset, the column named average is the aggregate that internally gets the sum of 150 and divides by the count of non-null values in column c2. The calculation would be 150/5, or 30. The column called arith_average explicitly divides the sum by the count of all rows, so the calculation is 150/6, or 25.

If you need to summarize all rows, whether NULL or not, consider replacing the NULLs with another value that will not be ignored by your aggregate function. You can use the COALESCE function for this purpose.