APPROX_COUNT_DISTINCT (Transact-SQL)

Applies to: SQL Server 2019 (15.x) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics SQL analytics endpoint in Microsoft Fabric Warehouse in Microsoft Fabric

This function returns the approximate number of unique non-null values in a group.

Transact-SQL syntax conventions

Syntax

APPROX_COUNT_DISTINCT ( expression )   

Arguments

expression
An expression of any type, except image, sql_variant, ntext, or text.

Return types

bigint

Remarks

APPROX_COUNT_DISTINCT( expression ) evaluates an expression for each row in a group, and returns the approximate number of unique non-null values in a group. This function is designed to provide aggregations across large data sets where responsiveness is more critical than absolute precision.

APPROX_COUNT_DISTINCT is designed for use in big data scenarios and is optimized for the following conditions:

  • Access of data sets that are millions of rows or higher and
  • Aggregation of a column or columns that have many distinct values

The function implementation guarantees up to a 2% error rate within a 97% probability.

APPROX_COUNT_DISTINCT requires less memory than an exhaustive COUNT DISTINCT operation. Given the smaller memory footprint, APPROX_COUNT_DISTINCT is less likely to spill memory to disk compared to a precise COUNT DISTINCT operation. To learn more about the algorithm used to achieve this, see HyperLogLog.

Note

With collation sensitive strings, APPROX_COUNT_DISTINCT uses a binary match and provides results that would have been generated in the presence of BIN collations and not BIN2.

Examples

A. Using APPROX_COUNT_DISTINCT

This example returns the approximate number of different order keys from the orders table.

SELECT APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders;

Here is the result set.

Approx_Distinct_OrderKey
------------------------
15164704

B. Using APPROX_COUNT_DISTINCT with GROUP BY

This example returns the approximate number of different order keys by order status from the orders table.

SELECT O_OrderStatus, APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders
GROUP BY O_OrderStatus
ORDER BY O_OrderStatus; 

Here is the result set.

O_OrderStatus                                                    Approx_Distinct_OrderKey
---------------------------------------------------------------- ------------------------
F                                                                7397838
O                                                                7387803
P                                                                388036

See also

Aggregate Functions (Transact-SQL)
COUNT (Transact-SQL)