APPROX_COUNT_DISTINCT (Transact-SQL)
Applies to:
SQL Server 2019 (15.x)
Azure SQL Database
Azure SQL Managed Instance
Azure Synapse Analytics
This function returns the approximate number of unique non-null values in a group.
Transact-SQL syntax conventions
Syntax
APPROX_COUNT_DISTINCT ( expression )
Note
To view Transact-SQL syntax for SQL Server 2014 and earlier, see Previous versions documentation.
Arguments
expression
An expression of any type, except image, sql_variant, ntext, or text.
Return types
bigint
Remarks
APPROX_COUNT_DISTINCT( expression )
evaluates an expression for each row in a group, and returns the approximate number of unique non-null values in a group. This function is designed to provide aggregations across large data sets where responsiveness is more critical than absolute precision.
APPROX_COUNT_DISTINCT
is designed for use in big data scenarios and is optimized for the following conditions:
- Access of data sets that are millions of rows or higher and
- Aggregation of a column or columns that have many distinct values
The function implementation guarantees up to a 2% error rate within a 97% probability.
APPROX_COUNT_DISTINCT
requires less memory than an exhaustive COUNT DISTINCT operation. Given the smaller memory footprint, APPROX_COUNT_DISTINCT
is less likely to spill memory to disk compared to a precise COUNT DISTINCT operation. To learn more about the algorithm used to achieve this, see HyperLogLog.
Note
With collation sensitive strings, APPROX_COUNT_DISTINCT uses a binary match and provides results that would have been generated in the presence of BIN collations and not BIN2.
Examples
A. Using APPROX_COUNT_DISTINCT
This example returns the approximate number of different order keys from the orders table.
SELECT APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders;
Here is the result set.
Approx_Distinct_OrderKey
------------------------
15164704
B. Using APPROX_COUNT_DISTINCT with GROUP BY
This example returns the approximate number of different order keys by order status from the orders table.
SELECT O_OrderStatus, APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders
GROUP BY O_OrderStatus
ORDER BY O_OrderStatus;
Here is the result set.
O_OrderStatus Approx_Distinct_OrderKey
---------------------------------------------------------------- ------------------------
F 7397838
O 7387803
P 388036