APPROX_COUNT_DISTINCT (Transact-SQL)
Applies to: SQL Server 2019 (15.x) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics SQL analytics endpoint in Microsoft Fabric Warehouse in Microsoft Fabric
This function returns the approximate number of unique non-null values in a group.
Transact-SQL syntax conventions
Syntax
APPROX_COUNT_DISTINCT ( expression )
Arguments
expression
An expression of any type, except image, sql_variant, ntext, or text.
Return types
bigint
Remarks
APPROX_COUNT_DISTINCT( expression )
evaluates an expression for each row in a group, and returns the approximate number of unique non-null values in a group. This function is designed to provide aggregations across large data sets where responsiveness is more critical than absolute precision.
APPROX_COUNT_DISTINCT
is designed for use in big data scenarios and is optimized for the following conditions:
- Access of data sets that are millions of rows or higher and
- Aggregation of a column or columns that have many distinct values
The function implementation guarantees up to a 2% error rate within a 97% probability.
APPROX_COUNT_DISTINCT
requires less memory than an exhaustive COUNT DISTINCT operation. Given the smaller memory footprint, APPROX_COUNT_DISTINCT
is less likely to spill memory to disk compared to a precise COUNT DISTINCT operation. To learn more about the algorithm used to achieve this, see HyperLogLog.
Note
With collation sensitive strings, APPROX_COUNT_DISTINCT uses a binary match and provides results that would have been generated in the presence of BIN collations and not BIN2.
Examples
A. Using APPROX_COUNT_DISTINCT
This example returns the approximate number of different order keys from the orders table.
SELECT APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders;
Here's the result set.
Approx_Distinct_OrderKey
------------------------
15164704
B. Using APPROX_COUNT_DISTINCT with GROUP BY
This example returns the approximate number of different order keys by order status from the orders table.
SELECT O_OrderStatus, APPROX_COUNT_DISTINCT(O_OrderKey) AS Approx_Distinct_OrderKey
FROM dbo.Orders
GROUP BY O_OrderStatus
ORDER BY O_OrderStatus;
Here's the result set.
O_OrderStatus Approx_Distinct_OrderKey
---------------------------------------------------------------- ------------------------
F 7397838
O 7387803
P 388036