Apply window functions for analytics
Analytical queries often require calculations that span multiple rows while still returning individual row details. Traditional aggregate functions collapse rows into groups, losing row-level information. Window functions solve this challenge by performing calculations across a set of rows related to the current row, without collapsing the result set.
Understand window function syntax
Window functions calculate values across a "window" of rows defined by the OVER clause. Unlike regular aggregate functions, window functions don't group rows into a single output row. Instead, they compute values across related rows while preserving all original rows in the result.
The general syntax for a window function is:
function_name(arguments) OVER (
[PARTITION BY partition_expression]
[ORDER BY order_expression [ASC | DESC]]
[ROWS | RANGE frame_specification]
)
The OVER clause components control how the window is defined:
- PARTITION BY: Divides rows into groups (partitions) for the calculation
- ORDER BY: Determines the logical order of rows within each partition
- ROWS/RANGE: Defines the frame boundaries relative to the current row
The following query demonstrates a simple window function that calculates a running total of order amounts per customer:
SELECT
CustomerID,
SalesOrderID,
OrderDate,
TotalDue,
SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM SalesLT.SalesOrderHeader
ORDER BY CustomerID, OrderDate;
Note
When you specify ORDER BY in the OVER clause without a frame specification, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for aggregate functions. This creates cumulative calculations.
Use ranking functions
Ranking functions assign sequential numbers to rows based on their position within a partition. SQL Server provides four ranking functions. Each function handles ties differently:
ROW_NUMBER() assigns a unique sequential number to each row, with no duplicates even for tied values:
SELECT
ProductID,
Name,
ListPrice,
ROW_NUMBER() OVER (ORDER BY ListPrice DESC) AS PriceRank
FROM SalesLT.Product
WHERE ListPrice > 0;
The result set looks like this:
ProductID Name ListPrice PriceRank
--------- ------------------------- --------- ---------
749 Road-150 Red, 62 3578.27 1
750 Road-150 Red, 44 3578.27 2
751 Road-150 Red, 48 3578.27 3
771 Mountain-100 Silver, 38 3399.99 4
This query ranks all products by price from highest to lowest. Each product receives a unique number regardless of whether multiple products share the same price.
RANK() assigns the same rank to tied values, then skips numbers to account for the ties:
SELECT
ProductID,
Name,
ListPrice,
RANK() OVER (ORDER BY ListPrice DESC) AS PriceRank
FROM SalesLT.Product
WHERE ListPrice > 0;
The result set looks like this:
ProductID Name ListPrice PriceRank
--------- ------------------------- --------- ---------
749 Road-150 Red, 62 3578.27 1
750 Road-150 Red, 44 3578.27 1
751 Road-150 Red, 48 3578.27 1
771 Mountain-100 Silver, 38 3399.99 4
When two products have identical prices, both receive the same rank. The next product's rank reflects the total number of products ranked higher, creating gaps in the sequence.
DENSE_RANK() assigns the same rank to tied values but doesn't skip numbers:
SELECT
ProductID,
Name,
ListPrice,
DENSE_RANK() OVER (ORDER BY ListPrice DESC) AS PriceRank
FROM SalesLT.Product
WHERE ListPrice > 0;
The result set looks like this:
ProductID Name ListPrice PriceRank
--------- ------------------------- --------- ---------
749 Road-150 Red, 62 3578.27 1
750 Road-150 Red, 44 3578.27 1
751 Road-150 Red, 48 3578.27 1
771 Mountain-100 Silver, 38 3399.99 2
Like RANK(), tied values share the same rank. However, DENSE_RANK() continues with the next consecutive number, so you can use it to count distinct price levels.
NTILE(n) distributes rows into a specified number of roughly equal groups:
SELECT
ProductID,
Name,
ListPrice,
NTILE(4) OVER (ORDER BY ListPrice DESC) AS PriceQuartile
FROM SalesLT.Product
WHERE ListPrice > 0;
The result set looks like this:
ProductID Name ListPrice PriceQuartile
--------- --------------------------- --------- -------------
749 Road-150 Red, 62 3578.27 1
771 Mountain-100 Silver, 38 3399.99 1
722 LL Road Frame - Black, 58 337.22 2
859 Half-Finger Gloves, S 24.49 4
This query divides products into four groups based on price. The highest-priced products are in quartile 1, and the lowest-priced are in quartile 4. Use NTILE() for percentile analysis or distributing work evenly.
Combining PARTITION BY with ranking functions enables per-group rankings:
SELECT
pc.Name AS Category,
p.Name AS Product,
p.ListPrice,
ROW_NUMBER() OVER (
PARTITION BY p.ProductCategoryID
ORDER BY p.ListPrice DESC
) AS CategoryPriceRank
FROM SalesLT.Product AS p
INNER JOIN SalesLT.ProductCategory AS pc
ON p.ProductCategoryID = pc.ProductCategoryID
WHERE p.ListPrice > 0;
The result set looks like this:
Category Product ListPrice CategoryPriceRank
--------------- ------------------------- --------- -----------------
Road Bikes Road-150 Red, 62 3578.27 1
Road Bikes Road-150 Red, 44 3578.27 2
Mountain Bikes Mountain-100 Silver, 38 3399.99 1
Mountain Bikes Mountain-100 Black, 38 3374.99 2
This query ranks products within each category separately. The ranking restarts at 1 for each category, so you can identify the most expensive product in each category by filtering for CategoryPriceRank = 1.
Tip
Use ROW_NUMBER() when you need exactly one row per rank (such as finding the top N per group). Use RANK() or DENSE_RANK() when you need to preserve tie information for reporting purposes.
Apply aggregate window functions
Standard aggregate functions like SUM, AVG, COUNT, MIN, and MAX can be used as window functions by adding the OVER clause. This allows you to calculate aggregates while retaining individual row details.
The following query demonstrates how to calculate running totals and cumulative aggregates:
SELECT
SalesOrderID,
OrderDate,
TotalDue,
SUM(TotalDue) OVER (ORDER BY OrderDate, SalesOrderID) AS RunningTotal,
AVG(TotalDue) OVER (ORDER BY OrderDate, SalesOrderID) AS RunningAverage,
COUNT(*) OVER (ORDER BY OrderDate, SalesOrderID) AS OrderNumber
FROM SalesLT.SalesOrderHeader
ORDER BY OrderDate, SalesOrderID;
The result set looks like this:
SalesOrderID OrderDate TotalDue RunningTotal RunningAverage OrderNumber
------------ ---------- --------- ------------ -------------- -----------
71774 2008-06-01 972.785 972.785 972.785 1
71776 2008-06-01 87.083 1059.868 529.934 2
71780 2008-06-01 42452.65 43512.518 14504.172 3
71782 2008-06-01 43962.79 87475.308 21868.827 4
Important
When using aggregate window functions without ORDER BY in the OVER clause, the function calculates across the entire partition. Adding ORDER BY creates a running calculation from the partition start to the current row.
Define window frames with ROWS and RANGE
Window frames let you specify exactly which rows relative to the current row should be included in the calculation. The ROWS clause counts physical rows, while RANGE groups rows with equal values.
Frame boundaries can be specified using:
UNBOUNDED PRECEDING: From the partition startn PRECEDING:nrows before current rowCURRENT ROW: The current rown FOLLOWING:nrows after current rowUNBOUNDED FOLLOWING: To the partition end
The following query calculates a moving average over the last three orders:
SELECT
SalesOrderID,
OrderDate,
TotalDue,
AVG(TotalDue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS MovingAvg3Orders
FROM SalesLT.SalesOrderHeader
ORDER BY OrderDate;
The result set looks like this:
SalesOrderID OrderDate TotalDue MovingAvg3Orders
------------ ---------- --------- ----------------
71774 2008-06-01 972.785 972.785
71776 2008-06-01 87.083 529.934
71780 2008-06-01 42452.65 14504.172
71782 2008-06-01 43962.79 28834.174
This query calculates a 3-order moving average by including the current row and the two rows before it. For the first row, only one value is available, so the average equals TotalDue. By the third row, the window includes all three rows.
Use analytical functions
Analytical functions let you access data from other rows without using self-joins or subqueries. These functions are useful for time-series analysis, trend detection, and comparing current values against historical or future values. Unlike aggregate window functions that compute summaries, analytical functions retrieve specific values from specific rows in the window.
LAG() and LEAD() access values from previous or subsequent rows, like this:
SELECT
SalesOrderID,
OrderDate,
TotalDue,
LAG(TotalDue, 1, 0) OVER (ORDER BY OrderDate) AS PreviousOrderTotal,
LEAD(TotalDue, 1, 0) OVER (ORDER BY OrderDate) AS NextOrderTotal,
TotalDue - LAG(TotalDue, 1, 0) OVER (ORDER BY OrderDate) AS ChangeFromPrevious
FROM SalesLT.SalesOrderHeader
ORDER BY OrderDate;
The result set looks like this:
SalesOrderID OrderDate TotalDue PreviousOrderTotal NextOrderTotal ChangeFromPrevious
------------ ---------- --------- ------------------ -------------- ------------------
71774 2008-06-01 972.785 0 87.083 972.785
71776 2008-06-01 87.083 972.785 42452.65 -885.702
71780 2008-06-01 42452.65 87.083 43962.79 42365.567
71782 2008-06-01 43962.79 42452.65 0 1510.14
LAG() retrieves a value from a previous row, while LEAD() retrieves from a following row. The second parameter specifies how many rows to look back or ahead (default is 1), and the third parameter provides a default value when no row exists (such as for the first row with LAG()). Use these functions to calculate period-over-period changes, identify trends, or detect anomalies in sequential data.
FIRST_VALUE() and LAST_VALUE() return values from the first or last row in the frame:
SELECT
ProductID,
Name,
ListPrice,
ProductCategoryID,
FIRST_VALUE(Name) OVER (
PARTITION BY ProductCategoryID
ORDER BY ListPrice DESC
) AS MostExpensiveInCategory,
LAST_VALUE(Name) OVER (
PARTITION BY ProductCategoryID
ORDER BY ListPrice DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS LeastExpensiveInCategory
FROM SalesLT.Product
WHERE ListPrice > 0;
The result set looks like this:
ProductID Name ListPrice ProductCategoryID MostExpensiveInCategory LeastExpensiveInCategory
--------- ------------------------- --------- ----------------- ------------------------ ------------------------
749 Road-150 Red, 62 3578.27 5 Road-150 Red, 62 LL Road Frame - Red, 58
750 Road-150 Red, 44 3578.27 5 Road-150 Red, 62 LL Road Frame - Red, 58
722 LL Road Frame - Red, 58 337.22 5 Road-150 Red, 62 LL Road Frame - Red, 58
771 Mountain-100 Silver, 38 3399.99 6 Mountain-100 Silver, 38 Mountain-500 Black, 52
FIRST_VALUE() returns the value from the first row in the ordered window, which in this case is the most expensive product per category. LAST_VALUE() returns the least expensive, but requires an explicit frame to include all rows. These functions help you compare each row against benchmark values like the highest, lowest, or baseline value in a group.
Note
LAST_VALUE() requires an explicit frame specification to include rows after the current row. Without it, the default frame only includes rows up to the current row, making LAST_VALUE() return the current row's value.
PERCENT_RANK() and CUME_DIST() calculate relative position within a partition:
SELECT
Name,
ListPrice,
PERCENT_RANK() OVER (ORDER BY ListPrice) AS PercentRank,
CUME_DIST() OVER (ORDER BY ListPrice) AS CumulativeDistribution
FROM SalesLT.Product
WHERE ListPrice > 0
ORDER BY ListPrice;
The result set looks like this:
Name ListPrice PercentRank CumulativeDistribution
------------------------- --------- ----------- ----------------------
Patch Kit/8 Patches 2.29 0.0 0.0081
Road Tire Tube 3.99 0.0081 0.0162
Touring Tire Tube 4.99 0.0162 0.0243
Road-150 Red, 62 3578.27 0.9919 1.0
PERCENT_RANK() returns a value between 0 and 1 indicating what percentage of rows have lower values (0 means lowest, one means highest). CUME_DIST() shows the cumulative distribution, indicating what percentage of rows have values less than or equal to the current row. Use these functions for percentile analysis, identifying outliers, or creating distribution reports.
For more information about window functions, see Window Functions (Transact-SQL) and Ranking Functions.