LAG (U-SQL)
Summary
The LAG analytic function provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT expression to compare values in the current row with values in a previous row.
LAG can only be used in the context of a windowing expression.
Syntax
LAG_Expression := 'LAG' '(' expression [ ',' offset ] [ ',' default ] ')'.
Remarks
expression
The expression for which the first value gets calculated for the window.offset
The number of rows back from the current row from which to obtain a value. If not specified, the default is 1.offset
can be an expression that evaluates to a constant positive integral value. Column references and method calls are not allowed.default
The value to return whenexpression
atoffset
is NULL. If a default value is not specified, NULL is returned.default
must be a constant and type-compatible withexpression
. Column references and method calls are not allowed.
Return Type
The nullable type of the input.
Usage in Windowing Expression
This analytic function can be used in a windowing expression with the following restrictions:
Examples
The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.
@storeSales = SELECT * FROM ( VALUES (1, "NW", 2013, 100), (1, "NW", 2014, 150), (1, "NW", 2015, 300), (1, "NW", 2016, 640), (2, "SW", 2013, 200), (2, "SW", 2014, 350), (2, "SW", 2015, 500), (2, "SW", 2016, 650), (3, "NW", 2015, 75), (3, "NW", 2016, 100), (4, "NW", 2016, 375), (5, "SW", 2016, 700) ) AS T(StoreID, Region, Year, Sales);
A. Compare values between years
The following example uses the LAG
function to return the difference in sales for a specific store over previous years. Notice that because there is no lag value available for the last row, the default of zero (0) is returned.
@result =
SELECT StoreID,
Year AS SalesYear,
Sales AS CurrentSales,
LAG(Sales, 1, 0) OVER(ORDER BY Year) AS PreviousSales
FROM @storeSales
WHERE StoreID == 1 AND Year >= 2014;
OUTPUT @result
TO "/Output/ReferenceGuide/Analytic/lag/exampleA.csv"
ORDER BY SalesYear DESC
USING Outputters.Csv();
B. Dividing the result set using PARTITION BY
The following example uses the LAG
function to compare year-to-date sales between stores. Each record shows a store's sales and the sales of the store with the nearest lower sales. The PARTITION BY clause is specified to divide the rows in the result set by region. The LAG
function is applied to each partition separately and computation restarts for each partition. The ORDER BY clause in the OVER clause orders the rows in each partition. The ORDER BY clause in the OUTPUT
statement sorts the rows in the whole result set. Notice that because there is no lag value available for the last row of each partition, the default of zero (0) is returned.
@result =
SELECT Region,
StoreID,
Sales,
LAG(Sales, 1, 0) OVER(PARTITION BY Region ORDER BY Sales ASC ) AS NearestLowerSales
FROM @storeSales
WHERE Year == 2016;
OUTPUT @result
TO "/Output/ReferenceGuide/Analytic/lag/exampleB.csv"
ORDER BY Region, Sales DESC
USING Outputters.Csv();