NTILE (U-SQL)
Summary
The NTILE ranking function returns the number of the group to which the row belongs from among the groups that the windowing function has distributed the rows using an ordered partition. The groups are numbered, starting at one.
If the number of rows in a partition is not divisible by the provided integer, it will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.
NTILE can only be used in the context of a windowing expression.
Syntax
NTILE_Expression := 'NTILE' '(' Group_Count ')'.
Group_Count := long_literal.
Remarks
Group_Count
Is a positive, nonnull constant of type long that specifies the number of groups into which each partition must be divided.
Return Type
The return type is long?.
Usage in Windowing Expression
This ranking function can be used in a windowing expression with the following restrictions:
- The ORDER BY clause in the OVER operator is required.
- The ROWS clause in the OVER operator is not allowed.
Examples
The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.
@employees = SELECT * FROM ( VALUES (1, "Noah", "Engineering", 100, 10000), (2, "Sophia", "Engineering", 100, 15000), (3, "Liam", "Engineering", 100, 30000), (4, "Amy", "Engineering", 100, 35000), (5, "Justin", "Engineering", 100, 15000), (6, "Emma", "HR", 200, 8000), (7, "Jacob", "HR", 200, 8000), (8, "Olivia", "HR", 200, 8000), (9, "Mason", "Executive", 300, 50000), (10, "Ava", "Marketing", 400, 15000), (11, "Ethan", "Marketing", 400, 9000) ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
A. Dividing rows into groups
The following example divides rows into four groups of employees.
@result =
SELECT *,
NTILE(4) OVER() AS Quartile2
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleA.csv"
USING Outputters.Csv();
B. Dividing rows into groups and using ORDER BY
The following example divides rows into four groups of employees based on his\her Salary
.
@result =
SELECT *,
NTILE(4) OVER(ORDER BY Salary DESC) AS QuartileBySalary
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleB.csv"
// ORDER BY Salary DESC
USING Outputters.Csv();
C. Dividing the result set by using PARTITION BY
The rows are first partitioned by DeptID
and then divided into two groups within each DeptID
.
@result =
SELECT *,
NTILE(2) OVER(PARTITION BY DeptID) AS Median
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleC.csv"
USING Outputters.Csv();
D. Dividing the result set by using PARTITION BY and ORDER BY
The rows are first partitioned by DeptID
and then divided into two groups within each DeptID
based on his\her Salary
.
@result =
SELECT *,
NTILE(2) OVER(PARTITION BY DeptID ORDER BY Salary DESC) AS MedianBySalary
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleD.csv"
USING Outputters.Csv();