NTILE (U-SQL)

Summary

The NTILE ranking function returns the number of the group to which the row belongs from among the groups that the windowing function has distributed the rows using an ordered partition. The groups are numbered, starting at one.

If the number of rows in a partition is not divisible by the provided integer, it will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.

NTILE can only be used in the context of a windowing expression.

Syntax

NTILE_Expression :=                                                                                      
     'NTILE' '(' Group_Count ')'.
Group_Count := long_literal.

Remarks

  • Group_Count
    Is a positive, nonnull constant of type long that specifies the number of groups into which each partition must be divided.

Return Type

The return type is long?.

Usage in Windowing Expression

This ranking function can be used in a windowing expression with the following restrictions:

  • The ORDER BY clause in the OVER operator is required.
  • The ROWS clause in the OVER operator is not allowed.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.

  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.

  • The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.

      @employees = 
          SELECT * FROM 
              ( VALUES
              (1, "Noah",   "Engineering", 100, 10000),
              (2, "Sophia", "Engineering", 100, 15000),
              (3, "Liam",   "Engineering", 100, 30000),
              (4, "Amy",    "Engineering", 100, 35000),
              (5, "Justin", "Engineering", 100, 15000),
              (6, "Emma",   "HR",          200, 8000),
              (7, "Jacob",  "HR",          200, 8000),
              (8, "Olivia", "HR",          200, 8000),
              (9, "Mason",  "Executive",   300, 50000),
              (10, "Ava",   "Marketing",   400, 15000),
              (11, "Ethan", "Marketing",   400, 9000) 
              ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
    

A. Dividing rows into groups
The following example divides rows into four groups of employees.

@result =
    SELECT *,
           NTILE(4) OVER() AS Quartile2
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleA.csv"
USING Outputters.Csv();

B. Dividing rows into groups and using ORDER BY
The following example divides rows into four groups of employees based on his\her Salary.

@result =
    SELECT *,
           NTILE(4) OVER(ORDER BY Salary DESC) AS QuartileBySalary
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleB.csv"
// ORDER BY Salary DESC
USING Outputters.Csv();

C. Dividing the result set by using PARTITION BY
The rows are first partitioned by DeptID and then divided into two groups within each DeptID.

@result =
    SELECT *,
           NTILE(2) OVER(PARTITION BY DeptID) AS Median
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleC.csv"
USING Outputters.Csv();

D. Dividing the result set by using PARTITION BY and ORDER BY
The rows are first partitioned by DeptID and then divided into two groups within each DeptID based on his\her Salary.

@result =
    SELECT *,
           NTILE(2) OVER(PARTITION BY DeptID ORDER BY Salary DESC) AS MedianBySalary
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/Ranking/ntile/exampleD.csv"
USING Outputters.Csv();

See Also