PERCENTILE_CONT (U-SQL)
Summary
The PERCENTILE_CONT analytic function calculates a percentile based on a continuous distribution of the values in the window. The result is interpolated and might not be equal to any of the specific values in the column.
PERCENTILE_CONT can only be used in the context of a windowing expression.
Syntax
PERCENTILE_CONT_Expression := 'PERCENTILE_CONT' '(' double_literal ')' 'WITHIN' 'GROUP' '(' Simple_Order_By_Clause ')'.
Remarks
double_literal
The percentile to compute. The value must range between 0.0 and 1.0 and be of type double.WITHIN GROUP ( Simple_Order_By_Clause )
Specifies a list of numeric values to sort and compute the percentile over.
Syntax
Simple_Order_By_Clause := 'ORDER' 'BY' Sort_Item_Expression.
Sort_Item_Expression := expression [Sort_Direction].
Sort_Direction := 'ASC' | 'DESC'.
Only one ORDER BY
expression is allowed. The type of the expression has to be a nullable or nonnullable numeric type. The default sort order is ascending.
Return Type
The return type is double?.
Usage in Windowing Expression
This analytic function can be used in a windowing expression with the following restrictions:
- Only the PARTITION BY clause can be specified with the OVER operator.
- The ORDER BY clause in the OVER operator is not allowed.
- The ROWS clause in the OVER operator is not allowed.
Examples
The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
The example below is based on the dataset defined below. Ensure your execution includes the rowset variable.
@employees = SELECT * FROM ( VALUES (1, "Noah", "Engineering", 100, 10000), (2, "Sophia", "Engineering", 100, 20000), (3, "Liam", "Engineering", 100, 30000), (4, "Amy", "Engineering", 100, 35000), (5, "Emma", "HR", 200, 8000), (6, "Jacob", "HR", 200, 8000), (7, "Olivia", "HR", 200, 8000), (8, "Mason", "Executive", 300, 50000), (9, "Ava", "Marketing", 400, 15000), (10, "Ethan", "Marketing", 400, 9000) ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
Using PERCENTILE_CONT
The following example uses PERCENTILE_CONT
to find the median employee salary in each department.
@result =
SELECT DISTINCT DeptName,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY Salary) OVER(PARTITION BY DeptName) AS MedianCont
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/percentile_cont/example.csv"
USING Outputters.Csv();