SUM (U-SQL)
The SUM aggregator returns the sum of all the values in the group. The input values have to be of a numeric type or an error is raised. Null values are ignored.
The identity value is null.
SUM_Expression := 'SUM' '(' ['DISTINCT'] expression ')'.
DISTINCT
Optionally allows to de-duplicate the values returned by the expression inside the group before aggregation.expression
The C# expression (including column references) that gets aggregated. Its type has to be a numeric type or an error is raised.
The return type is determined by the type of the evaluated result of the expression as follows:
Expression type | Return type | Comment |
---|---|---|
Integral types (short, int, uint, etc) implicitly castable to long and long? | long? | To be aligned with SQL semantics. |
ulong, ulong? | double? | Returns double to be able to handle large values. |
decimal, decimal? | decimal? | |
float, float?, double, double? | double? |
This aggregator can be used in a windowing expression without any additional restrictions.
The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.
@employees = SELECT * FROM ( VALUES (1, "Noah", "Engineering", 100, 10000), (2, "Sophia", "Engineering", 100, 20000), (3, "Liam", "Engineering", 100, 30000), (4, "Emma", "HR", 200, 10000), (5, "Jacob", "HR", 200, 10000), (6, "Olivia", "HR", 200, 10000), (7, "Mason", "Executive", 300, 50000), (8, "Ava", "Marketing", 400, 15000), (9, "Ethan", "Marketing", 400, 10000) ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
A. Sum of all values(Salary
)
The following query: 1) calculates the total salary for all employees, and 2) calculates the total salary from all DISTINCT salary values.
@result =
SELECT SUM(Salary) AS TotalSalary,
SUM(DISTINCT Salary) AS TotalDistinctSalary
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/sum/exampleA.csv"
USING Outputters.Csv();
B. Sum of values per group
The following query calculates the total salary for each department with the GROUP BY clause.
@result =
SELECT DeptName,
SUM(Salary) AS SalaryByDept
FROM @employees
GROUP BY DeptName;
OUTPUT @result
TO "/Output/ReferenceGuide/sum/exampleB.csv"
USING Outputters.Csv();
C. Sum of values with OVER()
The OVER clause in the following query is empty which defines the "window" to include all rows. The query calculates the total salary over the window - all employees.
@result =
SELECT EmpName,
SUM(Salary) OVER() AS SalaryAllDepts
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/sum/exampleC.csv"
USING Outputters.Csv();
D. Sum of values over a defined window using OVER()
The OVER clause in the following query is DeptName
. The query returns EmpName
, DeptName
, and the total salary over the window - DeptName
.
@result =
SELECT EmpName,
DeptName,
SUM(Salary) OVER(PARTITION BY DeptName) AS TotalSalaryByDept
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/sum/exampleD.csv"
USING Outputters.Csv();
E. Sum of values over a defined window using OVER(), additional example.
The OVER clause in the following query is DeptName
. The query returns all records, as well as the total salary for each DeptName
and each employee's salary share of his/her total department's share.
@result =
SELECT *,
SUM(Salary) OVER(PARTITION BY DeptName) AS TotalSalaryByDept,
((decimal) Salary / SUM(Salary) OVER(PARTITION BY DeptName)) * 100 AS ShareOfTotalSalaryByDept
FROM @employees;
OUTPUT @result
TO "/Output/ReferenceGuide/sum/exampleE.csv"
USING Outputters.Csv();