Aggregate Functions (U-SQL)
Summary
U-SQL provides both built-in aggregation functions and the ability for the user to define user-defined aggregators.
An aggregator will compute a single result value over a group of values and will have an identity value for the case that the group is empty.
In U-SQL, aggregators can only be used in the following syntactic contexts:
Inside a SELECT’s SELECT clause:
It will calculate the aggregated value for each group specified by a GROUP BY clause. If the rowset is empty and thus all groups are empty, the SELECT expression will return an empty rowset.
If no GROUP BY has been specified, then every column in the SELECT clause has to be an expression that needs to contain at least one aggregator. The aggregations will be computed across the whole rowset of the SELECT expression. If the rowset is empty, then the aggregators return their identity value, which is the value that is would not change the aggregation result if it was added. I.e., 0 for COUNT() and null for all others.
As part of a U-SQL windowing expression where it will calculate the aggregated value inside each window partition. Some built-in aggregators are not supported in windowing expressions (see each aggregator section for more details).
Aggregations can appear in expressions
Aggregations always need to be aliased in the SELECT clause.
Inside a GROUP BY’s HAVING clause to provide a filter condition on the group based on the aggregation’s value.
Some of the aggregators are type-polymorphic, meaning that they can operate on many different input types and return potentially different types based on their input type. For example, COUNT() will take any input type, and always return a value of type long, while SUM() will return a value of a type that is dependent on the input type: SUM(v) where v is of type double will return a result of type double; if v is an integral numeric type such as int, the result will be of type long.
Several aggregators can be used in the same SELECT clause and will be applied on the groups.
Aggregators cannot be nested.
Syntax
Aggregate_Expression := Aggregator '(' ['DISTINCT'] expression ')'.
Remarks
DISTINCT
Every aggregator can take an optional DISTINCT qualifier that will de-duplicate the values in the group before performing the aggregation. The data type of the values will have to be comparable if DISTINCT is being used. DISTINCT is not allowed when the aggregator is being used in an OVER expression.Aggregator
Can be either the invocation of a user-defined aggregator or a built-in aggregator:
Syntax
Aggregator :=
User_Defined_Aggregator | Built_In_Aggregator.
U-SQL provides the following built-in aggregation functions (follow the links for more information):
Syntax
Built_In_Aggregator :=
'ANY_VALUE' | 'ARRAY_AGG' | 'AVG' | 'COUNT' | 'MAP_AGG' | 'MAX' | 'MIN' | 'SUM' | 'STDEV' | 'STDEVP' | 'VAR' | 'VARP'.
expression
The C# expression (including column references) that gets aggregated. The expression cannot contain other aggregators. Some aggregators may support additional options besides