ARRAY_AGG (U-SQL)
Summary
The ARRAY_AGG aggregator creates a new SQL.ARRAY value per group that will contain the values of group as its items. ARRAY_AGG is not preserving order of values inside a group. If an array needs to be ordered, a LINQ OrderBy can be used. ARRAY_AGG and EXPLODE are conceptually inverse operations.
The identity value is null.
Syntax
ARRAY_AGG_Expression := 'ARRAY_AGG' '(' ['DISTINCT'] expression ')'.
Remarks
DISTINCT
Optionally allows to de-duplicate the values returned by the expression inside the group before aggregation.expression
The C# expression (including column references) that gets aggregated. The type of the expression has to be a built-in U-SQL type, including SQL.MAP or SQL.ARRAY.
Return Type
SQL.ARRAY<T> where T is the type of the input expression.
Usage in Windowing Expression
This aggregator cannot be used in a windowing expression.
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
A. ARRAY_AGG
This example aggregates the phone numbers as one group per employee.
@employees =
SELECT * FROM
( VALUES
("Noah", "cell:030-0074321"),
("Noah", "office:030-0076545"),
("Sophia", "cell:(5) 555-4729"),
("Sophia", "office:(5) 555-3745"),
("Liam", "cell:(5) 555-3932"),
("Amy", "cell:(171) 555-7788"),
("Amy", "office:(171) 555-6750"),
("Amy", "home:(425) 555-6238"),
("Justin", "cell:0921-12 34 65"),
("Justin", "office:0921-12 34 67"),
("Emma", (string)null),
("Jacob", ""),
("Olivia", "cell:88.60.15.31"),
("Olivia", "office:88.60.15.32"),
("Mason", "cell:(91) 555 22 82"),
("Mason", "office:(91) 555 91 99"),
("Mason", "home:(425) 555-2819"),
("Ava", "cell:91.24.45.40"),
("Ava", "office:91.24.45.41"),
("Ethan", "cell:(604) 555-4729"),
("Ethan", "office:(604) 555-3745"),
("David", "cell:(171) 555-1212"),
("Andrew", "cell:(1) 135-5555"),
("Andrew", "office:(1) 135-4892"),
("Jennie", "cell:(5) 555-3392"),
("Jennie", "office:(5) 555-7293")
) AS T(EmpName, PhoneNumber);
@result =
SELECT EmpName,
string.Join(", ", ARRAY_AGG(PhoneNumber)) AS PhoneNumbers
FROM @employees
WHERE !string.IsNullOrEmpty(PhoneNumber)
GROUP BY EmpName;
OUTPUT @result
TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleA.csv"
USING Outputters.Csv();
B. EXPLODE - bonus example
This example does not use ARRAY_AGG
; however, it illustrates how to reverse the outcome from the above example using EXPLODE.
@result =
SELECT EmpName,
new SQL.ARRAY<string>(PhoneNumbers.Split(',')) AS PhoneNumbersArray
FROM @result;
@exploded =
SELECT EmpName,
PhoneNumber.Trim() AS PhoneNumber
FROM @result
CROSS APPLY
EXPLODE(PhoneNumbersArray) AS r(PhoneNumber);
OUTPUT @exploded
TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleB.csv"
USING Outputters.Csv();
C. ARRAY_AGG - additional example
This example deals with a common many-to-many relationship. Here, a film can have more than one producer and you would like to have all producers listed in one record. The following query will aggregate all producers.
This example provides an alternative solution to Using User Defined Aggregator - genericAggregator B.
@films =
SELECT * FROM
( VALUES
(1, "A Good Year"),
(2, "American Gangster"),
(3, "Robin Hood"),
(4, "The Counselor")
) AS T(FilmID, Title);
@producers =
SELECT * FROM
( VALUES
(1, "Ridley Scott"),
(2, "Brian Grazer"),
(3, "Russell Crowe"),
(4, "Nick Wechsler"),
(5, "Steve Schwartz"),
(6, "Paula Mae Schwartz")
) AS T(ProducerID, Producer);
@films_producers =
SELECT * FROM
( VALUES
(1, 1),
(2, 1),
(2, 2),
(3, 1),
(3, 2),
(3, 3),
(4, 1),
(4, 4),
(4, 5),
(4, 6)
) AS T(FilmID, ProducerID);
@result =
SELECT f.Title,
COUNT( * ) AS ProducerCount,
string.Join(", ", ARRAY_AGG(p.Producer)) AS Producers
FROM @films AS f
JOIN
@films_producers AS fp
ON f.FilmID == fp.FilmID
JOIN
@producers AS p
ON p.ProducerID == fp.ProducerID
GROUP BY f.Title;
OUTPUT @result
TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleC.csv"
USING Outputters.Csv();