ARRAY_AGG (U-SQL)

Summary

The ARRAY_AGG aggregator creates a new SQL.ARRAY value per group that will contain the values of group as its items. ARRAY_AGG is not preserving order of values inside a group. If an array needs to be ordered, a LINQ OrderBy can be used. ARRAY_AGG and EXPLODE are conceptually inverse operations.

The identity value is null.

Syntax

ARRAY_AGG_Expression :=                                                                                  
      'ARRAY_AGG' '(' ['DISTINCT'] expression ')'.

Remarks

  • DISTINCT
    Optionally allows to de-duplicate the values returned by the expression inside the group before aggregation.

  • expression
    The C# expression (including column references) that gets aggregated. The type of the expression has to be a built-in U-SQL type, including SQL.MAP or SQL.ARRAY.

Return Type

SQL.ARRAY<T> where T is the type of the input expression.

Usage in Windowing Expression

This aggregator cannot be used in a windowing expression.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.

A. ARRAY_AGG
This example aggregates the phone numbers as one group per employee.

@employees = 
    SELECT * FROM 
        ( VALUES
        ("Noah",   "cell:030-0074321"),
        ("Noah",   "office:030-0076545"),
        ("Sophia", "cell:(5) 555-4729"),
        ("Sophia", "office:(5) 555-3745"),
        ("Liam",   "cell:(5) 555-3932"),
        ("Amy",    "cell:(171) 555-7788"),
        ("Amy",    "office:(171) 555-6750"), 
        ("Amy",    "home:(425) 555-6238"),
        ("Justin", "cell:0921-12 34 65"),
        ("Justin", "office:0921-12 34 67"),
        ("Emma",   (string)null),
        ("Jacob",  ""),
        ("Olivia", "cell:88.60.15.31"),
        ("Olivia", "office:88.60.15.32"),
        ("Mason",  "cell:(91) 555 22 82"),
        ("Mason",  "office:(91) 555 91 99"), 
        ("Mason",  "home:(425) 555-2819"),
        ("Ava",    "cell:91.24.45.40"),
        ("Ava",    "office:91.24.45.41"),
        ("Ethan",  "cell:(604) 555-4729"),
        ("Ethan",  "office:(604) 555-3745"),
        ("David",  "cell:(171) 555-1212"),
        ("Andrew", "cell:(1) 135-5555"),
        ("Andrew", "office:(1) 135-4892"),
        ("Jennie", "cell:(5) 555-3392"),
        ("Jennie", "office:(5) 555-7293")
        ) AS T(EmpName, PhoneNumber);
        
@result =
    SELECT EmpName,
           string.Join(", ", ARRAY_AGG(PhoneNumber)) AS PhoneNumbers
    FROM @employees
    WHERE !string.IsNullOrEmpty(PhoneNumber)
    GROUP BY EmpName;

OUTPUT @result
TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleA.csv"
USING Outputters.Csv();

B. EXPLODE - bonus example
This example does not use ARRAY_AGG; however, it illustrates how to reverse the outcome from the above example using EXPLODE.

@result =
    SELECT EmpName,
           new SQL.ARRAY<string>(PhoneNumbers.Split(',')) AS PhoneNumbersArray
    FROM @result;

@exploded =
    SELECT EmpName,
           PhoneNumber.Trim() AS PhoneNumber
    FROM @result
    CROSS APPLY 
    EXPLODE(PhoneNumbersArray) AS r(PhoneNumber);

    OUTPUT @exploded
    TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleB.csv"
    USING Outputters.Csv();

C. ARRAY_AGG - additional example
This example deals with a common many-to-many relationship. Here, a film can have more than one producer and you would like to have all producers listed in one record. The following query will aggregate all producers.
This example provides an alternative solution to Using User Defined Aggregator - genericAggregator B.

@films = 
    SELECT * FROM 
        ( VALUES
        (1, "A Good Year"),
        (2, "American Gangster"),
        (3, "Robin Hood"),
        (4, "The Counselor")
        ) AS T(FilmID, Title);

@producers = 
    SELECT * FROM 
        ( VALUES
        (1, "Ridley Scott"),
        (2, "Brian Grazer"),
        (3, "Russell Crowe"),
        (4, "Nick Wechsler"),
        (5, "Steve Schwartz"),
        (6, "Paula Mae Schwartz")
        ) AS T(ProducerID, Producer);

@films_producers = 
    SELECT * FROM 
        ( VALUES
        (1, 1),
        (2, 1),
        (2, 2),
        (3, 1),
        (3, 2),
        (3, 3),
        (4, 1),
        (4, 4),
        (4, 5),
        (4, 6)
        ) AS T(FilmID, ProducerID);

@result =
    SELECT f.Title,
           COUNT( * ) AS ProducerCount,
           string.Join(", ", ARRAY_AGG(p.Producer)) AS Producers
    FROM @films AS f
         JOIN
             @films_producers AS fp
         ON f.FilmID == fp.FilmID
         JOIN
             @producers AS p
         ON p.ProducerID == fp.ProducerID
    GROUP BY f.Title;

OUTPUT @result
TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleC.csv"
USING Outputters.Csv();

See Also