EXPLODE (U-SQL)
Summary
The EXPLODE rowset expression accepts an expression or value of either type SQL.ARRAY, SQL.MAP or IEnumerable and unpacks (explodes) the values into a rowset.
If EXPLODE is applied on an instance of SQL.ARRAY <T>, the resulting rowset contains a single column of type T where each item in the array is placed into its own row. If the array value was empty or null, then the resulting rowset is empty.
If EXPLODE is applied on an instance of SQL.MAP <K,V>, the resulting rowset contains two columns of type K and V respectively where each key-value pair in the map is placed into its own row. If the map value was empty or null, then the resulting rowset is empty.
Syntax
Explode_Expression := 'EXPLODE' '(' (sqlmap_expression | sqlarray_expression | ienumerable_expression ) ')' Derived_Table_Alias_With_Opt_Types.
Remarks
sqlmap_expression | sqlarray_expression
Any U-SQL/C# expression that results in a value of either typeSQL.MAP
orSQL.ARRAY
respectively. If the provided expression does not resolve in either type, an error is raised.ienumerable_expression
A C# expression returning a value of either typeIEnumerable<T>
,IEnumerable<KeyValuePair<K,V>>
, orIEnumerable<Tuple>
. The expression normally refers to at least one of the columns from theRowset_Source
.Derived_Table_Alias_With_Opt_Types
Defines the rowset schema for the result of the EXPLODE.
Syntax
Derived_Table_Alias_With_Opt_Types := 'AS' Quoted_or_Unquoted_Identifier ['(' Column_Alias_Opt_Type_List ')'].
Column_Alias_Opt_Type_List := Column_Alias_Opt_Type {',' Column_Alias_Opt_Type }.
Column_Alias_Opt_Type := Quoted_or_Unquoted_Identifier [Built_in_Type].
The derived table alias requires the correct number of columns to be specified (one in the case of an EXPLODE
on SQL.ARRAY
and two in the case of SQL.MAP
). Optionally a type can be specified, but if it is specified it has to be the exact type of the item type in the case of an array (T
for SQL.ARRAY<T>
) or the key and value types in the case of a map (K
and V
for SQL.MAP<K,V>
).
Note
EXPLODE is syntactically only supported in the context of an APPLY and is not a general rowset expression.
Examples
@dep_employees = SELECT *
FROM (VALUES
("Sales", "Rafferty")
, ("Engineering", "Jones,Heisenberg")
, ("Clerical", "Robinson,Smith,Johnson")
, ("Marketing", "")
, ("", "Williams")
) AS DE(DepName, DepEmp);
@dep_employees_as_array =
SELECT DepName, new SQL.ARRAY<string>(DepEmp.Split(',')) AS Employees
FROM @dep_employees;
@deps_exploded =
SELECT de.DepName, emp
FROM @dep_employees_as_array AS de
CROSS APPLY
EXPLODE(de.Employees) AS dp(emp);
OUTPUT @deps_exploded
TO "/output/rsExplode.csv"
USING Outputters.Csv();
The resulting rowset looks like:
DepName | EmpName |
---|---|
"Sales" | "Rafferty" |
"Engineering" | "Jones" |
"Engineering" | "Heisenbert" |
"Clerical" | "Robinson" |
"Clerical" | "Smith" |
"Clerical" | "Johnson" |
"Marketing" | "" |
"" | "Williams" |
Additional Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
- For simplicity, the example(s) with user-defined code make use of Code-Behind for assembly management. The main advantage of Code-Behind is that the tooling will register the assembly file and add the REFERENCE ASSEMBLY statement automatically. To use Assembly registration instead of Code-Behind, see Using Assemblies: Code-Behind vs. Assembly Registration Walkthrough.
- Some of the examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.
Dataset
@employees =
SELECT * FROM
( VALUES
(1, "Noah", 100, (int?)10000, new DateTime(2012,05,31), "cell:030-0074321,office:030-0076545"),
(2, "Sophia", 100, (int?)15000, new DateTime(2012,03,19), "cell:(5) 555-4729,office:(5) 555-3745"),
(3, "Liam", 100, (int?)30000, new DateTime(2014,09,14), "cell:(5) 555-3932"),
(4, "Amy", 100, (int?)35000, new DateTime(1999,02,27), "cell:(171) 555-7788,office:(171) 555-6750, home:(425) 555-6238"),
(5, "Justin", 100, (int?)15000, new DateTime(2015,01,12), "cell:0921-12 34 65,office:0921-12 34 67"),
(6, "Emma", 200, (int?)8000, new DateTime(2014,03,08), (string)null),
(7, "Jacob", 200, (int?)8000, new DateTime(2014,09,02), ""),
(8, "Olivia", 200, (int?)8000, new DateTime(2013,12,11), "cell:88.60.15.31,office:88.60.15.32"),
(9, "Mason", 300, (int?)50000, new DateTime(2016,01,01), "cell:(91) 555 22 82,office:(91) 555 91 99, home:(425) 555-2819"),
(10, "Ava", 400, (int?)15000, new DateTime(2014,09,14), "cell:91.24.45.40,office:91.24.45.41"),
(11, "Ethan", 400, (int?)9000, new DateTime(2015,08,22), "cell:(604) 555-4729,office:(604) 555-3745"),
(12, "David", 800, (int?)100, new DateTime(2016,11,01), "cell:(171) 555-1212"),
(13, "Andrew", 100, (int?)null, new DateTime(1995,07,16), "cell:(1) 135-5555,office:(1) 135-4892"),
(14, "Jennie", 100, (int?)34000, new DateTime(2000,02,12), "cell:(5) 555-3392,office:(5) 555-7293")
) AS T(EmpID, EmpName, DeptID, Salary, StartDate, PhoneNumbers);
SQL.ARRAY
This example first converts the PhoneNumbers
values into the SQL.ARRAY
type AS PhoneNumbersArray
. PhoneNumbersArray
is then unpacked resulting in a single column where each item in the array is placed into its own row.
@array =
SELECT EmpName,
PhoneNumbers == "" ? null : new SQL.ARRAY<string>(PhoneNumbers.Split(',')) AS PhoneNumbersArray
FROM @employees
WHERE PhoneNumbers != null;
@exploded =
SELECT EmpName,
PhoneNumber.Trim() AS PhoneNumber
FROM @array
CROSS APPLY
EXPLODE(PhoneNumbersArray) AS r(PhoneNumber);
OUTPUT @exploded
TO "/Output/ReferenceGuide/StatementsAndExpressions/Explode/exampleA.csv"
USING Outputters.Csv();
ARRAY_AGG - Bonus Example
This example does not use CROSS APPLY
or EXPLODE
; however, it illustrates how to reverse the outcome from the above example using ARRAY_AGG.
@result =
SELECT EmpName,
string.Join(", ", ARRAY_AGG(PhoneNumber)) AS PhoneNumbers
FROM @exploded
GROUP BY EmpName;
OUTPUT @result
TO "/Output/ReferenceGuide/StatementsAndExpressions/Explode/exampleB.csv"
USING Outputters.Csv();
SQL.MAP
This example first converts the PhoneNumbers
values into the SQL.MAP
type AS PhoneNumberMap
. PhoneNumberMap
is then unpacked resulting in two columns where each key-value pair in the map is placed into its own row.
@map =
SELECT EmpName,
PhoneNumbers == ""? null : new SQL.MAP<string, string>(from p in PhoneNumbers.Split(',') select new KeyValuePair<string, string>(p.Split(':') [0], p.Split(':') [1])) AS PhoneNumberMap
FROM @employees
WHERE PhoneNumbers != null;
@exploded =
SELECT EmpName,
r.key.Trim() AS PhoneType, r.value AS PhoneNumber
FROM @map
CROSS APPLY
EXPLODE(PhoneNumberMap) AS r(key, value);
OUTPUT @exploded
TO "/Output/ReferenceGuide/StatementsAndExpressions/Explode/exampleC.csv"
USING Outputters.Csv();
MAP_AGG - Bonus Example
This example does not use CROSS APPLY
or EXPLODE
; however, it illustrates how to reverse the outcome from the above example using MAP_AGG.
@result =
SELECT EmpName,
String.Join(",", MAP_AGG(PhoneType, PhoneNumber).Select(p => String.Format("{0}:{1}", p.Key, p.Value))).Trim() AS PhoneNumbers
FROM @exploded
GROUP BY EmpName;
OUTPUT @result
TO "/Output/ReferenceGuide/StatementsAndExpressions/Explode/exampleD.csv"
USING Outputters.Csv();
MAP<string, MAP<int, string>>
// MAP<string, MAP<int, string>>
DECLARE @aMap SQL.MAP<int, string> = new SQL.MAP<int, string>{{1, "Seattle"}};
// Illustrating two methods to pass Value
@popularCitiesM =
SELECT * FROM
( VALUES
(new SQL.MAP<string, SQL.MAP<int, string>>{{"Washington", @aMap}}),
(new SQL.MAP<string, SQL.MAP<int, string>>{{"Oregon", new SQL.MAP<int, string>{{2, "Portland"}}}})
) AS T(cityMaps);
@result2 =
SELECT c.key AS State,
n.key AS id,
n.value AS City
FROM @popularCitiesM
CROSS APPLY
EXPLODE(cityMaps) AS c(key, value)
CROSS APPLY
EXPLODE(c.value) AS n(key, value);
OUTPUT @result2
TO "ReferenceGuide/DML/From/Explode/nestedMAP.txt"
USING Outputters.Text();
MAP<string, ARRAY<string>>
DECLARE @anArray SQL.ARRAY<string> = new SQL.ARRAY<string>{"Seattle", "Spokane", "Tacoma"};
// Illustrating two methods to pass Value
@popularCitiesA =
SELECT * FROM
( VALUES
(new SQL.MAP<string, SQL.ARRAY<string>>{{"Washington", @anArray}}),
(new SQL.MAP<string, SQL.ARRAY<string>>{{"Oregon", new SQL.ARRAY<string>{"Portland", "Eugene", "Salem"}}})
) AS T(cityMaps);
@result =
SELECT c.key AS State,
n.value AS City
FROM @popularCitiesA
CROSS APPLY
EXPLODE(cityMaps) AS c(key, value)
CROSS APPLY
EXPLODE(c.value) AS n(value);
OUTPUT @result
TO "ReferenceGuide/DML/From/Explode/nestedArray.txt"
USING Outputters.Text();
ARRAY<MAP<int, string>>
DECLARE @aMap1 SQL.MAP<int, string> = new SQL.MAP<int, string>{{503, "Portland"}};
DECLARE @aMap2 SQL.MAP<int, string> = new SQL.MAP<int, string>{{541, "Eugene"}};
DECLARE @aMap3 SQL.MAP<int, string> = new SQL.MAP<int, string>{{602, "Phoenix"}, {520, "Tucson"}, {480, "Mesa"}};
// Illustrating various methods to pass Value
@data =
SELECT * FROM
( VALUES
("AK", new SQL.ARRAY<SQL.MAP<int, string>>{new SQL.MAP<int, string>{ {907, "Anchorage"}} , new SQL.MAP<int, string>{ {250, "Hyder"}} }),
("AZ", new SQL.ARRAY<SQL.MAP<int, string>>{@aMap3}),
("OR", new SQL.ARRAY<SQL.MAP<int, string>>{@aMap1, @aMap2}),
("WA", new SQL.ARRAY<SQL.MAP<int, string>>{new SQL.MAP<int, string>{ {253, "Tacoma"}, {360, "Olympia"}, {425, "Redmond"} } })
) AS T(State, ac_cities);
// Flatten results
@result1 =
SELECT State,
n.key AS areaCode,
n.value AS City
FROM @data
CROSS APPLY EXPLODE(ac_cities) AS c(value)
CROSS APPLY EXPLODE(c.value) AS n(key, value);
// return data as one comma delimited string per State
@result2 =
SELECT State,
string.Join(",", MAP_AGG(n.key, n.value)) AS data1
FROM @data
CROSS APPLY EXPLODE(ac_cities) AS c(value)
CROSS APPLY EXPLODE(c.value) AS n(key, value)
GROUP BY State;
// compare count results
@result3a =
SELECT State,
ac_cities.Count AS Count
FROM @data;
@result3b =
SELECT State,
c.value.Count AS Count
FROM @data
CROSS APPLY EXPLODE(ac_cities) AS c(value);
@result3c =
SELECT State,
SUM(c.value.Count) AS Count
FROM @data
CROSS APPLY EXPLODE(ac_cities) AS c(value)
GROUP BY State;
OUTPUT @result1
TO "ReferenceGuide/DML/From/Explode/mapInArray/result1.txt"
ORDER BY State
USING Outputters.Text();
OUTPUT @result2
TO "ReferenceGuide/DML/From/Explode/mapInArray/result2.txt"
USING Outputters.Text();
OUTPUT @result3a
TO "ReferenceGuide/DML/From/Explode/mapInArray/result3a.txt"
ORDER BY State
USING Outputters.Text();
OUTPUT @result3b
TO "ReferenceGuide/DML/From/Explode/mapInArray/result3b.txt"
ORDER BY State
USING Outputters.Text();
OUTPUT @result3c
TO "ReferenceGuide/DML/From/Explode/mapInArray/result3c.txt"
ORDER BY State
USING Outputters.Text();
ARRAY<ARRAY<string>>
DECLARE @anArrayB1 SqlArray<string> = SqlArray.Create(new [] {"The Girl with the Dragon Tattoo", "Stieg Larsson", "2005"});
DECLARE @anArrayB2 SQL.ARRAY<string> = new SQL.ARRAY<string>{"The Book Thief", "Markus Zusak", "2005"};
DECLARE @anArrayB3 SQL.ARRAY<string> = new SQL.ARRAY<string>{"The Silver Linings Playbook", "Matthew Quick", "2008"};
DECLARE @anArrayB4 SQL.ARRAY<string> = new SQL.ARRAY<string>{"Sarah's Key", "Tatiana de Rosnay"};
DECLARE @anArrayF1 SQL.ARRAY<string> = new SQL.ARRAY<string>{"The Counselor", "Ridley Scott", "2013"};
DECLARE @anArrayF2 SQL.ARRAY<string> = new SQL.ARRAY<string>{"Goodfellas", "Martin Scorsese", "1990"};
DECLARE @anArrayF3 SQL.ARRAY<string> = new SQL.ARRAY<string>{"The Book Thief", "Brian Percival", "2013"};
@publications =
SELECT * FROM
( VALUES
("Book", new SQL.ARRAY<SQL.ARRAY<string>>{@anArrayB1, @anArrayB2, @anArrayB3, @anArrayB4}),
("Film", new SQL.ARRAY<SQL.ARRAY<string>>{@anArrayF1, @anArrayF2, @anArrayF3})
) AS T(medium, data);
// Flatten results
@result1 =
SELECT medium,
n.value AS BookInfo
FROM @publications
CROSS APPLY EXPLODE(data) AS c(value)
CROSS APPLY EXPLODE(c.value) AS n(value);
// Filtering results by index position
@result2 =
SELECT medium,
c.value[0] AS Title
FROM @publications
CROSS APPLY EXPLODE(data) AS c(value);
// return data as one comma delimited string per medium
@result3 =
SELECT medium,
string.Join(",", c.value) AS data1
FROM @publications
CROSS APPLY EXPLODE(data) AS c(value);
// number of items in top array
@result4a =
SELECT medium,
data.Count AS CityCount1
FROM @publications;
// number of items in each nested array
@result4b =
SELECT medium,
c.value.Count AS Title
FROM @publications
CROSS APPLY EXPLODE(data) AS c(value);
OUTPUT @result1
TO "ReferenceGuide/DML/From/Explode/arrayInArray/result1.txt"
USING Outputters.Text();
OUTPUT @result2
TO "ReferenceGuide/DML/From/Explode/arrayInArray/result2.txt"
USING Outputters.Text();
OUTPUT @result3
TO "ReferenceGuide/DML/From/Explode/arrayInArray/result3.txt"
USING Outputters.Text();
OUTPUT @result4a
TO "ReferenceGuide/DML/From/Explode/arrayInArray/result4a.txt"
USING Outputters.Text();
OUTPUT @result4b
TO "ReferenceGuide/DML/From/Explode/arrayInArray/result4b.txt"
USING Outputters.Text();
IEnumerable<T>
@bands =
SELECT *
FROM (VALUES ("Beatles", "George Harrison, John Lennon, Paul McCartney, Ringo Starr"),
("Creedence Clearwater Revival", "Doug Clifford, John Fogerty, Stu Cook, Tom Fogerty"),
("Eagles", "Don Henley, Glenn Frey, Joe Walsh, Timothy Schmit"),
("Pink Floyd", "David Gilmour, Nick Mason, Richard Wright, Roger Watters, Syd Barrett"),
("Rolling Stones", "Charlie Watts, Keith Richards, Mick Jagger, Ronnie Wood")) AS Bands(name, members);
@ca_ie_T1 =
SELECT name, member
FROM @bands CROSS APPLY EXPLODE (members.Split(',')) AS T(member);
@ca_ie_T2 =
SELECT name, member
FROM @bands CROSS APPLY EXPLODE (from m in members.Split(',') select m) AS T(member);
OUTPUT @ca_ie_T1
TO "/output/ReferenceGuide/DML/Explode/crossapply_IE_T1.csv"
USING Outputters.Csv();
OUTPUT @ca_ie_T2
TO "/output/ReferenceGuide/DML/Explode/crossapply_IE_T2.csv"
USING Outputters.Csv();
IEnumerable<T> - Additional Example, Part 1
c# code is placed in the associated Code-Behind .cs file. See usage in next section, below.
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace FileOps
{
public class myDirectories
{
public static IEnumerable<string> ProcessDirectory1(string targetDirectory)
{
return Directory.EnumerateFiles(targetDirectory);
}
public static IEnumerable<string> ProcessDirectory2(string targetDirectory, string excludedFile)
{
var files = from file in
Directory.EnumerateFiles(targetDirectory)
where !file.Contains(excludedFile)
select file;
return files;
}
}
};
IEnumerable<T> - Additional Example, Part 2
Example is meant to be executed locally. Using Code-Behind from previous section, above.
// The Directory.EnumerateFiles method returns an enumerable collection of file names.
@directories =
SELECT * FROM
( VALUES
(@"C:\Temp"),
(@"C:\Temp\Temp2")
) AS T(directory);
// aliasing namespace and class
USING xx = FileOps.myDirectories;
DECLARE @excludedFile string = "Test2.txt";
// Calling EnumerateFiles inline
@result1 =
SELECT file
FROM @directories CROSS APPLY EXPLODE (Directory.EnumerateFiles(directory)) AS T(file);
OUTPUT @result1
TO "/ReferenceGuide/DML/FROM/Explode/Test1.txt"
USING Outputters.Tsv();
// Calling EnumerateFiles through a function
@result2 =
SELECT file
FROM @directories CROSS APPLY EXPLODE (ProcessDirectory1(directory)) AS T(file);
OUTPUT @result2
TO "/ReferenceGuide/DML/FROM/Explode/Test2.txt"
USING Outputters.Tsv();
//excluding a file in the WHERE clause
@result3 =
SELECT file
FROM @directories CROSS APPLY EXPLODE (xx.ProcessDirectory1(directory)) AS T(file)
// WHERE !file.Contains(@excludedFile);
WHERE file != @"C:\Temp\Test2.txt";
OUTPUT @result3
TO "/ReferenceGuide/DML/FROM/Explode/Test3.txt"
USING Outputters.Tsv();
//excluding a file through the function
@result4 =
SELECT file
FROM @directories CROSS APPLY EXPLODE (xx.ProcessDirectory2(directory, @excludedFile)) AS T(file);
OUTPUT @result4
TO "/ReferenceGuide/DML/FROM/Explode/Test4.txt"
USING Outputters.Tsv();
IEnumerable<KeyValuePair<K,V>>
@bands =
SELECT *
FROM (VALUES ("Beatles", "George Harrison, John Lennon, Paul McCartney, Ringo Starr"),
("Creedence Clearwater Revival", "Doug Clifford, John Fogerty, Stu Cook, Tom Fogerty"),
("Eagles", "Don Henley, Glenn Frey, Joe Walsh, Timothy Schmit"),
("Pink Floyd", "David Gilmour, Nick Mason, Richard Wright, Roger Watters, Syd Barrett"),
("Rolling Stones", "Charlie Watts, Keith Richards, Mick Jagger, Ronnie Wood")) AS Bands(name, members);
@ca_ie_kvp =
SELECT T.*
FROM @bands CROSS APPLY EXPLODE (from m in members.Split(',') select new KeyValuePair<string,string>(name,m)) AS T(k,v);
OUTPUT @ca_ie_kvp
TO "/output/ReferenceGuide/DML/Explode/crossapply_ie_kvp.csv"
USING Outputters.Csv();
IEnumerable<KeyValuePair<K,V>> - Additional Example, Part 1
c# code is placed in the associated Code-Behind .cs file. See usage in next section, below.
using System.Collections.Generic;
using Microsoft.Win32;
// Returns names and values for a given registry key
namespace RegOps
{
public class myReg
{
public static IEnumerable<KeyValuePair<string, string>> regReader(string registryKey)
{
RegistryKey rk = Registry.LocalMachine.OpenSubKey(registryKey);
var list = new List<KeyValuePair<string, string>>();
foreach (string valueName in rk.GetValueNames())
{
list.Add(new KeyValuePair<string, string>(valueName, rk.GetValue(valueName).ToString()));
}
return list;
}
}
};
IEnumerable<KeyValuePair<K,V>> - Additional Example, Part 2
Example is meant to be executed locally. Using Code-Behind from previous section, above.
// Returns names and values for a given registry key
@registry =
SELECT * FROM
( VALUES
(@"SYSTEM\CurrentControlSet\Control\CrashControl")
) AS T(registryKey);
// aliasing namespace and class
USING xx = RegOps.myReg;
@result =
SELECT T.key AS Name,
T.value AS Value
FROM @registry
CROSS APPLY
EXPLODE(xx.regReader(registryKey)) AS T(key, value);
OUTPUT @result
TO "/ReferenceGuide/DML/FROM/Explode/keyValuePair1.txt"
USING Outputters.Tsv(outputHeader: true);
IEnumerable<Tuple>
@bands =
SELECT *
FROM (VALUES ("Beatles", "George Harrison, John Lennon, Paul McCartney, Ringo Starr"),
("Creedence Clearwater Revival", "Doug Clifford, John Fogerty, Stu Cook, Tom Fogerty"),
("Eagles", "Don Henley, Glenn Frey, Joe Walsh, Timothy Schmit"),
("Pink Floyd", "David Gilmour, Nick Mason, Richard Wright, Roger Watters, Syd Barrett"),
("Rolling Stones", "Charlie Watts, Keith Richards, Mick Jagger, Ronnie Wood")) AS Bands(name, members);
@ca_tuple =
SELECT name, m1, m2, m3, m4
FROM @bands CROSS APPLY EXPLODE (new [] {Tuple.Create(members.Split(',')[0],
members.Split(',')[1],
members.Split(',')[2],
members.Split(',')[3] )}) AS T(m1,m2,m3,m4);
OUTPUT @ca_tuple
TO "/output/ReferenceGuide/DML/Explode/crossapply_tuple.csv"
USING Outputters.Csv();