U-SQL Using APPLY with an Applier UDO

Summary

U-SQL’s user-defined operators (UDOs) provide the ability to write your own custom Applier in C#. The following provides the syntax of the applier expression as you call it in an APPLY.

Syntax

Applier_Expression :=                                                                                    
    ['USING'] applier_type_expression Derived_Table_Alias_With_Types
    [Readonly_Clause] [Required_Clause].

Remarks

  • applier_type_expression
    The applier typed expression creates an Applier UDO and in addition to the input row data, takes the derived table schema definition as its input as well as the provided clauses. Unlike EXPLODE (or in general rowset expressions) an Applier uses the UDO programming model to get access to the incoming row and to return zero to many rows as a result. If the expression is not resulting in an instance of an IApplier, an error is raised. The result will be typed as specified in the derived table schema.

    For more information on the UDO programming model see U-SQL Programmability Guide: User-Defined Applier.

  • Derived_Table_Alias_With_Types
    Because the UDO model requires knowledge of the column data types, the derived table schema specification requires the column data types to be specified.

Syntax

  Derived_Table_Alias_With_Types :=                                                                   
       'AS' Quoted_or_Unquoted_Identifier '(' Column_Definition_List ')'.
  
  • Readonly_Clause
    The optional READONLY clause can help the UDO programmer to write more efficient code. The optional READONLY clause specifies that either all columns (if specified with *) or the specified columns are read only for the Applier and will be passed through to the output using either the same name or the specified column name in parenthesis.

Syntax

  Readonly_Clause :=                                                                                  
       'READONLY' Star_Or_Readonly_Column_List.
Star_Or_Readonly_Column_List := '*' | Readonly_Column_List.
Readonly_Column_List := Readonly_Column { ',' Readonly_Column }.
Readonly_Column := Column_Identifier [Output_Column_Dependency_Alias].
Output_Column_Dependency_Alias := '(' Quoted_or_Unquoted_Identifier ')'.
  • Required_Clause
    The optional REQUIRED clause can help the UDO programmer to write more efficient code. The optional REQUIRED clause specifies that either all columns are required on input for the Applier (if specified with *) or the specified columns are required. If a specified column is followed by a list of columns in parenthesis, then the input column is only required if the columns in that list are referenced from the output.

Syntax

  Required_Clause :=                                                                                  
       'REQUIRED' Star_Or_Required_Column_List.
Star_Or_Required_Column_List := '*' | Required_Column_List.
Required_Column_List := Required_Column { ',' Required_Column}.
Required_Column := Column_Identifier [Required_Output_Column_Dependency_List].
Required_Output_Column_Dependency_List := '(' Identifier_List ')'.

Examples

ParserApplier
c# code is placed in the associated Code-Behind .cs file.

using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace ReferenceGuide_Examples
{
    [SqlUserDefinedApplier]
    public class ParserApplier : IApplier
    {
        private string parsingPart;

        public ParserApplier(string parsingPart)
        {
            if (parsingPart.ToUpper().Contains("ALL")
                || parsingPart.ToUpper().Contains("FORMALNAME")
                || parsingPart.ToUpper().Contains("ISOALPHA3CODE")
                || parsingPart.ToUpper().Contains("LATESTRECORDEDPOPULATION")
                || parsingPart.ToUpper().Contains("REGION")
                || parsingPart.ToUpper().Contains("SUBREGION")
                )
            {
                this.parsingPart = parsingPart;
            }
            else
            {
                throw new ArgumentException("Incorrect parameter. Please use: 'ALL[FORMALNAME|ISOALPHA3CODE|LATESTRECORDEDPOPULATION|REGION|SUBREGION]'");
            }
        }

        public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
        {
            string[] properties = input.Get<string>("Properties").Split(',');

            //  only process with correct number of properties
            if (properties.Count() == 5)
            {
                string FormalName = properties[0];
                string IsoAlpha3Code = properties[1];
                int LatestRecordedPopulation = -1;
                string Region = properties[3];
                string Subregion = properties[4];

                // Only return LatestRecordedPopulation if it is number, otherwise, -1
                if (!int.TryParse(properties[2], out LatestRecordedPopulation))
                {
                    LatestRecordedPopulation = -1;
                }

                if (parsingPart.ToUpper().Contains("FORMALNAME") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("FormalName", FormalName);
                if (parsingPart.ToUpper().Contains("ISOALPHA3CODE") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("IsoAlpha3Code", IsoAlpha3Code);
                if (parsingPart.ToUpper().Contains("LATESTRECORDEDPOPULATION") || parsingPart.ToUpper().Contains("ALL")) output.Set<int>("LatestRecordedPopulation", LatestRecordedPopulation);
                if (parsingPart.ToUpper().Equals("REGION") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("Region", Region);
                if (parsingPart.ToUpper().Contains("SUBREGION") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("Subregion", Subregion);
            }
            yield return output.AsReadOnly();
        }
    }
}

Using ParserApplier
The user-defined Applier acts as a comma-delimited value parser for the country properties. Using Code-Behind from previous section, above.

@countries = 
    SELECT * FROM 
        ( VALUES
        (1, "Afghanistan", "Islamic State of Afghanistan,AFG,28400000,Asia,Southern Asia"),
        (3, "Albania", "Republic of Albania,ALB,3785031,Europe,Southern Europe"),
        (4, "Algeria", "People's Democratic Republic of Algeria,DZA,34178188,Africa,Northern Africa"),
        (6, "Andorra", "Principality of Andorra,AND,87243,Europe,Southern Europe"),
        (7, "Angola", "People's Republic of Angola,AGO,12799293,Africa,Middle Africa"),
        (10, "Antigua and Barb.", "Antigua and Barbuda,ATG,85632,Americas,Caribbean"),
        (11, "Argentina", "Argentine Republic,ARG,42550127,Americas,South America"),
        (12, "Armenia", "Republic of Armenia,ARM,2967004,Asia,Western Asia"),
        (15, "Australia", "Commonwealth of Australia,AUS,22997671,Oceania,Australia and New Zealand"),
        (16, "Austria", "Republic of Austria,AUT,2,Europe,Western Europe")
        ) AS T(CountryID, CountryName, Properties);

@result =
    SELECT
        c.CountryID,
        c.CountryName,
        Properties.FormalName,
        Properties.IsoAlpha3Code,
        Properties.LatestRecordedPopulation,
        Properties.Region,
        Properties.Subregion
    FROM @countries AS c
    CROSS APPLY
        new ReferenceGuide_Examples.ParserApplier ("all") AS Properties(
            FormalName string, 
            IsoAlpha3Code string, 
            LatestRecordedPopulation int, 
            Region string, 
            Subregion string);

OUTPUT @result
TO "/Output/ReferenceGuide/StatementsAndExpressions/Appplier/exampleA.txt"
USING Outputters.Text();

U-SQL's CROSS/OUTER APPLY with VALUES
The VALUES clause allows to apply each row of the constructed rowset to be correlated to each row in the rowset source.

@bands = 
  SELECT * 
  FROM (VALUES ("Beatles", "George Harrison, John Lennon, Paul McCartney, Ringo Starr"), 
               ("Creedence Clearwater Revival", "Doug Clifford, John Fogerty, Stu Cook, Tom Fogerty"), 
               ("Eagles", "Don Henley, Glenn Frey, Joe Walsh, Timothy Schmit"), 
               ("Pink Floyd", "David Gilmour, Nick Mason, Richard Wright, Roger Watters, Syd Barrett"), 
               ("Rolling Stones", "Charlie Watts, Keith Richards, Mick Jagger, Ronnie Wood")) AS Bands(name, members);

@ca_val1 = SELECT * FROM @bands CROSS APPLY VALUES (1) AS T(x);
@ca_val2 = SELECT * FROM @bands CROSS APPLY VALUES (1),(name.Length) AS T(x);
@ca_val3 = SELECT * FROM @bands CROSS APPLY VALUES (1,name.Length,3) AS T(x,y,z);

OUTPUT @ca_val1 
TO "/output/ReferenceGuide/DML/Appplier/crossapply_value1.csv" 
USING Outputters.Csv();

OUTPUT @ca_val2 
TO "/output/ReferenceGuide/DML/Appplier/crossapply_value2.csv" 
USING Outputters.Csv();

OUTPUT @ca_val3 
TO "/output/ReferenceGuide/DML/Appplier/crossapply_value3.csv" 
USING Outputters.Csv();

See Also