CROSS JOIN (U-SQL)

Summary

A cross join returns the Cartesian product of rows from the rowsets in the join. In other words, it will combine each row from the first rowset with each row from the second rowset.

Note that this is potentially an expensive and dangerous operation since it can lead to a large data explosion. It is best used in scenarios where a normal join cannot be used and very selective predicates are being used in the WHERE clause to limit the number of produced rows.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.

Basic Example
Using the input rowsets

EmpName DepID
Rafferty 31
Jones 33
Heisenberg 33
Robinson 34
Smith 34
Williams null
DeptID DepName
31 Sales
33 Engineering
34 Clerical
35 Marketing

the following cross join

@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  
                      
@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  
                       
@rs_cross =   
    SELECT e.EmpName, d.DepName  
    FROM @employees AS e   
         CROSS JOIN @departments AS d  
    WHERE d.DepName == "Engineering";  
  
OUTPUT @rs_cross   
TO "/Output/ReferenceGuide/Joins/CrossJoins/ExampleA.csv"
USING Outputters.Csv();  

produces this rowset

EmpName DepName
Rafferty Engineering
Jones Engineering
Heisenberg Engineering
Robinson Engineering
Smith Engineering
Williams Engineering

Note that without the filter on the DepName, the resulting rowset would have produced 6x4=24 rows.

Additional Examples

@right = 
    SELECT * FROM 
        ( VALUES
        (0, 99, "Ford Motor Company", "Ford"),
        (100, 199, "Ford Motor Company", "Lincoln"),
        (200, 299, "Ford Motor Company", "Motorcraft"),
        (300, 399, "General Motors Company", "Buick"),
        (400, 499, "General Motors Company", "Chevrolet"),
        (500, 599, "General Motors Company", "Cadillac"),
        (600, 699, "General Motors Company", "GMC"),
        (700, 799, "Fiat Chrysler", "Chrysler"),
        (800, 899, "Fiat Chrysler", "Dodge"),
        (900, 999, "Fiat Chrysler", "Jeep"),
        (1000, 1099, "Fiat Chrysler", "Ram"),
        (1100, 1199, "Fiat Chrysler", "Mopar"),
        (1200, 1299, "Fiat Chrysler", "SRT")
        ) AS T(carIDstart, carIDend, Automaker, Division);

@left = 
    SELECT * FROM 
        ( VALUES
        (3, "Mustang", "1964"),
        (7, "Fiesta", "1976"),
        (133, "Navigator", "1998"),
        (160, "Continental", "2017"),
        (639, "Canyon", "2004"),
        (801, "Challenger", "2008"),
        (802, "Charger", "2006")
        ) AS T(carID, currentModel, introYear);

@result = 
    SELECT  l.currentModel,
            r.Division,
            l.introYear
    FROM    @left AS l
    CROSS JOIN    @right AS r
    WHERE   l.carID BETWEEN r.carIDstart AND r.carIDend;

OUTPUT @result
TO "/Output/ReferenceGuide/Joins/CrossJoins/ExampleB.csv"
USING Outputters.Csv(outputHeader: true);

See Also