U-SQL Identifiers

Summary

U-SQL identifiers are used to identify the objects in the U-SQL metadata context. They can be composed of up to three parts parts:

Syntax

Identifier :=                                                                                            
     DB_Name '.' [Schema_Name] '.' Object_Name
|    Schema_Name '.' Object_Name
|    Object_Name.

The first part in a three part name identifies the database name, the second part refers to the schema name, or defaults to the database’s default schema named dbo if omitted and the third part identifies the object name such as a table or view. If the database name is left out, the current database context is implied, if the database name and the schema name is left out, then the current database and schema context is implied.

Objects that are scoped to the database, such as a schema, assembly or data source may be referred to by a two-part name:

Syntax

DB_Object_Identifier :=                                                                                  
     [DB_Name '.'] Object_Name.

Each name is either a quoted or unquoted U-SQL simple identifier:

Syntax

DB_Name :=                                                                                               
     Quoted_or_Unquoted_Identifier.
Schema_Name := Quoted_or_Unquoted_Identifier.
Object_Name := Quoted_or_Unquoted_Identifier.
Quoted_or_Unquoted_Identifier := Quoted_Identifier | Unquoted_Identifier.

The unquoted identifiers in U-SQL are any valid names in accordance to the Unicode Standard Annex 31, except that underscore is allowed as an initial character (as is traditional in the C programming language), as long as they are not in conflict with reserved keywords. The grammar for an unquoted identifier is:

Syntax

Unquoted_Identifier :=                                                                                   
     Identifier_Start_Character {Identifier_Part_Character}.
Identifier_Start_Character := letter_character | '_'. // the underscore character U+005F
Identifier_Part_Character := letter_character | decimal_digit_character | connecting_character | combining_character | formatting_character.
  • letter_character
    Any Unicode 5.0 character of the classes Lu, Ll, Lt, Lm, Lo, or Nl.

  • combining_character
    Any Unicode 5.0 character of classes Mn or Mc.

  • decimal_digit_character
    Any Unicode 5.0 character of the class Nd.

  • connecting_character
    Any Unicode 5.0 character of the class Pc.

  • formatting_character
    Any Unicode 5.0 character of the class Cf.

The list of reserved keywords is comprised of:

  1. Any all-uppercase keyword of length greater than 2. Note that if the keyword contains a non-letter character such as an _, it is no longer considered a reserved a keyword.
  2. Any C# version 5.0 reserved keyword.
  3. Any of the following keywords: AS, BY, DO, IF, IN, IS, ON, OR, TO.

In order to allow arbitrary identifiers that are either keywords or contain any Unicode code point, U-SQL allows to quote identifiers using '[' and ']', e.g., [MYTABLE]. Note that the quoting characters are not part of the name and if the closing quote appear inside the name it needs to be escaped by being doubled:

Syntax

Quoted_Identifier :=                                                                                     
     '[' (any_unicode_cp_less_quotes | Escaped_Quote)   
        {any_unicode_cp_less_quotes | Escaped_Quote} ']'.
Escaped_Quote := ']]'.

Furthermore, any U-SQL identifier, whether quoted or unquoted has to following the following rules:

  1. A U-SQL identifier is case-sensitive.
  2. Maximal length of a quoted or unquoted identifier is 128 characters (although some named objects may have further restrictions).
  3. Unicode escape sequences are NOT supported in identifiers.

Examples

A. The following script uses quoted identifiers to create some objects with uncommon names:

DROP DATABASE IF EXISTS [^];  
CREATE DATABASE [^];  
USE DATABASE [^];  
CREATE SCHEMA []]];  
USE SCHEMA []]];  
CREATE TABLE [*] (  
    []]] int, INDEX idx CLUSTERED ([]]]) DISTRIBUTED BY ROUND ROBIN INTO 5  
);

B. The following script reads from the generated table and outputs the result:

@r = 
    SELECT * FROM [^].[]]].[*];  
    OUTPUT @r TO "/output/test.csv" USING Outputters.Csv();

See Also