U-SQL Concepts

Summary

U-SQL combines familiar concepts from a variety of languages. It is a declarative language like SQL; it composes statements and expressions in a dataflow style like Pig and Cascading; it provides simple ways to extend the language with user-defined operators, user-defined aggregators, and user-defined functions written in C#; and it provides a SQL database-like metadata object model to manage, discover, and secure structured data and user code.

How Does a U-SQL Script Process Your Data?

Currently, Azure Data Lake Analytics provides U-SQL for batch processing, so U-SQL is written and executed as a batch script. A script follows this general processing pattern:

  1. Retrieve data from stored locations in rowset format

    1. Stored locations can be files that will be schematized on read with EXTRACT expressions
    2. Stored locations can be U-SQL tables that are stored in a schematized format
    3. Stored locations can be tables provided by other data sources, such as an Azure SQL database
  2. Transform the rowset(s)

    1. Several transformations over the rowsets can be composed in a data flow format
  3. Store the transformed rowset data

    1. Store it in a file with an OUTPUT statement, or
    2. Store it in a U-SQL table with an INSERT statement
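
The three steps above can be sketched as a single script. This is a minimal illustration, not a reference implementation: the file paths, the column schema, and the rowset names (`@searchlog`, `@filtered`, `@totals`) are assumptions chosen for the example.

```usql
// Step 1: retrieve data. The file is schematized on read by EXTRACT.
// The input path and the column list are assumed for illustration.
@searchlog =
    EXTRACT UserId   int,
            Region   string,
            Query    string,
            Duration int
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Step 2: transform. Rowset expressions are composed in a dataflow style:
// each statement names a rowset that later statements can consume.
@filtered =
    SELECT Region, Duration
    FROM @searchlog
    WHERE Duration > 0;          // predicates are C# expressions

@totals =
    SELECT Region,
           SUM(Duration) AS TotalDuration
    FROM @filtered
    GROUP BY Region;

// Step 3: store the result in a file (the output path is assumed).
OUTPUT @totals
    TO "/output/DurationByRegion.csv"
    USING Outputters.Csv();
```

Because the script is declarative, naming an intermediate rowset such as `@filtered` does not by itself force a separate execution step; it simply makes the composition easier to read.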

In addition, U-SQL supports data definition statements such as CREATE TABLE, which create metadata artifacts either in separate scripts or, in some cases, in the same script as the transformations.
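
As a hedged sketch of combining data definition with transformation, the script below creates a table and then fills it with an INSERT statement. The database name `SampleDb`, the table `dbo.RegionDurations`, the index name, and the literal rows are all hypothetical and serve only to keep the example self-contained.

```usql
// Data definition: create the metadata objects if they do not exist yet.
// SampleDb and dbo.RegionDurations are hypothetical names.
CREATE DATABASE IF NOT EXISTS SampleDb;
USE DATABASE SampleDb;

CREATE TABLE IF NOT EXISTS dbo.RegionDurations
(
    Region   string,
    Duration int,
    INDEX idx_region CLUSTERED (Region ASC)
    DISTRIBUTED BY HASH (Region)
);

// A small inline rowset stands in for extracted data (values are made up).
@rows =
    SELECT *
    FROM (VALUES
            ("en-us", 73),
            ("en-gb", 614)
         ) AS T(Region, Duration);

// Store the rowset in the U-SQL table.
INSERT INTO dbo.RegionDurations
SELECT Region, Duration
FROM @rows;
```

Keeping the CREATE statements in a separate setup script is often easier to maintain; combining them, as here, can be convenient for small, one-off jobs.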

U-SQL scripts can be submitted in a variety of ways. In particular, you can submit them directly from the Azure Data Lake Tools for Visual Studio, from the Azure Portal, or programmatically via the Azure Data Lake SDK job submission API or the Azure PowerShell extension's job submission command.

Please explore the following concepts introduced in this section: