ALTER TABLE (U-SQL)

Summary

Modifies a table definition by adding, or dropping columns. Adding or dropping a column from a table is a meta data operation only and its performance will not be impacted by the size of the table.

Syntax

Alter_Table_Statement :=                                                          
    'ALTER' 'TABLE' Identifier   
    ( ['PARTIAL'] 'REBUILD' [Partition_Label_List]
    | 'ADD' 'COLUMN' Column_Definition_List     
    | 'DROP' 'COLUMN' Identifier_List ).  

Remarks

  • Identifier
    Identifies the table to be modified. If the Identifier is a three-part identifier, the table from the specified database and schema will be chosen. If the Identifier is a two-part identifier, then the table of the given schema and of the given name of the current static database context is chosen. If the identifier is a simple identifier, then the table of the given name in the current static database and schema context is chosen.

    If the specified table does not exist, is an external table, or the user does not have permissions to modify the table, an error is raised.

  • REBUILD
    Used to compact partitions and tables that have grown by multiple, incremental insertions into the same partitions in order to improve query performance over such tables. If the table is empty or has only been loaded once, the operation will succeed without the need to perform a compaction.

    Sometimes, a REBUILD may take too long to succeed or even lead to a job failure. In that case, it is recommended that you use the PARTIAL REBUILD clause to rebuild the table or partitions partially. This will compact up to a number of fragments specified by the system flag @@MaxTableFileReadThreshold which per default is set to 3000 (it is recommended to only change this value if the default value is not providing success). By repeated partial rebuilds, a table or partition can be safely rebuilt completely.

    A specific table or partition can only be rebuilt once per script, otherwise a compile-time error is raised.

  • ADD | DROP
    If an already existing column is attempted to be added or a non-existent column is attempted to be dropped, an error is raised.

  • Column_Definition_List
    The name(s) and built-in USQL type of the column(s) to be added. If the added column is of a nullable type, existing rows will contain null in the added column. If the added column is of a not-nullable type, then the column will contain the type's default value (e.g., 0 for type int).

  • Identifier_List
    The name(s) of the column(s) to be dropped. A column cannot be dropped when it is:

    • Used in an index.
    • Used in a partition.
  • Partition_Label_List
    The partition label list specifies the list of partition buckets to be added or deleted by specifying the literal values for the partition columns using the appropriate types. The values have to be provided as constants or as scalar static variables.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
  • The examples below are based on the table defintion below.

Test Table

CREATE DATABASE IF NOT EXISTS TestReferenceDB;
USE DATABASE TestReferenceDB; 

DROP TABLE IF EXISTS Logs;
CREATE TABLE Logs
(
    date DateTime, 
    eventType int, 
    eventTime DateTime, 
    INDEX Index_EventType CLUSTERED (eventType ASC) 
    PARTITIONED BY (eventTime)
    DISTRIBUTED BY HASH(eventType) INTO 3    
);

// Add partitions if not exists
DECLARE @partition1 DateTime = new DateTime(2017, 01, 01, 00,00,00,00, DateTimeKind.Utc);
DECLARE @partition2 DateTime = @partition1.AddDays(1);
DECLARE @partition3 DateTime = @partition1.AddDays(2);

// Alternate method to check for existing partition
IF (!PARTITION.EXISTS(TestReferenceDB.Logs, @partition1))
THEN
    ALTER TABLE TestReferenceDB.Logs ADD PARTITION (@partition1);
END;

//Insert a few records
INSERT INTO TestReferenceDB.Logs
(date, eventType, eventTime)
PARTITION(@partition1)
VALUES
(new DateTime(2017,01,01), 1, new DateTime(2017,01,01));

INSERT INTO TestReferenceDB.Logs
(date, eventType, eventTime)
PARTITION(@partition2)
VALUES
(new DateTime(2017,01,02), 1, new DateTime(2017,01,02));

INSERT INTO TestReferenceDB.Logs
(date, eventType, eventTime)
PARTITION(@partition3)
VALUES
(new DateTime(2017,01,03), 1, new DateTime(2017,01,03));

Table Rebuild

  • Full Rebuild

    Full rebuild the table

    ALTER TABLE TestReferenceDB.Logs REBUILD;
    

    Or Full rebuild the table by partition

    ALTER TABLE TestReferenceDB.Logs REBUILD PARTITION (@partition1);
    

    Or Full rebuild the table by multiple partitions

    ALTER TABLE TestReferenceDB.Logs REBUILD PARTITION (@partition1), PARTITION (@partition2), PARTITION (@partition3);
    
  • Partial Rebuild

    SET @@MaxTableFileReadThreshold = 1000;  //Set @@MaxTableFileReadThreshold to 1000, and the value is 3000 by default
    

    Partial rebuild the table

    ALTER TABLE TestReferenceDB.Logs PARTIAL REBUILD;
    

    Or Partial rebuild the table by partition

    ALTER TABLE TestReferenceDB.Logs PARTIAL REBUILD PARTITION (@partition1);
    

    Or Partial rebuild the table by multiple partitions

    ALTER TABLE TestReferenceDB.Logs PARTIAL REBUILD PARTITION (@partition1), PARTITION (@partition2), PARTITION (@partition3);
    

Various Column Scenarios

USE DATABASE TestReferenceDB;

// Add a column
ALTER TABLE Logs ADD COLUMN eventName string;

// add another column
ALTER TABLE Logs ADD COLUMN result int;

// drop a column and add another one
ALTER TABLE Logs DROP COLUMN result;
ALTER TABLE Logs ADD COLUMN clientId string;

// drop a column and add 3 more columns
ALTER TABLE Logs DROP COLUMN clientId;
ALTER TABLE Logs ADD COLUMN result string, clientId string, payload int?;

// drop 2 columns
ALTER TABLE Logs DROP COLUMN clientId, result;

See Also