ALTER TABLE (U-SQL)
Summary
Modifies a table definition by adding, or dropping columns. Adding or dropping a column from a table is a meta data operation only and its performance will not be impacted by the size of the table.
Syntax
Alter_Table_Statement := 'ALTER' 'TABLE' Identifier ( ['PARTIAL'] 'REBUILD' [Partition_Label_List] | 'ADD' 'COLUMN' Column_Definition_List | 'DROP' 'COLUMN' Identifier_List ).
Remarks
Identifier
Identifies the table to be modified. If the Identifier is a three-part identifier, the table from the specified database and schema will be chosen. If the Identifier is a two-part identifier, then the table of the given schema and of the given name of the current static database context is chosen. If the identifier is a simple identifier, then the table of the given name in the current static database and schema context is chosen.If the specified table does not exist, is an external table, or the user does not have permissions to modify the table, an error is raised.
REBUILD
Used to compact partitions and tables that have grown by multiple, incremental insertions into the same partitions in order to improve query performance over such tables. If the table is empty or has only been loaded once, the operation will succeed without the need to perform a compaction.
Sometimes, a REBUILD may take too long to succeed or even lead to a job failure. In that case, it is recommended that you use the PARTIAL REBUILD clause to rebuild the table or partitions partially. This will compact up to a number of fragments specified by the system flag @@MaxTableFileReadThreshold which per default is set to 3000 (it is recommended to only change this value if the default value is not providing success). By repeated partial rebuilds, a table or partition can be safely rebuilt completely.
A specific table or partition can only be rebuilt once per script, otherwise a compile-time error is raised.ADD | DROP
If an already existing column is attempted to be added or a non-existent column is attempted to be dropped, an error is raised.Column_Definition_List
The name(s) and built-in USQL type of the column(s) to be added. If the added column is of a nullable type, existing rows will contain null in the added column. If the added column is of a not-nullable type, then the column will contain the type's default value (e.g., 0 for type int).Identifier_List
The name(s) of the column(s) to be dropped. A column cannot be dropped when it is:- Used in an index.
- Used in a partition.
Partition_Label_List
The partition label list specifies the list of partition buckets to be added or deleted by specifying the literal values for the partition columns using the appropriate types. The values have to be provided as constants or as scalar static variables.
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
- The examples below are based on the table defintion below.
Test Table
CREATE DATABASE IF NOT EXISTS TestReferenceDB;
USE DATABASE TestReferenceDB;
DROP TABLE IF EXISTS Logs;
CREATE TABLE Logs
(
date DateTime,
eventType int,
eventTime DateTime,
INDEX Index_EventType CLUSTERED (eventType ASC)
PARTITIONED BY (eventTime)
DISTRIBUTED BY HASH(eventType) INTO 3
);
// Add partitions if not exists
DECLARE @partition1 DateTime = new DateTime(2017, 01, 01, 00,00,00,00, DateTimeKind.Utc);
DECLARE @partition2 DateTime = @partition1.AddDays(1);
DECLARE @partition3 DateTime = @partition1.AddDays(2);
// Alternate method to check for existing partition
IF (!PARTITION.EXISTS(TestReferenceDB.Logs, @partition1))
THEN
ALTER TABLE TestReferenceDB.Logs ADD PARTITION (@partition1);
END;
//Insert a few records
INSERT INTO TestReferenceDB.Logs
(date, eventType, eventTime)
PARTITION(@partition1)
VALUES
(new DateTime(2017,01,01), 1, new DateTime(2017,01,01));
INSERT INTO TestReferenceDB.Logs
(date, eventType, eventTime)
PARTITION(@partition2)
VALUES
(new DateTime(2017,01,02), 1, new DateTime(2017,01,02));
INSERT INTO TestReferenceDB.Logs
(date, eventType, eventTime)
PARTITION(@partition3)
VALUES
(new DateTime(2017,01,03), 1, new DateTime(2017,01,03));
Table Rebuild
Full Rebuild
Full rebuild the table
ALTER TABLE TestReferenceDB.Logs REBUILD;
Or Full rebuild the table by partition
ALTER TABLE TestReferenceDB.Logs REBUILD PARTITION (@partition1);
Or Full rebuild the table by multiple partitions
ALTER TABLE TestReferenceDB.Logs REBUILD PARTITION (@partition1), PARTITION (@partition2), PARTITION (@partition3);
Partial Rebuild
SET @@MaxTableFileReadThreshold = 1000; //Set @@MaxTableFileReadThreshold to 1000, and the value is 3000 by default
Partial rebuild the table
ALTER TABLE TestReferenceDB.Logs PARTIAL REBUILD;
Or Partial rebuild the table by partition
ALTER TABLE TestReferenceDB.Logs PARTIAL REBUILD PARTITION (@partition1);
Or Partial rebuild the table by multiple partitions
ALTER TABLE TestReferenceDB.Logs PARTIAL REBUILD PARTITION (@partition1), PARTITION (@partition2), PARTITION (@partition3);
Various Column Scenarios
USE DATABASE TestReferenceDB;
// Add a column
ALTER TABLE Logs ADD COLUMN eventName string;
// add another column
ALTER TABLE Logs ADD COLUMN result int;
// drop a column and add another one
ALTER TABLE Logs DROP COLUMN result;
ALTER TABLE Logs ADD COLUMN clientId string;
// drop a column and add 3 more columns
ALTER TABLE Logs DROP COLUMN clientId;
ALTER TABLE Logs ADD COLUMN result string, clientId string, payload int?;
// drop 2 columns
ALTER TABLE Logs DROP COLUMN clientId, result;