Query acceleration SQL language reference
Query acceleration supports an ANSI SQL-like language for expressing queries over blob contents. The query acceleration SQL dialect is a subset of ANSI SQL, with a limited set of supported data types, operators, etc., but it also expands on ANSI SQL to support queries over hierarchical semi-structured data formats such as JSON.
SELECT Syntax
The only SQL statement supported by query acceleration is the SELECT statement. This example returns every row for which expression returns true.
SELECT * FROM table [WHERE expression] [LIMIT limit]
For CSV-formatted data, table must be BlobStorage
. This means that the query will run against whichever blob was specified in the REST call. For JSON-formatted data, table is a "table descriptor." See the Table Descriptors section of this article.
In the following example, for each row for which the WHERE expression returns true, this statement will return a new row that is made from evaluating each of the projection expressions.
SELECT expression [, expression ...] FROM table [WHERE expression] [LIMIT limit]
You can specify one or more specific columns as part of the SELECT expression (for example, SELECT Title, Author, ISBN
).
Note
The maximum number of specific columns that you can use in the SELECT expression is 49. If you need your SELECT statement to return more than 49 columns, then use a wildcard character (*
) for the SELECT expression (For example: SELECT *
).
The following example returns an aggregate computation (For example: the average value of a particular column) over each of the rows for which expression returns true.
SELECT aggregate_expression FROM table [WHERE expression] [LIMIT limit]
The following example returns suitable offsets for splitting a CSV-formatted blob. See the Sys.Split section of this article.
SELECT sys.split(split_size)FROM BlobStorage
Data Types
Data Type | Description |
---|---|
INT | 64-bit signed integer. |
FLOAT | 64-bit ("double-precision") floating point. |
STRING | Variable-length Unicode string. |
TIMESTAMP | A point in time. |
BOOLEAN | True or false. |
When reading values from CSV-formatted data, all values are read as strings. String values may be converted to other types using CAST expressions. Values may be implicitly cast to other types depending on context. for more info, see Data type precedence (Transact-SQL).
Expressions
Referencing fields
For JSON-formatted data, or CSV-formatted data with a header row, fields may be referenced by name. Field names can be quoted or unquoted. Quoted field names are enclosed in double-quote characters ("
), may contain spaces, and are case-sensitive. Unquoted field names are case-insensitive, and may not contain any special characters.
In CSV-formatted data, fields may also be referenced by ordinal, prefixed with an underscore (_
) character. For example, the first field may be referenced as _1
, or the eleventh field may be referenced as _11
. Referencing fields by ordinal is useful for CSV-formatted data that does not contain a header row, in which case the only way to reference a particular field is by ordinal.
Operators
The following standard SQL operators are supported:
Operator | Description |
---|---|
= |
Compares the equality of two expressions (a comparison operator). |
!= |
Tests whether one expression is not equal to another expression (a comparison operator). |
<> |
Compares two expressions for not equal to (a comparison operator). |
< |
Compares two expressions for lesser than (a comparison operator). |
<= |
Compares two expressions for lesser than or equal (a comparison operator). |
> |
Compares two expressions for greater than (a comparison operator). |
>= |
Compares two expressions for greater than or equal (a comparison operator). |
+ |
Adds two numbers. This addition arithmetic operator can also add a number, in days, to a date. |
- |
Subtracts two numbers (an arithmetic subtraction operator). |
/ |
Divides one number by another (an arithmetic division operator). |
* |
Multiplies two expressions (an arithmetic multiplication operator). |
% |
Returns the remainder of one number divided by another. |
AND |
Performs a bitwise logical AND operation between two integer values. |
OR |
Performs a bitwise logical OR operation between two specified integer values as translated to binary expressions within Transact-SQL statements. |
NOT |
Negates a Boolean input. |
CAST |
Converts an expression of one data type to another. |
BETWEEN |
Specifies a range to test. |
IN |
Determines whether a specified value matches any value in a subquery or a list. |
NULLIF |
Returns a null value if the two specified expressions are equal. |
COALESCE |
Evaluates the arguments in order and returns the current value of the first expression that initially doesn't evaluate to NULL. |
If data types on the left and right of an operator are different, then automatic conversion will be performed according to the rules specified here: Data type precedence (Transact-SQL).
The query acceleration SQL language supports only a very small subset of the data types discussed in that article. See the Data Types section of this article.
Casts
The query acceleration SQL language supports the CAST operator, according to the rules here: Data type conversion (Database Engine).
The query acceleration SQL language supports only a tiny subset of the data types discussed in that article. See the Data Types section of this article.
String functions
The query acceleration SQL language supports the following standard SQL string functions:
Function | Description |
---|---|
CHAR_LENGTH | Returns the length in characters of the string expression, if the string expression is of a character data type; otherwise, returns the length in bytes of the string expression (the smallest integer not less than the number of bits divided by 8). (This function is the same as the CHARACTER_LENGTH function.) |
CHARACTER_LENGTH | Returns the length in characters of the string expression, if the string expression is of a character data type; otherwise, returns the length in bytes of the string expression (the smallest integer not less than the number of bits divided by 8). (This function is the same as the CHAR_LENGTH function |
LOWER | Returns a character expression after converting uppercase character data to lowercase. |
UPPER | Returns a character expression with lowercase character data converted to uppercase. |
SUBSTRING | Returns part of a character, binary, text, or image expression in SQL Server. |
TRIM | Removes the space character char(32) or other specified characters from the start and end of a string. |
LEADING | Removes the space character char(32) or other specified characters from the start of a string. |
TRAILING | Removes the space character char(32) or other specified characters from the end of a string. |
Here's a few examples:
Function | Example | Result |
---|---|---|
CHARACTER_LENGTH | SELECT CHARACTER_LENGTH('abcdefg') from BlobStorage |
7 |
CHAR_LENGTH | SELECT CHAR_LENGTH(_1) from BlobStorage |
1 |
LOWER | SELECT LOWER('AbCdEfG') from BlobStorage |
abcdefg |
UPPER | SELECT UPPER('AbCdEfG') from BlobStorage |
ABCDEFG |
SUBSTRING | SUBSTRING('123456789', 1, 5) |
23456 |
TRIM | TRIM(BOTH '123' FROM '1112211Microsoft22211122') |
Microsoft |
Date functions
The following standard SQL date functions are supported:
DATE_ADD
DATE_DIFF
EXTRACT
TO_STRING
TO_TIMESTAMP
Currently, all date formats of standard IS08601 are converted.
DATE_ADD function
The query acceleration SQL language supports year, month, day, hour, minute, second for the DATE_ADD
function.
Examples:
DATE_ADD(datepart, quantity, timestamp)
DATE_ADD('minute', 1, CAST('2017-01-02T03:04:05.006Z' AS TIMESTAMP)
DATE_DIFF function
The query acceleration SQL language supports year, month, day, hour, minute, second for the DATE_DIFF
function.
DATE_DIFF(datepart, timestamp, timestamp)
DATE_DIFF('hour','2018-11-09T00:00+05:30','2018-11-09T01:00:23-08:00')
EXTRACT function
For EXTRACT other than date part supported for the DATE_ADD
function, the query acceleration SQL language supports timezone_hour and timezone_minute as date part.
Examples:
EXTRACT(datepart FROM timestampstring)
EXTRACT(YEAR FROM '2010-01-01T')
TO_STRING function
Examples:
TO_STRING(TimeStamp , format)
TO_STRING(CAST('1969-07-20T20:18Z' AS TIMESTAMP), 'MMMM d, y')
This table describes strings that you can use to specify the output format of the TO_STRING
function.
Format string | Output |
---|---|
yy | Year in 2 digit format - 1999 as '99' |
y | Year in 4 digit format |
yyyy | Year in 4 digit format |
M | Month of year - 1 |
MM | Zero padded Month - 01 |
MMM | Abbr. month of Year - JAN |
MMMM | Full month - May |
d | Day of month (1-31) |
dd | Zero padded day of Month (01-31) |
a | AM or PM |
h | Hour of day (1-12) |
hh | Zero padded Hours od day (01-12) |
H | Hour of day (0-23) |
HH | Zero Padded hour of Day (00-23) |
m | Minute of hour (0-59) |
mm | Zero padded minute (00-59) |
s | Second of Minutes (0-59) |
ss | Zero padded Seconds (00-59) |
S | Fraction of Seconds (0.1-0.9) |
SS | Fraction of Seconds (0.01-0.99) |
SSS | Fraction of Seconds (0.001-0.999) |
X | Offset in Hours |
XX or XXXX | Offset in hours and minutes (+0430) |
XXX or XXXXX | Offset in hours and minutes (-07:00) |
x | Offset in hours (7) |
xx or xxxx | Offset in hour and minute (+0530) |
Xxx or xxxxx | Offset in hour and minute (+05:30) |
TO_TIMESTAMP function
Only IS08601 formats are supported.
Examples:
TO_TIMESTAMP(string)
TO_TIMESTAMP('2007T')
Note
You can also use the UTCNOW
function to get the system time.
Aggregate Expressions
A SELECT statement may contain either one or more projection expressions or a single aggregate expression. The following aggregate expressions are supported:
Expression | Description |
---|---|
COUNT(*) | Returns the number of records which matched the predicate expression. |
COUNT(expression) | Returns the number of records for which expression is non-null. |
AVG(expression) | Returns the average of the non-null values of expression. |
MIN(expression) | Returns the minimum non-null value of expression. |
MAX(expression | Returns the maximum non-null value of expression. |
SUM(expression) | Returns the sum of all non-null values of expression. |
MISSING
The IS MISSING
operator is the only non-standard that the query acceleration SQL language supports. For JSON data, if a field is missing from a particular input record, the expression field IS MISSING
will evaluate to the Boolean value true.
Table Descriptors
For CSV data, the table name is always BlobStorage
. For example:
SELECT * FROM BlobStorage
For JSON data, additional options are available:
SELECT * FROM BlobStorage[*].path
This allows queries over subsets of the JSON data.
For JSON queries, you can mention the path in part of the FROM clause. These paths will help to parse the subset of JSON data. These paths can reference to JSON Array and Object values.
Let's take an example to understand this in more detail.
This is our sample data:
{
"id": 1,
"name": "mouse",
"price": 12.5,
"tags": [
"wireless",
"accessory"
],
"dimensions": {
"length": 3,
"width": 2,
"height": 2
},
"weight": 0.2,
"warehouses": [
{
"latitude": 41.8,
"longitude": -87.6
}
]
}
You might be interested only in the warehouses
JSON object from the above data. The warehouses
object is a JSON array type, so you can mention this in the FROM clause. Your sample query can look something like this.
SELECT latitude FROM BlobStorage[*].warehouses[*]
The query gets all fields but selects only the latitude.
If you wanted to access only the dimensions
JSON object value, you could use refer to that object in your query. For example:
SELECT length FROM BlobStorage[*].dimensions
This also limits your access to members of the dimensions
object. If you want to access other members of JSON fields and inner values of JSON objects, then you might use a queries such as shown in the following example:
SELECT weight,warehouses[0].longitude,id,tags[1] FROM BlobStorage[*]
Note
BlobStorage and BlobStorage[*] both refer to the whole object. However, if you have a path in the FROM clause, then you'll need to use BlobStorage[*].path
Sys.Split
This is a special form of the SELECT statement, which is available only for CSV-formatted data.
SELECT sys.split(split_size) FROM BlobStorage
Use this statement in cases where you want to download and then process CSV data records in batches. That way you can process records in parallel instead of having to download all records at one time. This statement doesn't return records from the CSV file. Instead, it returns a collection of batch sizes. You can then use each batch size to retrieve a batch of data records.
Use the split_size parameter to specify the number of bytes that you want each batch to contain. For example, if you want to process only 10 MB of data at a time, you're statement would look like this: SELECT sys.split(10485760)FROM BlobStorage
because 10 MB is equal to 10,485,760 bytes. Each batch will contain as many records as can fit into those 10 MB.
In most cases, the size of each batch will be slightly higher than the number that you specify. That's because a batch cannot contain a partial record. If the last record in a batch starts before the end of your threshold, the batch will be larger so that it can contain the complete record. The size of the last batch will likely be smaller than the size that you specify.
Note
The split_size must be at least 10 MB (10485760).