Data types
Applies to: Databricks SQL, Databricks Runtime
For rules governing how conflicts between data types are resolved, see SQL data type rules.
Supported data types
Azure Databricks supports the following data types:
Data Type | Description |
---|---|
BIGINT | Represents 8-byte signed integer numbers. |
BINARY | Represents byte sequence values. |
BOOLEAN | Represents Boolean values. |
DATE | Represents values comprising the fields year, month, and day, without a time zone. |
DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s. |
DOUBLE | Represents 8-byte double-precision floating point numbers. |
FLOAT | Represents 4-byte single-precision floating point numbers. |
INT | Represents 4-byte signed integer numbers. |
INTERVAL intervalQualifier | Represents intervals of time either on a scale of seconds or months. |
VOID | Represents the untyped NULL. |
SMALLINT | Represents 2-byte signed integer numbers. |
STRING | Represents character string values. |
TIMESTAMP | Represents values comprising the fields year, month, day, hour, minute, and second, with the session local time zone. |
TIMESTAMP_NTZ | Represents values comprising the fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. |
TINYINT | Represents 1-byte signed integer numbers. |
ARRAY < elementType > | Represents values comprising a sequence of elements of type elementType. |
MAP < keyType,valueType > | Represents values comprising a set of key-value pairs. |
STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > | Represents values with the structure described by a sequence of fields. |
VARIANT | Represents semi-structured data. |
OBJECT | Represents values in a VARIANT with the structure described by a set of fields. |
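DECIMAL(p,s) is the one type in the table whose set of valid values depends on its parameters: a value fits only if it needs no more than s fractional digits and no more than p digits in total. A minimal Python sketch of that check follows. It assumes the Databricks limits of 1 to 38 digits of precision and a scale between 0 and p; the helper `fits_decimal` is hypothetical and is not a Databricks or Spark API.

```python
from decimal import Decimal, localcontext


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Hypothetical helper: True if `value` can be stored as
    DECIMAL(precision, scale) without rounding or overflow."""
    if not 1 <= precision <= 38 or not 0 <= scale <= precision:
        raise ValueError("expected 1 <= precision <= 38 and 0 <= scale <= precision")
    if not value.is_finite():
        return False  # NaN and Infinity have no DECIMAL representation
    with localcontext() as ctx:
        ctx.prec = 100  # generous precision so the shift below is exact
        # Move the decimal point `scale` places to the right.
        scaled = value.scaleb(scale)
        # Any digits still after the point would have to be rounded away.
        if scaled != scaled.to_integral_value():
            return False
        # At most `precision` digits may remain in total.
        return abs(scaled) < Decimal(10) ** precision
```

For example, `fits_decimal(Decimal("123.45"), 5, 2)` is true, while `Decimal("123.456")` needs a third fractional digit and `Decimal("1234.5")` needs six digits, so neither fits DECIMAL(5,2). Trailing zeros such as `Decimal("1.500")` do not count against the scale.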
Important
Delta Lake does not support the VOID type.
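The TIMESTAMP and TIMESTAMP_NTZ rows in the table differ only in whether a time zone takes part. As a rough analogy, using plain Python `datetime` objects rather than Spark, a TIMESTAMP behaves like an absolute instant whose year-to-second fields depend on the session time zone, while a TIMESTAMP_NTZ behaves like a naive datetime whose fields never change. The two "session" time zones below are hypothetical fixed offsets, chosen so the sketch does not depend on the system time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Two hypothetical session time zones, as fixed UTC offsets.
SESSION_UTC = timezone.utc
SESSION_PLUS_9 = timezone(timedelta(hours=9))

# TIMESTAMP-like value: an absolute instant (noon UTC).
instant = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

# TIMESTAMP_NTZ-like value: only the year..second fields, no time zone.
local_fields = datetime(2024, 1, 15, 12, 0)


def show_timestamp(value: datetime, session_tz: timezone) -> str:
    """Render an instant as wall-clock fields in the session time zone."""
    return value.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


def show_timestamp_ntz(value: datetime) -> str:
    """Render the stored fields as-is; no time zone is consulted."""
    return value.strftime("%Y-%m-%d %H:%M:%S")


for tz in (SESSION_UTC, SESSION_PLUS_9):
    print(show_timestamp(instant, tz), "|", show_timestamp_ntz(local_fields))
# 2024-01-15 12:00:00 | 2024-01-15 12:00:00
# 2024-01-15 21:00:00 | 2024-01-15 12:00:00
```

Only the instant's displayed fields move when the session time zone changes; the time-zone-free value prints the same in both cases.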
Data type classification
Data types are grouped into the following classes:
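- Integral numeric types represent whole numbers: TINYINT, SMALLINT, INT, and BIGINT.
- Exact numeric types represent base-10 numbers: the integral numeric types and DECIMAL.
- Binary floating point types use exponents and a binary representation to cover a large range of numbers: FLOAT and DOUBLE.
- Numeric types represent all numeric data types: the exact numeric and binary floating point types.
- Date-time types represent date and time components: DATE, TIMESTAMP, TIMESTAMP_NTZ, and INTERVAL.
- Simple types are types defined by holding singleton values: the numeric and date-time types, BINARY, BOOLEAN, and STRING.
- Complex types are composed of multiple components of complex or simple types: ARRAY, MAP, and STRUCT.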
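The classes nest inside one another, so one type can belong to several of them. A minimal sketch encodes the hierarchy as Python sets and answers questions such as "is DECIMAL numeric?" mechanically. The set contents are read off the type table (for example, the integral types are the 1-, 2-, 4-, and 8-byte signed integers), and the names `type_classes` and `base_type` are hypothetical helpers, not part of any Databricks API.

```python
import re

# Hypothetical encoding of the type classes as sets of base type names
# (parameters such as precision or element types are dropped).
INTEGRAL_NUMERIC = {"TINYINT", "SMALLINT", "INT", "BIGINT"}
EXACT_NUMERIC = INTEGRAL_NUMERIC | {"DECIMAL"}
BINARY_FLOATING_POINT = {"FLOAT", "DOUBLE"}
NUMERIC = EXACT_NUMERIC | BINARY_FLOATING_POINT
DATE_TIME = {"DATE", "TIMESTAMP", "TIMESTAMP_NTZ", "INTERVAL"}
COMPLEX = {"ARRAY", "MAP", "STRUCT"}

CLASSES = [
    ("integral numeric", INTEGRAL_NUMERIC),
    ("exact numeric", EXACT_NUMERIC),
    ("binary floating point", BINARY_FLOATING_POINT),
    ("numeric", NUMERIC),
    ("date-time", DATE_TIME),
    ("complex", COMPLEX),
]


def base_type(type_name: str) -> str:
    """Strip parameters: 'decimal(10,2)' -> 'DECIMAL', 'ARRAY<INT>' -> 'ARRAY'."""
    return re.split(r"[\s(<]", type_name.strip().upper(), maxsplit=1)[0]


def type_classes(type_name: str) -> list[str]:
    """Return every class in CLASSES that the type belongs to."""
    base = base_type(type_name)
    return [name for name, members in CLASSES if base in members]


print(type_classes("INT"))  # ['integral numeric', 'exact numeric', 'numeric']
```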
Language mappings
Applies to: Databricks Runtime
Scala
Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:
import org.apache.spark.sql.types._
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | Byte | ByteType |
SMALLINT | ShortType | Short | ShortType |
INT | IntegerType | Int | IntegerType |
BIGINT | LongType | Long | LongType |
FLOAT | FloatType | Float | FloatType |
DOUBLE | DoubleType | Double | DoubleType |
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType |
STRING | StringType | String | StringType |
BINARY | BinaryType | Array[Byte] | BinaryType |
BOOLEAN | BooleanType | Boolean | BooleanType |
TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType |
TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
DATE | DateType | java.sql.Date | DateType |
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]). (2) |
MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]). (2) |
STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields). fields is a Seq of StructField. (4) |
| | StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]). (4) |
VARIANT | VariantType | org.apache.spark.unsafe.types.VariantVal | VariantType |
OBJECT | Not supported | Not supported | Not supported |
Java
Spark SQL data types are defined in the package org.apache.spark.sql.types. To access or create a data type, use the factory methods provided in org.apache.spark.sql.types.DataTypes.
SQL type | Data Type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | byte or Byte | DataTypes.ByteType |
SMALLINT | ShortType | short or Short | DataTypes.ShortType |
INT | IntegerType | int or Integer | DataTypes.IntegerType |
BIGINT | LongType | long or Long | DataTypes.LongType |
FLOAT | FloatType | float or Float | DataTypes.FloatType |
DOUBLE | DoubleType | double or Double | DataTypes.DoubleType |
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale). |
STRING | StringType | String | DataTypes.StringType |
BINARY | BinaryType | byte[] | DataTypes.BinaryType |
BOOLEAN | BooleanType | boolean or Boolean | DataTypes.BooleanType |
TIMESTAMP | TimestampType | java.sql.Timestamp | DataTypes.TimestampType |
TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | DataTypes.TimestampNTZType |
DATE | DateType | java.sql.Date | DataTypes.DateType |
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
ARRAY | ArrayType | java.util.List | DataTypes.createArrayType(elementType [, containsNull]). (2) |
MAP | MapType | java.util.Map | DataTypes.createMapType(keyType, valueType [, valueContainsNull]). (2) |
STRUCT | StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields). fields is a List or array of StructField. (4) |
| | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable). (4) |
VARIANT | VariantType | org.apache.spark.unsafe.types.VariantVal | VariantType |
OBJECT | Not supported | Not supported | Not supported |
Python
Spark SQL data types are defined in the package pyspark.sql.types. You access them by importing the package:
from pyspark.sql.types import *
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | int or long (1) | ByteType() |
SMALLINT | ShortType | int or long (1) | ShortType() |
INT | IntegerType | int or long | IntegerType() |
BIGINT | LongType | long (1) | LongType() |
FLOAT | FloatType | float (1) | FloatType() |
DOUBLE | DoubleType | float | DoubleType() |
DECIMAL(p,s) | DecimalType | decimal.Decimal | DecimalType() |
STRING | StringType | string | StringType() |
BINARY | BinaryType | bytearray | BinaryType() |
BOOLEAN | BooleanType | bool | BooleanType() |
TIMESTAMP | TimestampType | datetime.datetime | TimestampType() |
TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
DATE | DateType | datetime.date | DateType() |
year-month interval | YearMonthIntervalType | Not supported | Not supported |
day-time interval | DayTimeIntervalType | datetime.timedelta | DayTimeIntervalType (3) |
ARRAY | ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]). (2) |
MAP | MapType | dict | MapType(keyType, valueType, [valueContainsNull]). (2) |
STRUCT | StructType | list or tuple | StructType(fields). fields is a list of StructField. (4) |
| | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]). (4) |
VARIANT | VariantType | VariantVal | VariantType() |
OBJECT | Not supported | Not supported | Not supported |
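Several rows of the Python table share a value type (for example, `int` can back TINYINT through BIGINT), and some Python classes are subclasses of others. A simplified sketch of picking one SQL type name per Python value follows. The lookup table and `sql_type_for` are hypothetical, not PySpark's own schema inference (which handles more cases); choosing BIGINT for `int` and TIMESTAMP rather than TIMESTAMP_NTZ for `datetime.datetime` are assumptions. The order matters: `bool` is a subclass of `int`, and `datetime.datetime` is a subclass of `datetime.date`, so each must be tested first.

```python
import datetime
import decimal

# Hypothetical lookup derived from the Python table above, ordered so that
# subclasses are matched before their base classes.
_PYTHON_TO_SQL = [
    (bool, "BOOLEAN"),                  # must precede int
    (int, "BIGINT"),                    # widest integral type; narrower ones need range checks
    (float, "DOUBLE"),                  # Python floats are 8-byte doubles
    (decimal.Decimal, "DECIMAL"),
    (str, "STRING"),
    (bytearray, "BINARY"),
    (datetime.datetime, "TIMESTAMP"),   # must precede datetime.date
    (datetime.date, "DATE"),
    (datetime.timedelta, "INTERVAL DAY TO SECOND"),
    (dict, "MAP"),
]


def sql_type_for(value: object) -> str:
    """Pick a SQL type name for a single Python value (simplified sketch)."""
    for python_type, sql_type in _PYTHON_TO_SQL:
        if isinstance(value, python_type):
            return sql_type
    raise TypeError(f"no mapping for {type(value).__name__}")


print(sql_type_for(True), sql_type_for(7))  # BOOLEAN BIGINT
```

Lists and tuples are left out on purpose, because the table allows them for both ARRAY and STRUCT values.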
R
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | integer (1) | "byte" |
SMALLINT | ShortType | integer (1) | "short" |
INT | IntegerType | integer | "integer" |
BIGINT | LongType | integer (1) | "long" |
FLOAT | FloatType | numeric (1) | "float" |
DOUBLE | DoubleType | numeric | "double" |
DECIMAL(p,s) | DecimalType | Not supported | Not supported |
STRING | StringType | character | "string" |
BINARY | BinaryType | raw | "binary" |
BOOLEAN | BooleanType | logical | "bool" |
TIMESTAMP | TimestampType | POSIXct | "timestamp" |
TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
DATE | DateType | Date | "date" |
year-month interval | YearMonthIntervalType | Not supported | Not supported |
day-time interval | DayTimeIntervalType | Not supported | Not supported |
ARRAY | ArrayType | vector or list | list(type="array", elementType=elementType, containsNull=[containsNull]). (2) |
MAP | MapType | environment | list(type="map", keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]). (2) |
STRUCT | StructType | named list | list(type="struct", fields=fields). fields is a Seq of StructField. (4) |
| | StructField | The value type of the data type of this field (for example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]). (4) |
VARIANT | Not supported | Not supported | Not supported |
OBJECT | Not supported | Not supported | Not supported |
(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.
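Because an out-of-range number only fails (or is mangled) when it is converted, it can help to check ranges on the client first. A minimal sketch, assuming the byte widths from the type table and two's-complement signed integers (as on the JVM): an n-byte type holds values from -2^(8n-1) to 2^(8n-1) - 1. The helper names are hypothetical.

```python
# Byte widths of the signed integral SQL types, taken from the type table.
INTEGRAL_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def integral_range(sql_type: str) -> tuple[int, int]:
    """Inclusive (min, max) of an n-byte two's-complement signed integer."""
    bits = 8 * INTEGRAL_BYTES[sql_type]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits_integral(value: int, sql_type: str) -> bool:
    """True if `value` can be stored in `sql_type` without overflow."""
    low, high = integral_range(sql_type)
    return low <= value <= high


# Find values that would not fit a SMALLINT column before sending them to Spark.
values = [12, -32768, 40000]
print([v for v in values if not fits_integral(v, "SMALLINT")])  # [40000]
```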
(2) The optional value defaults to TRUE.
(3) Interval types
- YearMonthIntervalType([startField,] endField): Represents a year-month interval made up of a contiguous subset of its fields, where startField is the leftmost field and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).
- DayTimeIntervalType([startField,] endField): Represents a day-time interval made up of a contiguous subset of its fields, where startField is the leftmost field and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
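Because startField is the leftmost field and endField the rightmost, a valid pair must satisfy startField <= endField, and the pair determines the SQL spelling of the interval type. A minimal sketch, assuming field codes numbered from the largest unit as in Apache Spark (YEAR = 0, MONTH = 1; DAY = 0 through SECOND = 3) and assuming that a single argument denotes a one-field interval; `interval_name` is a hypothetical helper, not a Spark API.

```python
# Field names indexed by their codes, largest unit first.
YEAR_MONTH_FIELDS = ["YEAR", "MONTH"]                   # 0, 1
DAY_TIME_FIELDS = ["DAY", "HOUR", "MINUTE", "SECOND"]   # 0, 1, 2, 3


def interval_name(fields: list[str], *codes: int) -> str:
    """Render ([startField,] endField) as an SQL interval type name."""
    if len(codes) == 1:
        start = end = codes[0]          # one field only, e.g. INTERVAL MINUTE
    elif len(codes) == 2:
        start, end = codes
    else:
        raise TypeError("expected ([startField,] endField)")
    # The fields must form a contiguous, left-to-right range.
    if not 0 <= start <= end < len(fields):
        raise ValueError(f"invalid field range {start}..{end}")
    if start == end:
        return f"INTERVAL {fields[start]}"
    return f"INTERVAL {fields[start]} TO {fields[end]}"


print(interval_name(DAY_TIME_FIELDS, 1, 3))  # INTERVAL HOUR TO SECOND
```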
(4) StructType
- StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
- StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name, and its data type by dataType. nullable indicates whether values of this field can be null; it defaults to true.
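The two StructType rules, unique field names and nullable defaulting to true, can be sketched with a plain Python dataclass that also renders the fields in the `STRUCT < fieldName : fieldType [NOT NULL] >` syntax from the type table. `Field` and `struct_type` are hypothetical stand-ins, not the PySpark `StructField`/`StructType` classes, and names are compared case-sensitively here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for StructField(name, dataType, nullable)."""
    name: str
    data_type: str          # SQL type name, e.g. "BIGINT"
    nullable: bool = True   # nullable defaults to true


def struct_type(fields: list[Field]) -> str:
    """Reject duplicate names, then render the fields as STRUCT<...>."""
    seen = set()
    for field in fields:
        if field.name in seen:  # case-sensitive comparison (an assumption)
            raise ValueError(f"duplicate field name: {field.name}")
        seen.add(field.name)
    parts = [
        f"{f.name}: {f.data_type}" + ("" if f.nullable else " NOT NULL")
        for f in fields
    ]
    return "STRUCT<" + ", ".join(parts) + ">"


print(struct_type([Field("id", "BIGINT", nullable=False), Field("name", "STRING")]))
# STRUCT<id: BIGINT NOT NULL, name: STRING>
```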