Data types
Applies to: Databricks SQL, Databricks Runtime
For rules governing how conflicts between data types are resolved, see SQL data type rules.
Supported data types
Azure Databricks supports the following data types:
Data Type | Description |
---|---|
BIGINT | Represents 8-byte signed integer numbers. |
BINARY | Represents byte sequence values. |
BOOLEAN | Represents Boolean values. |
DATE | Represents values comprising the fields year, month, and day, without a time zone. |
DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s. |
DOUBLE | Represents 8-byte double-precision floating point numbers. |
FLOAT | Represents 4-byte single-precision floating point numbers. |
INT | Represents 4-byte signed integer numbers. |
INTERVAL intervalQualifier | Represents intervals of time either on a scale of seconds or months. |
VOID | Represents the untyped NULL. |
SMALLINT | Represents 2-byte signed integer numbers. |
STRING | Represents character string values. |
TIMESTAMP | Represents values comprising the fields year, month, day, hour, minute, and second, with the session local time zone. |
TIMESTAMP_NTZ | Represents values comprising the fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. |
TINYINT | Represents 1-byte signed integer numbers. |
ARRAY < elementType > | Represents values comprising a sequence of elements with the type of elementType. |
MAP < keyType,valueType > | Represents values comprising a set of key-value pairs. |
STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > | Represents values with the structure described by a sequence of fields. |
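To make the precision and scale in the DECIMAL(p,s) row concrete, here is a minimal plain-Python sketch. It does not call any Databricks API. The helper name `fits_decimal` is hypothetical, and the half-up rounding mode is an assumption made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_decimal(value, precision, scale):
    """Return True if `value`, rounded to `scale` fractional digits,
    needs at most `precision` total digits.

    Hypothetical helper for illustration only; half-up rounding is an assumption.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives a quantum of 0.01
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    total_digits = len(rounded.as_tuple().digits)
    return total_digits <= precision


# 123.456 rounds to 123.46, which has 5 digits in total.
print(fits_decimal("123.456", 5, 2))
print(fits_decimal("123.456", 4, 2))
# Rounding can add an integer digit: 99.999 becomes 100.00.
print(fits_decimal("99.999", 4, 2))
```

The point of the sketch is that p counts all digits, while s fixes how many of them sit to the right of the decimal point.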
Important
Delta Lake does not support the VOID and INTERVAL types.
Data type classification
Data types are grouped into the following classes:
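- Integral numeric types represent whole numbers: TINYINT, SMALLINT, INT, and BIGINT.
- Exact numeric types represent base-10 numbers: the integral numeric types and DECIMAL.
- Binary floating point types use exponents and a binary representation to cover a large range of numbers: FLOAT and DOUBLE.
- Numeric types represent all numeric data types: the exact numeric types and the binary floating point types.
- Date-time types represent date and time components: DATE, TIMESTAMP, and TIMESTAMP_NTZ.
- Simple types are types defined by holding singleton values: the numeric types, the date-time types, BINARY, BOOLEAN, INTERVAL, and STRING.
- Complex types are composed of multiple components of complex or simple types: ARRAY, MAP, and STRUCT.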
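The numeric and complex classes nest inside one another, which a small lookup table can make explicit. The following is a plain-Python sketch, not a Databricks API. Class membership is an assumption read off the descriptions in the type table above, and `classes_of` is a hypothetical helper.

```python
# Membership is an assumption based on the type descriptions in the table above.
INTEGRAL = {"TINYINT", "SMALLINT", "INT", "BIGINT"}
EXACT_NUMERIC = INTEGRAL | {"DECIMAL"}
BINARY_FLOATING_POINT = {"FLOAT", "DOUBLE"}
NUMERIC = EXACT_NUMERIC | BINARY_FLOATING_POINT
COMPLEX = {"ARRAY", "MAP", "STRUCT"}

CLASSES = {
    "integral numeric": INTEGRAL,
    "exact numeric": EXACT_NUMERIC,
    "binary floating point": BINARY_FLOATING_POINT,
    "numeric": NUMERIC,
    "complex": COMPLEX,
}


def classes_of(type_name):
    """Return every class label that the given SQL type name belongs to."""
    name = type_name.strip().upper()
    return [label for label, members in CLASSES.items() if name in members]


print(classes_of("int"))
print(classes_of("double"))
print(classes_of("map"))
```

A type can belong to several classes at once: INT is integral, exact numeric, and numeric.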
Language mappings
Applies to: Databricks Runtime
Scala
Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:
import org.apache.spark.sql.types._
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | Byte | ByteType |
SMALLINT | ShortType | Short | ShortType |
INT | IntegerType | Int | IntegerType |
BIGINT | LongType | Long | LongType |
FLOAT | FloatType | Float | FloatType |
DOUBLE | DoubleType | Double | DoubleType |
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType |
STRING | StringType | String | StringType |
BINARY | BinaryType | Array[Byte] | BinaryType |
BOOLEAN | BooleanType | Boolean | BooleanType |
TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType |
TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
DATE | DateType | java.sql.Date | DateType |
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]) (2) |
MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]) (2) |
STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields). fields is a Seq of StructField. (4) |
 | StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]) (4) |
Java
Spark SQL data types are defined in the package org.apache.spark.sql.types. To access or create a data type, use factory methods provided in org.apache.spark.sql.types.DataTypes.
SQL type | Data Type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | byte or Byte | DataTypes.ByteType |
SMALLINT | ShortType | short or Short | DataTypes.ShortType |
INT | IntegerType | int or Integer | DataTypes.IntegerType |
BIGINT | LongType | long or Long | DataTypes.LongType |
FLOAT | FloatType | float or Float | DataTypes.FloatType |
DOUBLE | DoubleType | double or Double | DataTypes.DoubleType |
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale) |
STRING | StringType | String | DataTypes.StringType |
BINARY | BinaryType | byte[] | DataTypes.BinaryType |
BOOLEAN | BooleanType | boolean or Boolean | DataTypes.BooleanType |
TIMESTAMP | TimestampType | java.sql.Timestamp | DataTypes.TimestampType |
TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | DataTypes.TimestampNTZType |
DATE | DateType | java.sql.Date | DataTypes.DateType |
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
ARRAY | ArrayType | java.util.List | DataTypes.createArrayType(elementType [, containsNull]) (2) |
MAP | MapType | java.util.Map | DataTypes.createMapType(keyType, valueType [, valueContainsNull]) (2) |
STRUCT | StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields). fields is a List or array of StructField. (4) |
 | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable) (4) |
Python
Spark SQL data types are defined in the package pyspark.sql.types. You access them by importing the package:
from pyspark.sql.types import *
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | int or long. (1) | ByteType() |
SMALLINT | ShortType | int or long. (1) | ShortType() |
INT | IntegerType | int or long | IntegerType() |
BIGINT | LongType | long (1) | LongType() |
FLOAT | FloatType | float (1) | FloatType() |
DOUBLE | DoubleType | float | DoubleType() |
DECIMAL(p,s) | DecimalType | decimal.Decimal | DecimalType() |
STRING | StringType | string | StringType() |
BINARY | BinaryType | bytearray | BinaryType() |
BOOLEAN | BooleanType | bool | BooleanType() |
TIMESTAMP | TimestampType | datetime.datetime | TimestampType() |
TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
DATE | DateType | datetime.date | DateType() |
year-month interval | YearMonthIntervalType | Not supported | Not supported |
day-time interval | DayTimeIntervalType | datetime.timedelta | DayTimeIntervalType (3) |
ARRAY | ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]).(2) |
MAP | MapType | dict | MapType(keyType, valueType, [valueContainsNull]).(2) |
STRUCT | StructType | list or tuple | StructType(fields). fields is a list of StructField. (4) |
 | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) (4) |
R
SQL type | Data type | Value type | API to access or create data type |
---|---|---|---|
TINYINT | ByteType | integer (1) | 'byte' |
SMALLINT | ShortType | integer (1) | 'short' |
INT | IntegerType | integer | 'integer' |
BIGINT | LongType | integer (1) | 'long' |
FLOAT | FloatType | numeric (1) | 'float' |
DOUBLE | DoubleType | numeric | 'double' |
DECIMAL(p,s) | DecimalType | Not supported | Not supported |
STRING | StringType | character | 'string' |
BINARY | BinaryType | raw | 'binary' |
BOOLEAN | BooleanType | logical | 'bool' |
TIMESTAMP | TimestampType | POSIXct | 'timestamp' |
TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
DATE | DateType | Date | 'date' |
year-month interval | YearMonthIntervalType | Not supported | Not supported |
day-time interval | DayTimeIntervalType | Not supported | Not supported |
ARRAY | ArrayType | vector or list | list(type='array', elementType=elementType, containsNull=[containsNull]) (2) |
MAP | MapType | environment | list(type='map', keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]) (2) |
STRUCT | StructType | named list | list(type='struct', fields=fields). fields is a list of StructField. (4) |
 | StructField | The value type of the data type of this field (for example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]) (4) |
(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.
(2) The optional value defaults to TRUE.
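Footnote (1) asks you to make sure numbers are within range before they are converted. A minimal plain-Python sketch of such a check follows. The byte widths come from the type table at the top of this article; `signed_range` and `check_in_range` are hypothetical helpers, not part of any Spark API.

```python
# Byte widths as listed in the supported data types table.
BYTE_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(sql_type):
    """Return the (lowest, highest) value of a signed integer type."""
    bits = 8 * BYTE_WIDTHS[sql_type]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def check_in_range(value, sql_type):
    """Return `value` unchanged, or raise ValueError if it does not fit."""
    low, high = signed_range(sql_type)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {sql_type} range [{low}, {high}]")
    return value


print(signed_range("TINYINT"))   # (-128, 127)
print(signed_range("SMALLINT"))  # (-32768, 32767)
check_in_range(127, "TINYINT")   # fits, returns 127
```

Running a check like this on Python or R values before building a DataFrame surfaces out-of-range numbers early instead of at conversion time.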
(3) Interval types
- YearMonthIntervalType([startField,] endField): Represents a year-month interval made up of a contiguous subset of its fields. startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (MONTH) and 1 (YEAR).
- DayTimeIntervalType([startField,] endField): Represents a day-time interval made up of a contiguous subset of its fields. startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
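The "contiguous subset" rule for day-time intervals can be sketched in plain Python. This is an illustration of the rule only, not Spark's implementation. The field indexes follow the values listed in footnote (3), and `covered_fields` and `qualifier` are hypothetical helpers.

```python
# Index-to-field mapping for day-time intervals, as listed in footnote (3).
DAY_TIME_FIELDS = ["DAY", "HOUR", "MINUTE", "SECOND"]


def covered_fields(start_field, end_field):
    """Return the contiguous run of fields from start_field to end_field."""
    last = len(DAY_TIME_FIELDS) - 1
    if not 0 <= start_field <= end_field <= last:
        raise ValueError("startField must not come after endField, and both must be 0..3")
    return DAY_TIME_FIELDS[start_field:end_field + 1]


def qualifier(start_field, end_field):
    """Render the interval in the style of a SQL interval qualifier."""
    fields = covered_fields(start_field, end_field)
    if len(fields) == 1:
        return f"INTERVAL {fields[0]}"
    return f"INTERVAL {fields[0]} TO {fields[-1]}"


print(covered_fields(1, 2))  # ['HOUR', 'MINUTE']
print(qualifier(0, 3))       # INTERVAL DAY TO SECOND
print(qualifier(2, 2))       # INTERVAL MINUTE
```

Because the subset must be contiguous, a pair such as (2, 1) is rejected: the leftmost field cannot come after the rightmost one.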
(4) StructType
- StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
- StructField(name, dataType, nullable): Represents a field in a StructType. name indicates the name of the field, and dataType indicates its data type. nullable indicates whether values of the field can be null; null values are allowed by default.