Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

The following release notes provide information about Databricks Runtime 12.0, powered by Apache Spark 3.3.1.

Databricks released this version in December 2022.

New features and improvements

Predictive I/O is GA

Predictive I/O is now generally available. For more information, see What is predictive I/O?

Photon writer supports zstd compression

The Photon native writer now supports the zstd compression codec. To enable it, set spark.sql.parquet.compression.codec to zstd.
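As a minimal sketch, the setting could be applied to a session as follows. The helper name enable_zstd_parquet is hypothetical, and the code assumes an existing SparkSession is passed in as spark; only the configuration key and value come from the note above.

```python
# Configuration key and value taken from the release note above.
PARQUET_CODEC_KEY = "spark.sql.parquet.compression.codec"


def enable_zstd_parquet(spark):
    """Set the session-level Parquet compression codec to zstd.

    `spark` is assumed to be an existing SparkSession (for example, the one
    predefined in a notebook). Hypothetical helper, for illustration only.
    """
    spark.conf.set(PARQUET_CODEC_KEY, "zstd")
    # Read the value back so the caller can confirm what is in effect.
    return spark.conf.get(PARQUET_CODEC_KEY)
```

Calling enable_zstd_parquet(spark) before a write would apply the codec for the current session only; a cluster-wide default would instead be set in the cluster's Spark configuration.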

Support for stage-level task resource profile for standalone clusters

You can now use stage-level scheduling on standalone clusters when dynamic allocation is disabled. To use this feature, specify the task resources with ResourceProfileBuilder for each stage.

SQL support for selective overwrite with REPLACE WHERE

You can now selectively overwrite data matching an arbitrary expression in a Delta table using the following pattern:

INSERT INTO table_name REPLACE WHERE predicate append_relation

See Arbitrary selective overwrite with replaceWhere.
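To make the pattern concrete, the following sketch composes such a statement from its three parts. The function name, the table names events and events_staging, and the date predicate are all hypothetical and used only for illustration.

```python
def replace_where_statement(table_name: str, predicate: str, append_relation: str) -> str:
    """Compose an INSERT INTO ... REPLACE WHERE statement from its three parts."""
    return f"INSERT INTO {table_name} REPLACE WHERE {predicate} {append_relation}"


# Hypothetical table names and predicate, for illustration only.
statement = replace_where_statement(
    table_name="events",
    predicate="event_date >= '2022-12-01'",
    append_relation="SELECT * FROM events_staging",
)
print(statement)
```

On a runtime that supports this syntax, the resulting string could be passed to spark.sql(statement), assuming a SparkSession named spark is available.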

Watermarks are now supported in SQL

You can now specify watermarks using the Delta Live Tables SQL interface and in SQL queries against streaming DataFrames. See WATERMARK clause.

PySpark memory profiling

Memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and number of occurrences for each line of code in a user-defined function.

Dynamic pruning for DELETE and UPDATE

When using Photon-enabled compute, DELETE and UPDATE now use dynamic file and partition pruning where it improves performance. For example, dynamic pruning is enabled when a smaller source table is used to update or delete rows in a larger table.

Row-level delete metrics for partitioned data manipulation language operations

You can now audit how many rows are deleted when running data manipulation language (DML) operations with partitioned predicates, such as DELETE, TRUNCATE, and replaceWhere.

Bug fixes

Fixed an issue with JSON parsing in Auto Loader that occurred when all columns were left as strings (cloudFiles.inferColumnTypes was not set or was set to false) and the JSON contained nested objects.

Library upgrades

Apache Spark

Databricks Runtime 12.0 includes Apache Spark 3.3.1. This release includes all Spark fixes and improvements included in Databricks Runtime 11.3 LTS, as well as the following additional bug fixes and improvements made to Spark:

  • [SPARK-40844] [SC-116996][12.x][12.0][12.0.0] Revert “[SC-113542][SS] Flip the default value of Kafka offset fetching config”
  • [SPARK-40646] [SC-116061] Revert “[SC-113379][SQL] Fix returning partial results in JSON data source and JSON functions”
  • [SPARK-41195] [SC-116305][SQL] Support PIVOT/UNPIVOT with join children
  • [SPARK-41178] [SC-116139][SQL] Fix parser rule precedence between JOIN and comma
  • [SPARK-41072] [SC-116140][SC-116044][SC-115852][SQL][SS] Add the error class STREAM_FAILED to StreamingQueryException
  • [SPARK-40921] Revert “[SC-115140][SC-114585][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO”
  • [SPARK-37980] [SC-115758][SQL] Extend METADATA column to support row indexes for Parquet files
  • [SPARK-41055] [SC-115582][SQL] Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
  • [SPARK-41101] [SC-115849][PYTHON][PROTOBUF] Message classname support for PYSPARK-PROTOBUF
  • [SPARK-40956] [SC-115867] SQL Equivalent for Dataframe overwrite command
  • [SPARK-41095] [SC-115659][SQL] Convert unresolved operators to internal errors
  • [SPARK-41144] [SC-115866][SQL] Unresolved hint should not cause query failure
  • [SPARK-40998] [SC-114754][SQL] Rename the error class _LEGACY_ERROR_TEMP_0040 to INVALID_IDENTIFIER
  • [SPARK-41015] [SC-115130][SQL][PROTOBUF] UnitTest null check for data generator
  • [SPARK-40769] [SC-114923][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes
  • [SPARK-41043] [SC-115183][SQL] Rename the error class _LEGACY_ERROR_TEMP_2429 to NUM_COLUMNS_MISMATCH
  • [SPARK-40777] [SC-114855][SQL][PROTOBUF] Protobuf import support and move error-classes.
  • [SPARK-41019] [SC-114987][SQL] Provide a query context to failAnalysis()
  • [SPARK-40476] [SC-115868][SC-111163][ML][SQL] Reduce the shuffle size of ALS
  • [SPARK-41134] [SC-115851][SQL] Improve error message of internal errors
  • [SPARK-37980] Revert “[SQL] Extend METADATA column to support row indexes for Parquet files”
  • [SPARK-41109] [SC-115788][SQL] Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
  • [SPARK-41029] [SC-115828][SQL] Optimize constructor use of GenericArrayData for Scala 2.13
  • [SPARK-40697] [SC-113648][SQL] Add read-side char padding to cover external data files
  • [SPARK-40978] [SC-114574][SQL] Migrate failAnalysis() w/o a context onto error classes
  • [SPARK-40663] [SC-114586][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2251-2275
  • [SPARK-38959] [SC-115672][SC-112707][SQL] DS V2: Support runtime group filtering in row-level commands
  • [SPARK-40372] [SC-115668][SQL] Migrate failures of array type checks onto error classes
  • [SPARK-40965] [SC-114443][SQL] Rename the error class _LEGACY_ERROR_TEMP_1208 to FIELD_NOT_FOUND
  • [SPARK-40748] [SC-114696][SQL] Migrate type check failures of conditions onto error classes
  • [SPARK-40371] [SC-114557][SQL] Migrate type check failures of NthValue and NTile onto error classes
  • [SPARK-41092] [SC-115547][SQL] Do not use identifier to match interval units
  • [SPARK-41009] [SC-115360][SQL] Rename the error class _LEGACY_ERROR_TEMP_1070 to LOCATION_ALREADY_EXISTS
  • [SPARK-37980] [SQL] Extend METADATA column to support row indexes for Parquet files
  • [SPARK-40967] [SC-114448][SQL] Migrate failAnalysis() onto error classes
  • [SPARK-34265] [SC-115486][SC-113788][PYTHON][SQL] Instrument Python UDFs using SQL metrics
  • [SPARK-41012] [SC-114873][SQL] Rename _LEGACY_ERROR_TEMP_1022 to ORDER_BY_POS_OUT_OF_RANGE
  • [SPARK-40752] [SC-114442][SQL] Migrate type check failures of misc expressions onto error classes
  • [SPARK-37945] [SC-115534][SQL][CORE] Use error classes in the execution errors of arithmetic ops
  • [SPARK-40374] [SC-114705][SQL] Migrate type check failures of type creators onto error classes
  • [SPARK-41056] [SC-115354][R] Fix new R_LIBS_SITE behavior introduced in R 4.2
  • [SPARK-41041] [SC-115357][SQL] Integrate _LEGACY_ERROR_TEMP_1279 into TABLE_OR_VIEW_ALREADY_EXISTS
  • [SPARK-40663] [SC-114852][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225
  • [SPARK-41020] [SC-114984][SQL] Rename the error class _LEGACY_ERROR_TEMP_1019 to STAR_GROUP_BY_POS
  • [SPARK-41035] [SC-115233][SQL] Don’t patch foldable children of aggregate functions in RewriteDistinctAggregates
  • [SPARK-39778] [SC-114353][SQL] Improve error classes and messages
  • [SPARK-40798] [SC-113782][SQL] Alter partition should verify value follow storeAssignmentPolicy
  • [SPARK-40810] [SC-113533][SQL] Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand
  • [SPARK-40921] [SC-115140][SC-114585][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO
  • [SPARK-40360] [SC-114965] ALREADY_EXISTS and NOT_FOUND exceptions
  • [SPARK-41007] [SC-115085][SQL] Add missing serializer for java.math.BigInteger
  • [SPARK-40751] [SC-114254][SQL] Migrate type check failures of high order functions onto error classes
  • [SPARK-40248] [SC-114704][SQL] Use larger number of bits to build Bloom filter
  • [SPARK-41040] [SC-115145][SS] Fix self-union streaming query failure when using readStream.table
  • [SPARK-32380] [SC-114966][SQL] Fixing access of HBase table via Hive from Spark
  • [SPARK-40749] [SC-114860][SQL] Migrate type check failures of generators onto error classes
  • [SPARK-40925] [SC-114445][SQL][SS][WARMFIX][12.x] Fix stateful operator late re…
  • [SPARK-40654] [SC-112783] Remove temporary log lines
  • [SPARK-40742] [SC-114913][SC-112646][CORE][SQL] Fix Java compilation warnings related to generic type
  • [SPARK-40657] [SC-113642][Cherry-pick] Add support for Java classes in Protob…
  • [SPARK-40898] [SC-113932][SQL] Quote function names in datatype mismatch errors
  • [SPARK-40760] [SC-113750][SQL] Migrate type check failures of interval expressions onto error classes
  • [SPARK-36114] [SC-113771][SQL] Support subqueries with correlated non-equality predicates
  • [SPARK-40856] [SC-113738][SQL] Update the error template of WRONG_NUM_PARAMS
  • [SPARK-40759] [SC-114251][SQL] Migrate type check failures of time window onto error classes
  • [SPARK-40750] [SC-113812][SQL] Migrate type check failures of math expressions onto error classes
  • [SPARK-40756] [SC-113749][SQL] Migrate type check failures of string expressions onto error classes
  • [SPARK-40768] [SC-113640][SQL] Migrate type check failures of bloom_filter_agg() onto error classes
  • [SPARK-40369] [SC-113402][CORE][SQL] Migrate the type check failures of calls via reflection onto error classes
  • [SPARK-39445] [SQL] Remove the window if windowExpressions is empty in column pruning
  • [SPARK-40761] [SC-113245][SQL] Migrate type check failures of percentile expressions onto error classes
  • [SPARK-40361] [SC-112789][SQL] Migrate arithmetic type check failures onto error classes
  • [SPARK-40714] [SC-112576][SQL] Remove PartitionAlreadyExistsException
  • [SPARK-40702] [SC-112437][SQL] Fix partition specs in PartitionsAlreadyExistException
  • [SPARK-40358] [SC-112631][SQL] Migrate collection type check failures onto error classes
  • [SPARK-40910] [SC-114259][SQL] Replace UnsupportedOperationException with SparkUnsupportedOperationException
  • [SPARK-39876] [SC-112429][SQL] Add UNPIVOT to SQL syntax
  • [SPARK-39783] [SC-113389][SQL] Quote qualifiedName to fix backticks for column candidates in error messages
  • [SPARK-40663] [SC-112520][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2026-2282
  • [SPARK-40975] [SC-114555][SQL] Rename the error class _LEGACY_ERROR_TEMP_0021 to UNSUPPORTED_TYPED_LITERAL
  • [SPARK-37935] [SC-114472][SQL] Eliminate separate error sub-classes fields
  • [SPARK-40944] [SC-114530][SQL] Relax ordering constraint for CREATE TABLE column options
  • [SPARK-40815] [SC-114528][SQL] Add DelegateSymlinkTextInputFormat to workaround SymlinkTextInputSplit bug
  • [SPARK-40933] [SC-114441][SQL] Reimplement df.stat.{cov, corr} with built-in sql functions
  • [SPARK-40932] [SC-114439][CORE] Fix issue messages for allGather are overridden
  • [SPARK-39312] [SQL] Use parquet native In predicate for in filter push down
  • [SPARK-40862] [SC-114352][SQL] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery
  • [SPARK-40963] [SC-114446][SQL] Set nullable correctly in project created by ExtractGenerator
  • [SPARK-40663] [SC-112295][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2000-2024
  • [SPARK-40892] [SC-114347][SQL][SS] Loosen the requirement of window_time rule - allow multiple window_time calls
  • [SPARK-40540] [SC-111639][SQL] Migrate compilation errors onto error classes: _LEGACY_ERROR_TEMP_1100-1347
  • [SPARK-40924] [SC-114258][SQL] Fix for Unhex when input has odd number of symbols
  • [SPARK-40821] [SC-113754][SQL][CORE][PYTHON][SS] Introduce window_time function to extract event time from the window column
  • [SPARK-40800] [SC-113772][SQL] Always inline expressions in OptimizeOneRowRelationSubquery
  • [SPARK-40900] [SC-114006][SQL] Reimplement frequentItems with dataframe operations
  • [SPARK-40735] [SC-114022][SC-112657] Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable
  • [SPARK-40773] [SC-113091][SQL] Refactor checkCorrelationsInSubquery
  • [SPARK-40615] [SC-113400][SQL] Check unsupported data types when decorrelating subqueries
  • [SPARK-40660] [SC-112293][CORE][SQL] Switch to XORShiftRandom to distribute elements
  • [SPARK-40540] [SC-113926][SC-111331][SQL] Migrate compilation errors onto error classes
  • [SPARK-40551] [SC-113135][SQL] DataSource V2: Add APIs for delta-based row-level operations
  • [SPARK-39391] [SC-110676][CORE] Reuse Partitioner classes
  • [SPARK-40368] [SC-113396][SQL] Migrate Bloom Filter type check failures onto error classes
  • [SPARK-39146] [SC-111814][CORE][SQL] Introduce local singleton for ObjectMapper that may be reused
  • [SPARK-40357] [SC-113786][SC-111352][SQL] Migrate window type check failures onto error classes
  • [SPARK-40874] [SC-113756][PYTHON] Fix broadcasts in Python UDFs when encryption enabled
  • [SPARK-40359] [SC-113770][SC-111146][SQL] Migrate type check fails in CSV/JSON expressions to error classes
  • [SPARK-40880] [SC-113764][SQL] Reimplement summary with dataframe operations
  • [SPARK-40877] [SC-113763][SQL] Reimplement crosstab with dataframe operations
  • [SPARK-40382] [SC-113098][SQL] Group distinct aggregate expressions by semantically equivalent children in RewriteDistinctAggregates
  • [SPARK-40826] [SC-113532][SS] Add additional checkpoint rename file check
  • [SPARK-40829] [SC-113426][SQL] STORED AS serde in CREATE TABLE LIKE view does not work
  • [SPARK-40844] [SC-113542][SS] Flip the default value of Kafka offset fetching config
  • [SPARK-40488] [SC-113568][SC-111160] Do not wrap exceptions thrown when datasource write fails
  • [SPARK-40560] [SC-113631][SC-111335][SQL] Rename message to messageTemplate in the STANDARD format of errors
  • [SPARK-40618] [SC-113096][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking
  • [SPARK-40530] [SC-113528][SC-111332][SQL] Add error-related developer APIs
  • [SPARK-40806] [SC-113250][SQL] Typo fix: CREATE TABLE -> REPLACE TABLE
  • [SPARK-40646] [SC-113379][SQL] Fix returning partial results in JSON data source and JSON functions
  • [SPARK-40654] [Cherry-pick][SC-112783][SQL] Protobuf support for Spark - from…
  • [SPARK-40753] [SC-113539][SQL] Fix bug in test case for catalog directory operation
  • [SPARK-40765] [SC-113056][SQL] Optimize redundant fs operation in CommandUtils#calculateSingleLocationSize#getPathSize method
  • [SPARK-40114] [ES-479282][R][11.X] Arrow 9.0.0 support with SparkR
  • [SPARK-40479] [SC-110935][SQL] Migrate unexpected input type error to an error class
  • [SPARK-40473] [SC-113388][SC-111141][SQL] Migrate parsing errors onto error classes
  • [SPARK-39853] [SC-111867][CORE] Support stage level task resource profile for standalone cluster when dynamic allocation disabled(back port from PR-47157)
  • [SPARK-40407] [SC-111161][SQL] Fix the potential data skew caused by df.repartition
  • [SPARK-40425] [SC-110840][SQL] DROP TABLE does not need to do table lookup
  • [SPARK-40703] [SC-113242][SQL] Introduce shuffle on SinglePartition to improve parallelism
  • [SPARK-40640] [SC-113138][CORE] SparkHadoopUtil to set origin of hadoop/hive config options
  • [SPARK-39062] [11.x][CORE] Add stage level resource scheduling support for standalone cluster
  • [SPARK-33861] [SC-109877] Revert “[SQL] Simplify conditional in predicate”
  • [SPARK-40667] [SC-113136][SQL] Refactor File Data Source Options
  • [SPARK-40370] [SC-110677][SQL] Migrate type check fails to error classes in CAST
  • [SPARK-40611] [SC-113105][SQL] Improve the performance of setInterval & getInterval for UnsafeRow
  • [SPARK-40733] [SC-113126][SQL] Make the contents of SERDEPROPERTIES in the result of ShowCreateTableAsSerdeCommand have a fixed order
  • [SPARK-40585] [SC-112180][SQL] Support double quoted identifiers
  • [SPARK-40772] [SC-112992][SQL] Improve spark.sql.adaptive.skewJoin.skewedPartitionFactor to support Double values
  • [SPARK-40565] [SC-112417][SQL] Don’t push non-deterministic filters to V2 file sources
  • [SPARK-8731] [SC-112784] Beeline doesn’t work with -e option when started in background
  • [SPARK-35242] [SC-111398][SQL] Support changing session catalog’s default database
  • [SPARK-40426] [SC-110672][SQL] Return a map from SparkThrowable.getMessageParameters
  • [SPARK-40494] [SC-112655][SC-111025][CORE][SQL][ML][MLLIB] Optimize the performance of keys.zipWithIndex.toMap code pattern
  • [SPARK-40521] [SC-112430][SQL] Return only exists partitions in PartitionsAlreadyExistException from Hive’s create partition
  • [SPARK-40705] [SC-112567][SQL] Handle case of using mutable array when converting Row to JSON for Scala 2.13
  • [SPARK-39895] [SC-112590][SQL][PYTHON] Support multiple column drop
  • [SPARK-40607] [SC-112296][CORE][SQL][MLLIB][SS] Remove redundant string interpolator operations
  • [SPARK-40420] [SC-112604][SC-110569][SQL] Sort error message parameters by names in the JSON formats
  • [SPARK-40403] [SC-110564][SQL] Calculate unsafe array size using longs to avoid negative size in error message
  • [SPARK-40482] [SC-110831][SQL] Revert SPARK-24544 Print actual failure cause when look up function failed
  • [SPARK-40400] [SC-112555][SC-110397][SQL] Pass error message parameters to exceptions as maps
  • [SPARK-40628] [SC-112414][SQL] Do not push complex left semi/anti join condition through project
  • [SPARK-40562] [SC-111455][SQL] Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
  • [SPARK-40501] [SC-111333][SQL] Add PushProjectionThroughLimit for Optimizer
  • [SPARK-39200] [SC-111244][CORE] Make Fallback Storage readFully on content
  • [SPARK-38717] [SC-111241][SQL] Handle Hive’s bucket spec case preserving behavior
  • [SPARK-40385] [SC-111123][SQL] Fix interpreted path for companion object constructor
  • [SPARK-40216] [SC-112413][SQL] Extract common ParquetUtils.prepareWrite method to deduplicate code in ParquetFileFormat and ParquetWrite
  • [SPARK-40636] [SC-112160][CORE] Fix wrong remained shuffles log in BlockManagerDecommissioner
  • [SPARK-40617] [SC-112009] Fix race condition at the handling of ExecutorMetricsPoller’s stageTCMP entries
  • [SPARK-40618] [SC-112046][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries
  • [SPARK-40612] [SC-111933][CORE] Fixing the principal used for delegation token renewal on non-YARN resource managers
  • [SPARK-40595] [SC-111662][SQL] Improve error message for unused CTE relations
  • [SPARK-40314] [SC-111879][SQL][PYTHON] Add scala and python bindings for inline and inline_outer
  • [SPARK-40416] [SC-110945][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework
  • [SPARK-40318] [SC-111865][SC-109986][SQL] try_avg() should throw the exceptions from its child
  • [SPARK-40509] [SC-111644][SS][PYTHON] Add example for applyInPandasWithState
  • [SPARK-40310] [SC-111329][SC-109842][SQL] try_sum() should throw the exceptions from its child
  • [SPARK-40016] [SQL] Remove unnecessary TryEval in the implementation of try_sum()
  • [SPARK-40527] [SC-111224][SQL] Keep struct field names or map keys in CreateStruct
  • [SPARK-38098] [SC-111178][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion
  • [SPARK-40324] [SC-111359][SC-110293][SQL] Provide query context in AnalysisException
  • [SPARK-40492] [SC-111324][SS] Do maintenance before streaming StateStore unload
  • [SPARK-40487] [SC-111124][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel
  • [SPARK-40474] [SC-111273][SQL] Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps
  • [SPARK-40508] [SC-111066][SQL] Treat unknown partitioning as UnknownPartitioning
  • [SPARK-40496] [SC-111234][SC-111013][SQL] Fix configs to control “enableDateTimeParsingFallback”
  • [SPARK-40435] [SC-111214][SS][PYTHON] Add test suites for applyInPandasWithState in PySpark
  • [SPARK-40434] [SC-111125][SS][PYTHON] Implement applyInPandasWithState in PySpark

Maintenance updates

See Databricks Runtime 12.0 maintenance updates.

System environment

  • Operating System: Ubuntu 20.04.5 LTS
  • Java: Zulu
  • Scala: 2.12.14
  • Python: 3.9.5
  • R: 4.2.2
  • Delta Lake: 2.2.0

Installed Python libraries

Installed R libraries

R libraries are installed from the Microsoft CRAN snapshot on 2022-11-11.

Installed Java and Scala libraries (Scala 2.12 cluster version)

