Databricks Runtime 14.0 (EoS)

10/18/2024

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

The following release notes provide information about Databricks Runtime 14.0, powered by Apache Spark 3.5.0.

Databricks released this version in September 2023.

New features and improvements

Row tracking is GA
Predictive I/O for updates is GA
Deletion vectors are GA
Spark 3.5.0 is GA
Public preview for user-defined table functions for Python
Public preview for row-level concurrency
Default current working directory has changed
Known issue with sparklyr
Introducing Spark Connect in shared cluster architecture
List available Spark versions API update

Row tracking is GA

Row tracking for Delta Lake is now generally available. See Use row tracking for Delta tables.

Predictive I/O for updates is GA

Predictive I/O for updates is now generally available. See What is predictive I/O?.

Deletion vectors are GA

Deletion vectors are now generally available. See What are deletion vectors?.

Spark 3.5.0 is GA

Apache Spark 3.5.0 is now generally available. See Spark Release 3.5.0.

Public preview for user-defined table functions for Python

User-defined table functions (UDTFs) allow you to register functions that return tables instead of scalar values. See Python user-defined table functions (UDTFs).
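As a minimal sketch of what such a function can look like, the handler below is a plain Python class whose eval method yields one tuple per output row. The class name SquareNumbers, the column names, the SQL name square_numbers, and the register_square_numbers helper are illustrative assumptions, not part of this release. Registration assumes the pyspark.sql.functions.udtf API available in Apache Spark 3.5.0 and an existing SparkSession passed in as spark.

```python
class SquareNumbers:
    """Hypothetical UDTF handler: emits one (num, squared) row per integer."""

    def eval(self, start: int, end: int):
        # Each yielded tuple becomes one row of the returned table.
        for num in range(start, end + 1):
            yield (num, num * num)


def register_square_numbers(spark):
    """Wrap the handler as a UDTF and register it for SQL (assumes Spark 3.5+)."""
    # Imported lazily so the handler class stays usable without PySpark installed.
    from pyspark.sql.functions import udtf

    square_numbers = udtf(SquareNumbers, returnType="num: int, squared: int")
    spark.udtf.register("square_numbers", square_numbers)
    return square_numbers
```

After registration, a query such as SELECT * FROM square_numbers(1, 3) would be expected to return a table with the columns num and squared rather than a single scalar.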

Public preview for row-level concurrency

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving competing changes in concurrent writes that update or delete different rows in the same data file. See Write conflicts with row-level concurrency.

Default current working directory has changed

The default current working directory (CWD) for code executed locally is now the directory containing the notebook or script being run. This includes code such as %sh and Python or R code not using Spark. See What is the default current working directory?.
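The sketch below shows one way local Python code might depend on this behavior: it resolves a relative path against the current working directory. The helper name resolve_from_cwd and the file name config/settings.json are hypothetical and only illustrate the idea.

```python
import os
from pathlib import Path


def resolve_from_cwd(relative_path: str) -> Path:
    """Resolve a relative path against the current working directory."""
    # On Databricks Runtime 14.0 and above, the CWD is the directory that
    # contains the notebook or script, so the result points next to the code.
    return Path(os.getcwd()) / relative_path


# Hypothetical file name, used only to show the resolved location.
print(resolve_from_cwd("config/settings.json"))
```

Code that previously assumed a different working directory may need absolute paths instead of relative ones.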

Known issue with sparklyr

The installed version of the sparklyr package (version 1.8.1) is not compatible with Databricks Runtime 14.0. To use sparklyr, install version 1.8.3 or above.

Introducing Spark Connect in shared cluster architecture

In Databricks Runtime 14.0 and above, shared clusters use Spark Connect by default to communicate with the Spark driver from the Python REPL. Internal Spark APIs are no longer accessible from user code.

Spark Connect replaces the legacy REPL integration as the way the REPL interacts with the Spark driver.
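Code that must run on both shared and non-shared clusters may want to check which mode it is in before touching anything that depends on internal APIs. The following is a minimal sketch, assuming the public pyspark.sql.is_remote helper from Apache Spark 3.5.0. The function name running_on_spark_connect is hypothetical, and these notes do not state how Databricks sets the underlying flag on shared clusters.

```python
def running_on_spark_connect() -> bool:
    """Return True when PySpark reports that it is running in Spark Connect mode."""
    try:
        # pyspark.sql.is_remote is a public API as of Apache Spark 3.5.0.
        from pyspark.sql import is_remote
    except ImportError:
        # PySpark is missing or predates the public helper: assume classic mode.
        return False
    return bool(is_remote())
```

A caller could use the result to skip code paths that rely on JVM-backed attributes, which are unavailable under Spark Connect.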

List available Spark versions API update

Enable Photon by setting runtime_engine = PHOTON, and enable aarch64 by choosing a Graviton instance type. Azure Databricks sets the correct Databricks Runtime version. Previously, the Spark versions API would return implementation-specific runtimes for each version. See GET /api/2.0/clusters/spark-versions in the REST API Reference.
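As a rough sketch of how a client might use this, the code below reads version keys from a spark-versions response body and builds a cluster specification that selects Photon through runtime_engine. The helper names, the sample response body, and the node type Standard_DS3_v2 are assumptions chosen for illustration. No request is sent, and authentication and the workspace URL are out of scope.

```python
import json


def version_keys(response_body: str) -> list:
    """Extract the runtime version keys from a spark-versions response body."""
    payload = json.loads(response_body)
    return [entry["key"] for entry in payload.get("versions", [])]


def photon_cluster_spec(spark_version: str, node_type_id: str, num_workers: int) -> dict:
    """Build a cluster spec that requests Photon through runtime_engine."""
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Photon is selected here rather than through a Photon-specific version key.
        "runtime_engine": "PHOTON",
    }


# Hypothetical response body, shaped like a {"versions": [...]} payload.
sample = (
    '{"versions": [{"key": "14.0.x-scala2.12", '
    '"name": "14.0 (includes Apache Spark 3.5.0, Scala 2.12)"}]}'
)
keys = version_keys(sample)
# The node type is a placeholder; substitute one that exists in your workspace.
spec = photon_cluster_spec(keys[0], "Standard_DS3_v2", 2)
print(json.dumps(spec, indent=2))
```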

Breaking changes

In Databricks Runtime 14.0 and above, clusters with shared access mode use Spark Connect for client-server communication. This includes the following changes.

For more on shared access mode limitations, see Compute access mode limitations for Unity Catalog.

Python on clusters with shared access mode

sqlContext is not available. Azure Databricks recommends using the spark variable for the SparkSession instance.
Spark Context (sc) is no longer available in notebooks or when using Databricks Connect on a cluster with shared access mode. The following sc functions are no longer available:
- emptyRDD, range, init_batched_serializer, parallelize, pickleFile, textFile, wholeTextFiles, binaryFiles, binaryRecords, sequenceFile, newAPIHadoopFile, newAPIHadoopRDD, hadoopFile, hadoopRDD, union, runJob, setSystemProperty, uiWebUrl, stop, setJobGroup, setLocalProperty, getConf
The Dataset Info feature is no longer supported.
There is no longer a dependency on the JVM when querying Apache Spark. As a consequence, internal APIs related to the JVM, such as _jsc, _jconf, _jvm, _jsparkSession, _jreader, _jc, _jseq, _jdf, _jmap, and _jcols, are no longer supported.
When accessing configuration values using spark.conf, only dynamic runtime configuration values are accessible.
Delta Live Tables analysis commands are not supported on shared clusters yet.
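For the Python changes listed above, code that relied on sc can often be rewritten against the spark session. The sketch below shows DataFrame-based stand-ins for three of the removed functions, plus a configuration read through spark.conf. The function names are hypothetical and each takes the session as a parameter. These are one possible migration, not an official mapping, and they return DataFrames rather than RDDs.

```python
def parallelize_replacement(spark, rows, schema):
    """Stand-in for sc.parallelize(rows): build a DataFrame from local data."""
    return spark.createDataFrame(rows, schema)


def range_replacement(spark, start, end):
    """Stand-in for sc.range(start, end): a DataFrame with a single id column."""
    return spark.range(start, end)


def text_file_replacement(spark, path):
    """Stand-in for sc.textFile(path): one row per line in a value column."""
    return spark.read.text(path)


def runtime_conf(spark, key, default=None):
    """Read a runtime configuration value through spark.conf with a fallback."""
    # Only dynamic runtime configuration values are accessible on shared clusters,
    # so a default keeps the call from failing on keys that are not exposed.
    return spark.conf.get(key, default)
```

Because the results are DataFrames, downstream code that called RDD methods such as map or reduce needs DataFrame equivalents as well.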

Delta on clusters with shared access mode

In Python, there is no longer a dependency on the JVM when querying Apache Spark. Internal APIs related to the JVM, such as DeltaTable._jdt, DeltaTableBuilder._jbuilder, DeltaMergeBuilder._jbuilder, and DeltaOptimizeBuilder._jbuilder, are no longer supported.

SQL on clusters with shared access mode

DBCACHE and DBUNCACHE commands are no longer supported.
Rare use cases like cache table db as show databases are no longer supported.

Library upgrades

Upgraded Python libraries:
- asttokens from 2.2.1 to 2.0.5
- attrs from 21.4.0 to 22.1.0
- botocore from 1.27.28 to 1.27.96
- certifi from 2022.9.14 to 2022.12.7
- cryptography from 37.0.1 to 39.0.1
- debugpy from 1.6.0 to 1.6.7
- docstring-to-markdown from 0.12 to 0.11
- executing from 1.2.0 to 0.8.3
- facets-overview from 1.0.3 to 1.1.1
- googleapis-common-protos from 1.56.4 to 1.60.0
- grpcio from 1.48.1 to 1.48.2
- idna from 3.3 to 3.4
- ipykernel from 6.17.1 to 6.25.0
- ipython from 8.10.0 to 8.14.0
- Jinja2 from 2.11.3 to 3.1.2
- jsonschema from 4.16.0 to 4.17.3
- jupyter_core from 4.11.2 to 5.2.0
- kiwisolver from 1.4.2 to 1.4.4
- MarkupSafe from 2.0.1 to 2.1.1
- matplotlib from 3.5.2 to 3.7.0
- nbconvert from 6.4.4 to 6.5.4
- nbformat from 5.5.0 to 5.7.0
- nest-asyncio from 1.5.5 to 1.5.6
- notebook from 6.4.12 to 6.5.2
- numpy from 1.21.5 to 1.23.5
- packaging from 21.3 to 22.0
- pandas from 1.4.4 to 1.5.3
- pathspec from 0.9.0 to 0.10.3
- patsy from 0.5.2 to 0.5.3
- Pillow from 9.2.0 to 9.4.0
- pip from 22.2.2 to 22.3.1
- protobuf from 3.19.4 to 4.24.0
- pytoolconfig from 1.2.2 to 1.2.5
- pytz from 2022.1 to 2022.7
- s3transfer from 0.6.0 to 0.6.1
- seaborn from 0.11.2 to 0.12.2
- setuptools from 63.4.1 to 65.6.3
- soupsieve from 2.3.1 to 2.3.2.post1
- stack-data from 0.6.2 to 0.2.0
- statsmodels from 0.13.2 to 0.13.5
- terminado from 0.13.1 to 0.17.1
- traitlets from 5.1.1 to 5.7.1
- typing_extensions from 4.3.0 to 4.4.0
- urllib3 from 1.26.11 to 1.26.14
- virtualenv from 20.16.3 to 20.16.7
- wheel from 0.37.1 to 0.38.4
Upgraded R libraries:
- arrow from 10.0.1 to 12.0.1
- base from 4.2.2 to 4.3.1
- blob from 1.2.3 to 1.2.4
- broom from 1.0.3 to 1.0.5
- bslib from 0.4.2 to 0.5.0
- cachem from 1.0.6 to 1.0.8
- caret from 6.0-93 to 6.0-94
- chron from 2.3-59 to 2.3-61
- class from 7.3-21 to 7.3-22
- cli from 3.6.0 to 3.6.1
- clock from 0.6.1 to 0.7.0
- commonmark from 1.8.1 to 1.9.0
- compiler from 4.2.2 to 4.3.1
- cpp11 from 0.4.3 to 0.4.4
- curl from 5.0.0 to 5.0.1
- data.table from 1.14.6 to 1.14.8
- datasets from 4.2.2 to 4.3.1
- dbplyr from 2.3.0 to 2.3.3
- digest from 0.6.31 to 0.6.33
- downlit from 0.4.2 to 0.4.3
- dplyr from 1.1.0 to 1.1.2
- dtplyr from 1.2.2 to 1.3.1
- evaluate from 0.20 to 0.21
- fastmap from 1.1.0 to 1.1.1
- fontawesome from 0.5.0 to 0.5.1
- fs from 1.6.1 to 1.6.2
- future from 1.31.0 to 1.33.0
- future.apply from 1.10.0 to 1.11.0
- gargle from 1.3.0 to 1.5.1
- ggplot2 from 3.4.0 to 3.4.2
- gh from 1.3.1 to 1.4.0
- glmnet from 4.1-6 to 4.1-7
- googledrive from 2.0.0 to 2.1.1
- googlesheets4 from 1.0.1 to 1.1.1
- graphics from 4.2.2 to 4.3.1
- grDevices from 4.2.2 to 4.3.1
- grid from 4.2.2 to 4.3.1
- gtable from 0.3.1 to 0.3.3
- hardhat from 1.2.0 to 1.3.0
- haven from 2.5.1 to 2.5.3
- hms from 1.1.2 to 1.1.3
- htmltools from 0.5.4 to 0.5.5
- htmlwidgets from 1.6.1 to 1.6.2
- httpuv from 1.6.8 to 1.6.11
- httr from 1.4.4 to 1.4.6
- ipred from 0.9-13 to 0.9-14
- jsonlite from 1.8.4 to 1.8.7
- KernSmooth from 2.23-20 to 2.23-21
- knitr from 1.42 to 1.43
- later from 1.3.0 to 1.3.1
- lattice from 0.20-45 to 0.21-8
- lava from 1.7.1 to 1.7.2.1
- lubridate from 1.9.1 to 1.9.2
- markdown from 1.5 to 1.7
- MASS from 7.3-58.2 to 7.3-60
- Matrix from 1.5-1 to 1.5-4.1
- methods from 4.2.2 to 4.3.1
- mgcv from 1.8-41 to 1.8-42
- modelr from 0.1.10 to 0.1.11
- nnet from 7.3-18 to 7.3-19
- openssl from 2.0.5 to 2.0.6
- parallel from 4.2.2 to 4.3.1
- parallelly from 1.34.0 to 1.36.0
- pillar from 1.8.1 to 1.9.0
- pkgbuild from 1.4.0 to 1.4.2
- pkgload from 1.3.2 to 1.3.2.1
- pROC from 1.18.0 to 1.18.4
- processx from 3.8.0 to 3.8.2
- prodlim from 2019.11.13 to 2023.03.31
- profvis from 0.3.7 to 0.3.8
- ps from 1.7.2 to 1.7.5
- Rcpp from 1.0.10 to 1.0.11
- readr from 2.1.3 to 2.1.4
- readxl from 1.4.2 to 1.4.3
- recipes from 1.0.4 to 1.0.6
- rlang from 1.0.6 to 1.1.1
- rmarkdown from 2.20 to 2.23
- Rserve from 1.8-12 to 1.8-11
- RSQLite from 2.2.20 to 2.3.1
- rstudioapi from 0.14 to 0.15.0
- sass from 0.4.5 to 0.4.6
- shiny from 1.7.4 to 1.7.4.1
- sparklyr from 1.7.9 to 1.8.1
- SparkR from 3.4.1 to 3.5.0
- splines from 4.2.2 to 4.3.1
- stats from 4.2.2 to 4.3.1
- stats4 from 4.2.2 to 4.3.1
- survival from 3.5-3 to 3.5-5
- sys from 3.4.1 to 3.4.2
- tcltk from 4.2.2 to 4.3.1
- testthat from 3.1.6 to 3.1.10
- tibble from 3.1.8 to 3.2.1
- tidyverse from 1.3.2 to 2.0.0
- tinytex from 0.44 to 0.45
- tools from 4.2.2 to 4.3.1
- tzdb from 0.3.0 to 0.4.0
- usethis from 2.1.6 to 2.2.2
- utils from 4.2.2 to 4.3.1
- vctrs from 0.5.2 to 0.6.3
- viridisLite from 0.4.1 to 0.4.2
- vroom from 1.6.1 to 1.6.3
- waldo from 0.4.0 to 0.5.1
- xfun from 0.37 to 0.39
- xml2 from 1.3.3 to 1.3.5
- zip from 2.2.2 to 2.3.0
Upgraded Java libraries:
- com.fasterxml.jackson.core.jackson-annotations from 2.14.2 to 2.15.2
- com.fasterxml.jackson.core.jackson-core from 2.14.2 to 2.15.2
- com.fasterxml.jackson.core.jackson-databind from 2.14.2 to 2.15.2
- com.fasterxml.jackson.dataformat.jackson-dataformat-cbor from 2.14.2 to 2.15.2
- com.fasterxml.jackson.datatype.jackson-datatype-joda from 2.14.2 to 2.15.2
- com.fasterxml.jackson.datatype.jackson-datatype-jsr310 from 2.13.4 to 2.15.1
- com.fasterxml.jackson.module.jackson-module-paranamer from 2.14.2 to 2.15.2
- com.fasterxml.jackson.module.jackson-module-scala_2.12 from 2.14.2 to 2.15.2
- com.github.luben.zstd-jni from 1.5.2-5 to 1.5.5-4
- com.google.code.gson.gson from 2.8.9 to 2.10.1
- com.google.crypto.tink.tink from 1.7.0 to 1.9.0
- commons-codec.commons-codec from 1.15 to 1.16.0
- commons-io.commons-io from 2.11.0 to 2.13.0
- io.airlift.aircompressor from 0.21 to 0.24
- io.dropwizard.metrics.metrics-core from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-graphite from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-healthchecks from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-jetty9 from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-jmx from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-json from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-jvm from 4.2.10 to 4.2.19
- io.dropwizard.metrics.metrics-servlets from 4.2.10 to 4.2.19
- io.netty.netty-all from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-buffer from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-codec from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-codec-http from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-codec-http2 from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-codec-socks from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-common from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-handler from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-handler-proxy from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-resolver from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-transport from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-transport-classes-epoll from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-transport-classes-kqueue from 4.1.87.Final to 4.1.93.Final
- io.netty.netty-transport-native-epoll from 4.1.87.Final-linux-x86_64 to 4.1.93.Final-linux-x86_64
- io.netty.netty-transport-native-kqueue from 4.1.87.Final-osx-x86_64 to 4.1.93.Final-osx-x86_64
- io.netty.netty-transport-native-unix-common from 4.1.87.Final to 4.1.93.Final
- org.apache.arrow.arrow-format from 11.0.0 to 12.0.1
- org.apache.arrow.arrow-memory-core from 11.0.0 to 12.0.1
- org.apache.arrow.arrow-memory-netty from 11.0.0 to 12.0.1
- org.apache.arrow.arrow-vector from 11.0.0 to 12.0.1
- org.apache.avro.avro from 1.11.1 to 1.11.2
- org.apache.avro.avro-ipc from 1.11.1 to 1.11.2
- org.apache.avro.avro-mapred from 1.11.1 to 1.11.2
- org.apache.commons.commons-compress from 1.21 to 1.23.0
- org.apache.hadoop.hadoop-client-runtime from 3.3.4 to 3.3.6
- org.apache.logging.log4j.log4j-1.2-api from 2.19.0 to 2.20.0
- org.apache.logging.log4j.log4j-api from 2.19.0 to 2.20.0
- org.apache.logging.log4j.log4j-core from 2.19.0 to 2.20.0
- org.apache.logging.log4j.log4j-slf4j2-impl from 2.19.0 to 2.20.0
- org.apache.orc.orc-core from 1.8.4-shaded-protobuf to 1.9.0-shaded-protobuf
- org.apache.orc.orc-mapreduce from 1.8.4-shaded-protobuf to 1.9.0-shaded-protobuf
- org.apache.orc.orc-shims from 1.8.4 to 1.9.0
- org.apache.xbean.xbean-asm9-shaded from 4.22 to 4.23
- org.checkerframework.checker-qual from 3.19.0 to 3.31.0
- org.glassfish.jersey.containers.jersey-container-servlet from 2.36 to 2.40
- org.glassfish.jersey.containers.jersey-container-servlet-core from 2.36 to 2.40
- org.glassfish.jersey.core.jersey-client from 2.36 to 2.40
- org.glassfish.jersey.core.jersey-common from 2.36 to 2.40
- org.glassfish.jersey.core.jersey-server from 2.36 to 2.40
- org.glassfish.jersey.inject.jersey-hk2 from 2.36 to 2.40
- org.javassist.javassist from 3.25.0-GA to 3.29.2-GA
- org.mariadb.jdbc.mariadb-java-client from 2.7.4 to 2.7.9
- org.postgresql.postgresql from 42.3.8 to 42.6.0
- org.roaringbitmap.RoaringBitmap from 0.9.39 to 0.9.45
- org.roaringbitmap.shims from 0.9.39 to 0.9.45
- org.rocksdb.rocksdbjni from 7.8.3 to 8.3.2
- org.scala-lang.modules.scala-collection-compat_2.12 from 2.4.3 to 2.9.0
- org.slf4j.jcl-over-slf4j from 2.0.6 to 2.0.7
- org.slf4j.jul-to-slf4j from 2.0.6 to 2.0.7
- org.slf4j.slf4j-api from 2.0.6 to 2.0.7
- org.xerial.snappy.snappy-java from 1.1.10.1 to 1.1.10.3
- org.yaml.snakeyaml from 1.33 to 2.0

Apache Spark

Databricks Runtime 14.0 includes Apache Spark 3.5.0. This release includes all Spark fixes and improvements included in Databricks Runtime 13.3 LTS, as well as the following additional bug fixes and improvements made to Spark:

[SPARK-45109] [DBRRM-462][SC-142247][SQL][CONNECT] Fix aes_decrypt and ln functions in Connect
[SPARK-44980] [DBRRM-462][SC-141024][PYTHON][CONNECT] Fix inherited namedtuples to work in createDataFrame
[SPARK-44795] [DBRRM-462][SC-139720][CONNECT] CodeGenerator Cache should be classloader specific
[SPARK-44861] [DBRRM-498][SC-140716][CONNECT] jsonignore SparkListenerConnectOperationStarted.planRequest
[SPARK-44794] [DBRRM-462][SC-139767][CONNECT] Make Streaming Queries work with Connect’s artifact management
[SPARK-44791] [DBRRM-462][SC-139623][CONNECT] Make ArrowDeserializer work with REPL generated classes
[SPARK-44876] [DBRRM-480][SC-140431][PYTHON] Fix Arrow-optimized Python UDF on Spark Connect
[SPARK-44877] [DBRRM-482][SC-140437][CONNECT][PYTHON] Support python protobuf functions for Spark Connect
[SPARK-44882] [DBRRM-463][SC-140430][PYTHON][CONNECT] Remove function uuid/random/chr from PySpark
[SPARK-44740] [DBRRM-462][SC-140320][CONNECT][FOLLOW] Fix metadata values for Artifacts
[SPARK-44822] [DBRRM-464][PYTHON][SQL] Make Python UDTFs by default non-deterministic
[SPARK-44836] [DBRRM-468][SC-140228][PYTHON] Refactor Arrow Python UDTF
[SPARK-44738] [DBRRM-462][SC-139347][PYTHON][CONNECT] Add missing client metadata to calls
[SPARK-44722] [DBRRM-462][SC-139306][CONNECT] ExecutePlanResponseReattachableIterator._call_iter: AttributeError: ‘NoneType’ object has no attribute ‘message’
[SPARK-44625] [DBRRM-396][SC-139535][CONNECT] SparkConnectExecutionManager to track all executions
[SPARK-44663] [SC-139020][DBRRM-420][PYTHON] Disable arrow optimization by default for Python UDTFs
[SPARK-44709] [DBRRM-396][SC-139250][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control
[SPARK-44656] [DBRRM-396][SC-138924][CONNECT] Make all iterators CloseableIterators
[SPARK-44671] [DBRRM-396][SC-138929][PYTHON][CONNECT] Retry ExecutePlan in case initial request didn’t reach server in Python client
[SPARK-44624] [DBRRM-396][SC-138919][CONNECT] Retry ExecutePlan in case initial request didn’t reach server
[SPARK-44574] [DBRRM-396][SC-138288][SQL][CONNECT] Errors that moved into sql/api should also use AnalysisException
[SPARK-44613] [DBRRM-396][SC-138473][CONNECT] Add Encoders object
[SPARK-44626] [DBRRM-396][SC-138828][SS][CONNECT] Followup on streaming query termination when client session is timed out for Spark Connect
[SPARK-44642] [DBRRM-396][SC-138882][CONNECT] ReleaseExecute in ExecutePlanResponseReattachableIterator after it gets error from server
[SPARK-41400] [DBRRM-396][SC-138287][CONNECT] Remove Connect Client Catalyst Dependency
[SPARK-44664] [DBRRM-396][PYTHON][CONNECT] Release the execute when closing the iterator in Python client
[SPARK-44631] [DBRRM-396][SC-138823][CONNECT][CORE][14.0.0] Remove session-based directory when the isolated session cache is evicted
[SPARK-42941] [DBRRM-396][SC-138389][SS][CONNECT] Python StreamingQueryListener
[SPARK-44636] [DBRRM-396][SC-138570][CONNECT] Leave no dangling iterators
[SPARK-44424] [DBRRM-396][CONNECT][PYTHON][14.0.0] Python client for reattaching to existing execute in Spark Connect
[SPARK-44637] [SC-138571] Synchronize accesses to ExecuteResponseObserver
[SPARK-44538] [SC-138178][CONNECT][SQL] Reinstate Row.jsonValue and friends
[SPARK-44421] [SC-138434][SPARK-44423][CONNECT] Reattachable execution in Spark Connect
[SPARK-44418] [SC-136807][PYTHON][CONNECT] Upgrade protobuf from 3.19.5 to 3.20.3
[SPARK-44587] [SC-138315][SQL][CONNECT] Increase protobuf marshaller recursion limit
[SPARK-44591] [SC-138292][CONNECT][SQL] Add jobTags to SparkListenerSQLExecutionStart
[SPARK-44610] [SC-138368][SQL] DeduplicateRelations should retain Alias metadata when creating a new instance
[SPARK-44542] [SC-138323][CORE] Eagerly load SparkExitCode class in exception handler
[SPARK-44264] [SC-138143][PYTHON] E2E Testing for Deepspeed
[SPARK-43997] [SC-138347][CONNECT] Add support for Java UDFs
[SPARK-44507] [SQL][CONNECT][14.x][14.0] Move AnalysisException to sql/api
[SPARK-44453] [SC-137013][PYTHON] Use difflib to display errors in assertDataFrameEqual
[SPARK-44394] [SC-138291][CONNECT][WEBUI][14.0] Add a Spark UI page for Spark Connect
[SPARK-44611] [SC-138415][CONNECT] Do not exclude scala-xml
[SPARK-44531] [SC-138044][CONNECT][SQL][14.x][14.0] Move encoder inference to sql/api
[SPARK-43744] [SC-138289][CONNECT][14.x][14.0] Fix class loading problem cau…
[SPARK-44590] [SC-138296][SQL][CONNECT] Remove the arrow batch record limit for SqlCommandResult
[SPARK-43968] [SC-138115][PYTHON] Improve error messages for Python UDTFs with wrong number of outputs
[SPARK-44432] [SC-138293][SS][CONNECT] Terminate streaming queries when a session times out in Spark Connect
[SPARK-44584] [SC-138295][CONNECT] Set client_type information for AddArtifactsRequest and ArtifactStatusesRequest in Scala Client
[SPARK-44552] [14.0][SC-138176][SQL] Remove private object ParseState definition from IntervalUtils
[SPARK-43660] [SC-136183][CONNECT][PS] Enable resample with Spark Connect
[SPARK-44287] [SC-136223][SQL] Use PartitionEvaluator API in RowToColumnarExec & ColumnarToRowExec SQL operators.
[SPARK-39634] [SC-137566][SQL] Allow file splitting in combination with row index generation
[SPARK-44533] [SC-138058][PYTHON] Add support for accumulator, broadcast, and Spark files in Python UDTF’s analyze
[SPARK-44479] [SC-138146][PYTHON] Fix ArrowStreamPandasUDFSerializer to accept no-column pandas DataFrame
[SPARK-44425] [SC-138177][CONNECT] Validate that user provided sessionId is an UUID
[SPARK-44535] [SC-138038][CONNECT][SQL] Move required Streaming API to sql/api
[SPARK-44264] [SC-136523][ML][PYTHON] Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor
[SPARK-42098] [SC-138164][SQL] Fix ResolveInlineTables can not handle with RuntimeReplaceable expression
[SPARK-44060] [SC-135693][SQL] Code-gen for build side outer shuffled hash join
[SPARK-44496] [SC-137682][SQL][CONNECT] Move Interfaces needed by SCSC to sql/api
[SPARK-44532] [SC-137893][CONNECT][SQL] Move ArrowUtils to sql/api
[SPARK-44413] [SC-137019][PYTHON] Clarify error for unsupported arg data type in assertDataFrameEqual
[SPARK-44530] [SC-138036][CORE][CONNECT] Move SparkBuildInfo to common/util
[SPARK-36612] [SC-133071][SQL] Support left outer join build left or right outer join build right in shuffled hash join
[SPARK-44519] [SC-137728][CONNECT] SparkConnectServerUtils generated incorrect parameters for jars
[SPARK-44449] [SC-137818][CONNECT] Upcasting for direct Arrow Deserialization
[SPARK-44131] [SC-136346][SQL] Add call_function and deprecate call_udf for Scala API
[SPARK-44541] [SQL] Remove useless function hasRangeExprAgainstEventTimeCol from UnsupportedOperationChecker
[SPARK-44523] [SC-137859][SQL] Filter’s maxRows/maxRowsPerPartition is 0 if condition is FalseLiteral
[SPARK-44540] [SC-137873][UI] Remove unused stylesheet and javascript files of jsonFormatter
[SPARK-44466] [SC-137856][SQL] Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX from modifiedConfigs
[SPARK-44477] [SC-137508][SQL] Treat TYPE_CHECK_FAILURE_WITH_HINT as an error subclass
[SPARK-44509] [SC-137855][PYTHON][CONNECT] Add job cancellation API set in Spark Connect Python client
[SPARK-44059] [SC-137023] Add analyzer support of named arguments for built-in functions
[SPARK-38476] [SC-136448][CORE] Use error class in org.apache.spark.storage
[SPARK-44486] [SC-137817][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas
[SPARK-44361] [SC-137200][SQL] Use PartitionEvaluator API in MapInBatchExec
[SPARK-44510] [SC-137652][UI] Update dataTables to 1.13.5 and remove some unreached png files
[SPARK-44503] [SC-137808][SQL] Add SQL grammar for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls
[SPARK-38477] [SC-136319][CORE] Use error class in org.apache.spark.shuffle
[SPARK-44299] [SC-136088][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
[SPARK-44422] [SC-137567][CONNECT] Spark Connect fine grained interrupt
[SPARK-44380] [SC-137415][SQL][PYTHON] Support for Python UDTF to analyze in Python
[SPARK-43923] [SC-137020][CONNECT] Post listenerBus events durin…
[SPARK-44303] [SC-136108][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
[SPARK-44294] [SC-135885][UI] Fix HeapHistogram column shows unexpectedly w/ select-all-box
[SPARK-44409] [SC-136975][SQL] Handle char/varchar in Dataset.to to keep consistent with others
[SPARK-44334] [SC-136576][SQL][UI] Status in the REST API response for a failed DDL/DML with no jobs should be FAILED rather than COMPLETED
[SPARK-42309] [SC-136703][SQL] Introduce INCOMPATIBLE_DATA_TO_TABLE and sub classes.
[SPARK-44367] [SC-137418][SQL][UI] Show error message on UI for each failed query
[SPARK-44474] [SC-137195][CONNECT] Reenable “Test observe response” at SparkConnectServiceSuite
[SPARK-44320] [SC-136446][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
[SPARK-44310] [SC-136055][CONNECT] The Connect Server startup log should display the hostname and port
[SPARK-44309] [SC-136193][UI] Display Add/Remove Time of Executors on Executors Tab
[SPARK-42898] [SC-137556][SQL] Mark that string/date casts do not need time zone id
[SPARK-44475] [SC-137422][SQL][CONNECT] Relocate DataType and Parser to sql/api
[SPARK-44484] [SC-137562][SS] Add batchDuration to StreamingQueryProgress json method
[SPARK-43966] [SC-137559][SQL][PYTHON] Support non-deterministic table-valued functions
[SPARK-44439] [SC-136973][CONNECT][SS] Fixed listListeners to only send ids back to client
[SPARK-44341] [SC-137054][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec
[SPARK-43839] [SC-132680][SQL] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL
[SPARK-44244] [SC-135703][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]
[SPARK-44201] [SC-136778][CONNECT][SS] Add support for Streaming Listener in Scala for Spark Connect
[SPARK-44260] [SC-135618][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite
[SPARK-42454] [SC-136913][SQL] SPJ: encapsulate all SPJ related parameters in BatchScanExec
[SPARK-44292] [SC-135844][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]
[SPARK-44396] [SC-137221][CONNECT] Direct Arrow Deserialization
[SPARK-44324] [SC-137172][SQL][CONNECT] Move CaseInsensitiveMap to sql/api
[SPARK-44395] [SC-136744][SQL] Add test back to StreamingTableSuite
[SPARK-44481] [SC-137401][CONNECT][PYTHON] Make pyspark.sql.is_remote an API
[SPARK-44278] [SC-137400][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties
[SPARK-44264] [SC-137211][ML][PYTHON] Support Distributed Training of Functions Using Deepspeed
[SPARK-44430] [SC-136970][SQL] Add cause to AnalysisException when option is invalid
[SPARK-44264] [SC-137167][ML][PYTHON] Incorporating FunctionPickler Into TorchDistributor
[SPARK-44216] [SC-137046][PYTHON] Make assertSchemaEqual API public
[SPARK-44398] [SC-136720][CONNECT] Scala foreachBatch API
[SPARK-43203] [SC-134528][SQL] Move all Drop Table case to DataSource V2
[SPARK-43755] [SC-137171][CONNECT][MINOR] Open AdaptiveSparkPlanHelper.allChildren instead of using copy in MetricGenerator
[SPARK-44264] [SC-137187][ML][PYTHON] Refactoring TorchDistributor To Allow for Custom “run_training_on_file” Function Pointer
[SPARK-43755] [SC-136838][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread
[SPARK-44411] [SC-137198][SQL] Use PartitionEvaluator API in ArrowEvalPythonExec and BatchEvalPythonExec
[SPARK-44375] [SC-137197][SQL] Use PartitionEvaluator API in DebugExec
[SPARK-43967] [SC-137057][PYTHON] Support regular Python UDTFs with empty return values
[SPARK-43915] [SC-134766][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
[SPARK-43965] [SC-136929][PYTHON][CONNECT] Support Python UDTF in Spark Connect
[SPARK-44154] [SC-137050][SQL] Added more unit tests to BitmapExpressionUtilsSuite and made minor improvements to Bitmap Aggregate Expressions
[SPARK-44169] [SC-135497][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]
[SPARK-44353] [SC-136578][CONNECT][SQL] Remove StructType.toAttributes
[SPARK-43964] [SC-136676][SQL][PYTHON] Support arrow-optimized Python UDTFs
[SPARK-44321] [SC-136308][CONNECT] Decouple ParseException from AnalysisException
[SPARK-44348] [SAS-1910][SC-136644][CORE][CONNECT][PYTHON] Reenable test_artifact with relevant changes
[SPARK-44145] [SC-136698][SQL] Callback when ready for execution
[SPARK-43983] [SC-136404][PYTHON][ML][CONNECT] Enable cross validator estimator test
[SPARK-44399] [SC-136669][PYTHON][CONNECT] Import SparkSession in Python UDF only when useArrow is None
[SPARK-43631] [SC-135300][CONNECT][PS] Enable Series.interpolate with Spark Connect
[SPARK-44374] [SC-136544][PYTHON][ML] Add example code for distributed ML for spark connect
[SPARK-44282] [SC-135948][CONNECT] Prepare DataType parsing for use in Spark Connect Scala Client
[SPARK-44052] [SC-134469][CONNECT][PS] Add util to get proper Column or DataFrame class for Spark Connect.
[SPARK-43983] [SC-136404][PYTHON][ML][CONNECT] Implement cross validator estimator
[SPARK-44290] [SC-136300][CONNECT] Session-based files and archives in Spark Connect
[SPARK-43710] [SC-134860][PS][CONNECT] Support functions.date_part for Spark Connect
[SPARK-44036] [SC-134036][CONNECT][PS] Cleanup & consolidate tickets to simplify the tasks.
[SPARK-44150] [SC-135790][PYTHON][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF
[SPARK-43903] [SC-134754][PYTHON][CONNECT] Improve ArrayType input support in Arrow Python UDF
[SPARK-44250] [SC-135819][ML][PYTHON][CONNECT] Implement classification evaluator
[SPARK-44255] [SC-135704][SQL] Relocate StorageLevel to common/utils
[SPARK-42169] [SC-135735][SQL] Implement code generation for to_csv function (StructsToCsv)
[SPARK-44249] [SC-135719][SQL][PYTHON] Refactor PythonUDTFRunner to send its return type separately
[SPARK-43353] [SC-132734][PYTHON] Migrate remaining session errors into error class
[SPARK-44133] [SC-134795][PYTHON] Upgrade MyPy from 0.920 to 0.982
[SPARK-42941] [SC-134707][SS][CONNECT][1/2] StreamingQueryListener - Event Serde in JSON format
[SPARK-43353] Revert “[SC-132734][ES-729763][PYTHON] Migrate remaining session errors into error class”
[SPARK-44100] [SC-134576][ML][CONNECT][PYTHON] Move namespace from pyspark.mlv2 to pyspark.ml.connect
[SPARK-44220] [SC-135484][SQL] Move StringConcat to sql/api
[SPARK-43992] [SC-133645][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions
[SPARK-43982] [SC-134529][ML][PYTHON][CONNECT] Implement pipeline estimator for ML on spark connect
[SPARK-43888] [SC-132893][CORE] Relocate Logging to common/utils
[SPARK-42941] Revert “[SC-134707][SS][CONNECT][1/2] StreamingQueryListener - Event Serde in JSON format”
[SPARK-43624] [SC-134557][PS][CONNECT] Add EWM to SparkConnectPlanner.
[SPARK-43981] [SC-134137][PYTHON][ML] Basic saving / loading implementation for ML on spark connect
[SPARK-43205] [SC-133371][SQL] fix SQLQueryTestSuite
[SPARK-43376] Revert “[SC-130433][SQL] Improve reuse subquery with table cache”
[SPARK-44040] [SC-134366][SQL] Fix compute stats when AggregateExec node above QueryStageExec
[SPARK-43919] [SC-133374][SQL] Extract JSON functionality out of Row
[SPARK-42618] [SC-134433][PYTHON][PS] Warning for the pandas-related behavior changes in next major release
[SPARK-43893] [SC-133381][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF
[SPARK-43627] [SC-134290][SPARK-43626][PS][CONNECT] Enable pyspark.pandas.spark.functions.{kurt, skew} in Spark Connect.
[SPARK-43798] [SC-133990][SQL][PYTHON] Support Python user-defined table functions
[SPARK-43616] [SC-133849][PS][CONNECT] Enable pyspark.pandas.spark.functions.mode in Spark Connect
[SPARK-43133] [SC-133728] Scala Client DataStreamWriter Foreach support
[SPARK-43684] [SC-134107][SPARK-43685][SPARK-43686][SPARK-43691][CONNECT][PS] Fix (NullOps|NumOps).(eq|ne) for Spark Connect.
[SPARK-43645] [SC-134151][SPARK-43622][PS][CONNECT] Enable pyspark.pandas.spark.functions.{var, stddev} in Spark Connect
[SPARK-43617] [SC-133893][PS][CONNECT] Enable pyspark.pandas.spark.functions.product in Spark Connect
[SPARK-43610] [SC-133832][CONNECT][PS] Enable InternalFrame.attach_distributed_column in Spark Connect.
[SPARK-43621] [SC-133852][PS][CONNECT] Enable pyspark.pandas.spark.functions.repeat in Spark Connect
[SPARK-43921] [SC-133461][PROTOBUF] Generate Protobuf descriptor files at build time
[SPARK-43613] [SC-133727][PS][CONNECT] Enable pyspark.pandas.spark.functions.covar in Spark Connect
[SPARK-43376] [SC-130433][SQL] Improve reuse subquery with table cache
[SPARK-43612] [SC-132011][CONNECT][PYTHON] Implement SparkSession.addArtifact(s) in Python client
[SPARK-43920] [SC-133611][SQL][CONNECT] Create sql/api module
[SPARK-43097] [SC-133372][ML] New pyspark ML logistic regression estimator implemented on top of distributor
[SPARK-43783] [SC-133240][SPARK-43784][SPARK-43788][ML] Make MLv2 (ML on spark connect) supports pandas >= 2.0
[SPARK-43024] [SC-132716][PYTHON] Upgrade pandas to 2.0.0
[SPARK-43881] [SC-133140][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases
[SPARK-39281] [SC-131422][SQL] Speed up Timestamp type inference with legacy format in JSON/CSV data source
[SPARK-43792] [SC-132887][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listCatalogs
[SPARK-43132] [SC-131623] [SS] [CONNECT] Python Client DataStreamWriter foreach() API
[SPARK-43545] [SC-132378][SQL][PYTHON] Support nested timestamp type
[SPARK-43353] [SC-132734][PYTHON] Migrate remaining session errors into error class
[SPARK-43304] [SC-129969][CONNECT][PYTHON] Migrate NotImplementedError into PySparkNotImplementedError
[SPARK-43516] [SC-132202][ML][PYTHON][CONNECT] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator
[SPARK-43128] Revert “[SC-131628][CONNECT][SS] Make recentProgress and lastProgress return StreamingQueryProgress consistent with the native Scala Api”
[SPARK-43543] [SC-131839][PYTHON] Fix nested MapType behavior in Pandas UDF
[SPARK-38469] [SC-131425][CORE] Use error class in org.apache.spark.network
[SPARK-43309] [SC-129746][SPARK-38461][CORE] Extend INTERNAL_ERROR with categories and add error class INTERNAL_ERROR_BROADCAST
[SPARK-43265] [SC-129653] Move Error framework to a common utils module
[SPARK-43440] [SC-131229][PYTHON][CONNECT] Support registration of an Arrow-optimized Python UDF
[SPARK-43528] [SC-131531][SQL][PYTHON] Support duplicated field names in createDataFrame with pandas DataFrame
[SPARK-43412] [SC-130990][PYTHON][CONNECT] Introduce SQL_ARROW_BATCHED_UDF EvalType for Arrow-optimized Python UDFs
[SPARK-40912] [SC-130986][CORE] Overhead of Exceptions in KryoDeserializationStream
[SPARK-39280] [SC-131206][SQL] Speed up Timestamp type inference with user-provided format in JSON/CSV data source
[SPARK-43473] [SC-131372][PYTHON] Support struct type in createDataFrame from pandas DataFrame
[SPARK-43443] [SC-131024][SQL] Add benchmark for Timestamp type inference when use invalid value
[SPARK-41532] [SC-130523][CONNECT][CLIENT] Add check for operations that involve multiple data frames
[SPARK-43296] [SC-130627][CONNECT][PYTHON] Migrate Spark Connect session errors into error class
[SPARK-43324] [SC-130455][SQL] Handle UPDATE commands for delta-based sources
[SPARK-43347] [SC-130148][PYTHON] Remove Python 3.7 Support
[SPARK-43292] [SC-130525][CORE][CONNECT] Move ExecutorClassLoader to core module and simplify Executor#addReplClassLoaderIfNeeded
[SPARK-43081] [SC-129900] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data
[SPARK-43331] [SC-130061][CONNECT] Add Spark Connect SparkSession.interruptAll
[SPARK-43306] [SC-130320][PYTHON] Migrate ValueError from Spark SQL types into error class
[SPARK-43261] [SC-129674][PYTHON] Migrate TypeError from Spark SQL types into error class.
[SPARK-42992] [SC-129465][PYTHON] Introduce PySparkRuntimeError
[SPARK-16484] [SC-129975][SQL] Add support for Datasketches HllSketch
[SPARK-43165] [SC-128823][SQL] Move canWrite to DataTypeUtils
[SPARK-43082] [SC-129112][CONNECT][PYTHON] Arrow-optimized Python UDFs in Spark Connect
[SPARK-43084] [SC-128654] [SS] Add applyInPandasWithState support for spark connect
[SPARK-42657] [SC-128621][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts
[SPARK-43098] [SC-77059][SQL] Fix correctness COUNT bug when scalar subquery has group by clause
[SPARK-42884] [SC-126662][CONNECT] Add Ammonite REPL integration
[SPARK-42994] [SC-128333][ML][CONNECT] PyTorch Distributor support Local Mode
[SPARK-41498] [SC-125343] Revert “Propagate metadata through Union”
[SPARK-42993] [SC-127829][ML][CONNECT] Make PyTorch Distributor compatible with Spark Connect
[SPARK-42683] [LC-75] Automatically rename conflicting metadata columns
[SPARK-42874] [SC-126442][SQL] Enable new golden file test framework for analysis for all input files
[SPARK-42779] [SC-126042][SQL] Allow V2 writes to indicate advisory shuffle partition size
[SPARK-42891] [SC-126458][CONNECT][PYTHON] Implement CoGrouped Map API
[SPARK-42791] [SC-126134][SQL] Create a new golden file test framework for analysis
[SPARK-42615] [SC-124237][CONNECT][PYTHON] Refactor the AnalyzePlan RPC and add session.version
[SPARK-41302] Revert “[ALL TESTS][SC-122423][SQL] Assign name to _LEGACY_ERROR_TEMP_1185”
[SPARK-40770] [SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch
[SPARK-40770] Revert “[ALL TESTS][SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”
[SPARK-42398] [SC-123500][SQL] Refine default column value DS v2 interface
[SPARK-40770] [ALL TESTS][SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch
[SPARK-40770] Revert “[SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”
[SPARK-40770] [SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch
[SPARK-42038] [ALL TESTS] Revert “Revert “[SC-122533][SQL] SPJ: Support partially clustered distribution””
[SPARK-42038] Revert “[SC-122533][SQL] SPJ: Support partially clustered distribution”
[SPARK-42038] [SC-122533][SQL] SPJ: Support partially clustered distribution
[SPARK-40550] [SC-120989][SQL] DataSource V2: Handle DELETE commands for delta-based sources
[SPARK-40770] Revert “[SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”
[SPARK-40770] [SC-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch
[SPARK-41302] Revert “[SC-122423][SQL] Assign name to _LEGACY_ERROR_TEMP_1185”
[SPARK-40550] Revert “[SC-120989][SQL] DataSource V2: Handle DELETE commands for delta-based sources”
[SPARK-42123] Revert “[SC-121453][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output”
[SPARK-42146] [SC-121172][CORE] Refactor Utils#setStringField to make maven build pass when sql module use this method
[SPARK-42119] Revert “[SC-121342][SQL] Add built-in table-valued functions inline and inline_outer”

Highlights

Fix aes_decrypt and ln functions in Connect SPARK-45109
Fix inherited named tuples to work in createDataFrame SPARK-44980
CodeGenerator Cache is now classloader-specific [SPARK-44795]
Added SparkListenerConnectOperationStarted.planRequest [SPARK-44861]
Make Streaming Queries work with Connect’s artifact management [SPARK-44794]
ArrowDeserializer works with REPL generated classes [SPARK-44791]
Fixed Arrow-optimized Python UDF on Spark Connect [SPARK-44876]
Scala and Go client support in Spark Connect SPARK-42554 SPARK-43351
PyTorch-based distributed ML Support for Spark Connect SPARK-42471
Structured Streaming support for Spark Connect in Python and Scala SPARK-42938
Pandas API support for the Python Spark Connect Client SPARK-42497
Introduce Arrow Python UDFs SPARK-40307
Support Python user-defined table functions SPARK-43798
Migrate PySpark errors onto error classes SPARK-42986
PySpark Test Framework SPARK-44042
Add support for Datasketches HllSketch SPARK-16484
Built-in SQL Function Improvement SPARK-41231
IDENTIFIER clause SPARK-43205
Add SQL functions into Scala, Python and R API SPARK-43907
Add named argument support for SQL functions SPARK-43922
Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated SPARK-41469
Distributed ML <> spark connect SPARK-42471
DeepSpeed Distributor SPARK-44264
Implement changelog checkpointing for RocksDB state store SPARK-43421
Introduce watermark propagation among operators SPARK-42376
Introduce dropDuplicatesWithinWatermark SPARK-42931
RocksDB state store provider memory management enhancements SPARK-43311

Spark Connect

Refactoring of the sql module into sql and sql-api to produce a minimum set of dependencies that can be shared between the Scala Spark Connect client and Spark and avoids pulling all of the Spark transitive dependencies. SPARK-44273
Introducing the Scala client for Spark Connect SPARK-42554
Pandas API support for the Python Spark Connect Client SPARK-42497
PyTorch-based distributed ML Support for Spark Connect SPARK-42471
Structured Streaming support for Spark Connect in Python and Scala SPARK-42938
Initial version of the Go client SPARK-43351
Many compatibility improvements between native Spark and the Spark Connect clients across Python and Scala
Improved debuggability and request handling for client applications (asynchronous processing, retries, long-lived queries)

Spark SQL

Features

Add metadata column file block start and length SPARK-42423
Support positional parameters in Scala/Java sql() SPARK-44066
Add named parameter support in parser for function calls SPARK-43922
Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation SPARK-43071
Add SQL grammar for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls SPARK-44503
Include column default values in DESCRIBE and SHOW CREATE TABLE output SPARK-42123
Add optional pattern for Catalog.listCatalogs SPARK-43792
Add optional pattern for Catalog.listDatabases SPARK-43881
Callback when ready for execution SPARK-44145
Support Insert By Name statement SPARK-42750
Add call_function for Scala API SPARK-44131
Stable derived column aliases SPARK-40822
Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values SPARK-43529
Support subqueries with correlation through INTERSECT/EXCEPT SPARK-36124
IDENTIFIER clause SPARK-43205
ANSI MODE: Conv should return an error if the internal conversion overflows SPARK-42427

Functions

Add support for Datasketches HllSketch SPARK-16484
Support the CBC mode by aes_encrypt()/aes_decrypt() SPARK-43038
Support TABLE argument parser rule for TableValuedFunction SPARK-44200
Implement bitmap functions SPARK-44154
Add the try_aes_decrypt() function SPARK-42701
array_insert should fail with 0 index SPARK-43011
Add to_varchar alias for to_char SPARK-43815
High-order function: array_compact implementation SPARK-41235
Add analyzer support of named arguments for built-in functions SPARK-44059
Add NULLs for INSERTs with user-specified lists of fewer columns than the target table SPARK-42521
Adds support for aes_encrypt IVs and AAD SPARK-43290
DECODE function returns wrong results when passed NULL SPARK-41668
Support udf ‘luhn_check’ SPARK-42191
Support implicit lateral column alias resolution on Aggregate SPARK-41631
Support implicit lateral column alias in queries with Window SPARK-42217
Add 3-args function aliases DATE_ADD and DATE_DIFF SPARK-43492
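
The luhn_check entry above (SPARK-42191) refers to the Luhn checksum, which is commonly used to validate card and identifier numbers. The following is a minimal pure-Python sketch of that algorithm, not Spark's implementation. Treating empty or non-digit input as invalid is an assumption of this sketch.

```python
def luhn_check(number: str) -> bool:
    """Return True if `number` is an ASCII digit string that passes the Luhn checksum."""
    # Assumption of this sketch: empty or non-digit input is simply invalid.
    if not number or not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Equivalent to adding the two digits of the doubled value.
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_check("79927398713"))
print(luhn_check("79927398710"))
```

The first call uses a well-known valid test number. The second changes only the final check digit, so it fails the checksum.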

Data Sources

Char/Varchar Support for JDBC Catalog SPARK-42904
Support Get SQL Keywords Dynamically Thru JDBC API and TVF SPARK-43119
DataSource V2: Handle MERGE commands for delta-based sources SPARK-43885
DataSource V2: Handle MERGE commands for group-based sources SPARK-43963
DataSource V2: Handle UPDATE commands for group-based sources SPARK-43975
DataSource V2: Allow representing updates as deletes and inserts SPARK-43775
Allow jdbc dialects to override the query used to create a table SPARK-41516
SPJ: Support partially clustered distribution SPARK-42038
DSv2 allows CTAS/RTAS to reserve schema nullability SPARK-43390
Add spark.sql.files.maxPartitionNum SPARK-44021
Handle UPDATE commands for delta-based sources SPARK-43324
Allow V2 writes to indicate advisory shuffle partition size SPARK-42779
Support lz4raw compression codec for Parquet SPARK-43273
Avro: writing complex unions SPARK-25050
Speed up Timestamp type inference with user-provided format in JSON/CSV data source SPARK-39280
Avro to Support custom decimal type backed by Long SPARK-43901
Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but join expressions are compatible SPARK-41413
Change binary to unsupported dataType in CSV format SPARK-42237
Allow Avro to convert union type to SQL with field name stable with type SPARK-43333
Speed up Timestamp type inference with legacy format in JSON/CSV data source SPARK-39281

Query Optimization

Subexpression elimination support shortcut expression SPARK-42815
Improve join stats estimation if one side can keep uniqueness SPARK-39851
Introduce the group limit of Window for rank-based filter to optimize top-k computation SPARK-37099
Fix behavior of null IN (empty list) in optimization rules SPARK-44431
Infer and push down window limit through window if partitionSpec is empty SPARK-41171
Remove the outer join if they are all distinct aggregate functions SPARK-42583
Collapse two adjacent windows with the same partition/order in subquery SPARK-42525
Push down limit through Python UDFs SPARK-42115
Optimize the order of filtering predicates SPARK-40045
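
The null IN (empty list) entry above (SPARK-44431) concerns SQL three-valued logic. Under standard SQL semantics, an IN test against an empty list is false even when the tested value is NULL. The following is a pure-Python sketch of those semantics, with None standing in for NULL. The function name sql_in is hypothetical, and the sketch models the standard rule only, not Spark's optimizer or any runtime default.

```python
from typing import Optional, Sequence


def sql_in(value: Optional[int], candidates: Sequence[Optional[int]]) -> Optional[bool]:
    """Evaluate `value IN (candidates)` with SQL three-valued logic.

    None represents SQL NULL, both as an operand and as the UNKNOWN result.
    """
    if len(candidates) == 0:
        # Nothing to compare against, so the result is FALSE even for a NULL value.
        return False
    if value is None:
        # NULL compared with anything is UNKNOWN.
        return None
    if any(c is not None and c == value for c in candidates):
        return True
    if any(c is None for c in candidates):
        # No match found, but a NULL candidate might have matched: UNKNOWN.
        return None
    return False


print(sql_in(None, []))
print(sql_in(None, [1, 2]))
print(sql_in(3, [1, None]))
```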

Code Generation and Query Execution

Runtime filter should support multi-level shuffle join side as filter creation side SPARK-41674
Codegen Support for HiveSimpleUDF SPARK-42052
Codegen Support for HiveGenericUDF SPARK-42051
Codegen Support for build side outer shuffled hash join SPARK-44060
Implement code generation for to_csv function (StructsToCsv) SPARK-42169
Make AQE support InMemoryTableScanExec SPARK-42101
Support left outer join build left or right outer join build right in shuffled hash join SPARK-36612
Respect RequiresDistributionAndOrdering in CTAS/RTAS SPARK-43088
Coalesce buckets in join applied on broadcast join stream side SPARK-43107
Set nullable correctly on coalesced join key in full outer USING join SPARK-44251
Fix IN subquery ListQuery nullability SPARK-43413

Other Notable Changes

Set nullable correctly for keys in USING joins SPARK-43718
Fix COUNT(*) is null bug in correlated scalar subquery SPARK-43156
Dataframe.joinWith outer-join should return a null value for unmatched row SPARK-37829
Automatically rename conflicting metadata columns SPARK-42683
Document the Spark SQL error classes in user-facing documentation SPARK-42706

PySpark

Features

Support positional parameters in Python sql() SPARK-44140
Support parameterized SQL by sql() SPARK-41666
Support Python user-defined table functions SPARK-43797
Support to set Python executable for UDF and pandas function APIs in workers during runtime SPARK-43574
Add DataFrame.offset to PySpark SPARK-43213
Implement dir() in pyspark.sql.dataframe.DataFrame to include columns SPARK-43270
Add option to use large variable width vectors for arrow UDF operations SPARK-39979
Make mapInPandas / mapInArrow support barrier mode execution SPARK-42896
Add JobTag APIs to PySpark SparkContext SPARK-44194
Support for Python UDTF to analyze in Python SPARK-44380
Expose TimestampNTZType in pyspark.sql.types SPARK-43759
Support nested timestamp type SPARK-43545
Support UserDefinedType in createDataFrame from pandas DataFrame and toPandas SPARK-43817 SPARK-43702
Add descriptor binary option to Pyspark Protobuf API SPARK-43799
Accept generics tuple as typing hints of Pandas UDF SPARK-43886
Add array_prepend function SPARK-41233
Add assertDataFrameEqual util function SPARK-44061
Support arrow-optimized Python UDTFs SPARK-43964
Allow custom precision for fp approx equality SPARK-44217
Make assertSchemaEqual API public SPARK-44216
Support fill_value for ps.Series SPARK-42094
Support struct type in createDataFrame from pandas DataFrame SPARK-43473
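
As an illustration of the Python user-defined table functions entry above, a UDTF is written as a class whose eval method yields zero or more output rows for each call. The sketch below defines such a handler as a plain Python class so that it runs without a cluster. The class name SquareNumbers and the column names in the comment are illustrative assumptions. The commented lines show how the class would typically be registered with pyspark.sql.functions.udtf on a runtime that includes PySpark 3.5.

```python
from typing import Iterator, Tuple


class SquareNumbers:
    """Handler in the shape a Python UDTF expects: eval() yields one tuple per output row."""

    def eval(self, start: int, end: int) -> Iterator[Tuple[int, int]]:
        for num in range(start, end + 1):
            yield (num, num * num)


# With PySpark 3.5 and an active SparkSession, registration would look roughly like this
# (not executed here, because it needs a Spark runtime):
#   from pyspark.sql.functions import lit, udtf
#   square_udtf = udtf(SquareNumbers, returnType="num: int, squared: int")
#   square_udtf(lit(1), lit(3)).show()

# Calling the handler directly shows the rows the table function would return.
rows = list(SquareNumbers().eval(1, 3))
print(rows)
```

Because eval is a generator, one input call can produce any number of rows, which is what distinguishes a table function from a scalar UDF.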

Other Notable Changes

Add autocomplete support for df[|] in pyspark.sql.dataframe.DataFrame [SPARK-43892]
Deprecate & remove the APIs that will be removed in pandas 2.0 [SPARK-42593]
Make Python the first tab for code examples - Spark SQL, DataFrames and Datasets Guide SPARK-42493
Updating remaining Spark documentation code examples to show Python by default SPARK-42642
Use deduplicated field names when creating Arrow RecordBatch [SPARK-41971]
Support duplicated field names in createDataFrame with pandas DataFrame [SPARK-43528]
Allow columns parameter when creating DataFrame with Series [SPARK-42194]

Core

Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks SPARK-40082
Introduce PartitionEvaluator for SQL operator execution SPARK-43061
Allow ShuffleDriverComponent to declare if shuffle data is reliably stored SPARK-42689
Add max attempts limitation for stages to avoid potential infinite retry SPARK-42577
Support log level configuration with static Spark conf SPARK-43782
Optimize PercentileHeap SPARK-42528
Add reason argument to TaskScheduler.cancelTasks SPARK-42602
Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated SPARK-41469
Fixing accumulator undercount in the case of the retry task with rdd cache SPARK-41497
Use RocksDB for spark.history.store.hybridStore.diskBackend by default SPARK-42277
NonFateSharingCache wrapper for Guava Cache SPARK-43300
Improve the performance of MapOutputTracker.updateMapOutput SPARK-43043
Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service SPARK-43179
Add SPARK_DRIVER_POD_IP env variable to executor pods SPARK-42769
Mounts the hadoop config map on the executor pod SPARK-43504

Structured Streaming

Add support for tracking pinned blocks memory usage for RocksDB state store SPARK-43120
Add RocksDB state store provider memory management enhancements SPARK-43311
Introduce dropDuplicatesWithinWatermark SPARK-42931
Introduce a new callback onQueryIdle() to StreamingQueryListener SPARK-43183
Add option to skip commit coordinator as part of StreamingWrite API for DSv2 sources/sinks SPARK-42968
Implement Changelog based Checkpointing for RocksDB State Store Provider SPARK-43421
Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators SPARK-42792
Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming SPARK-42819
RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD SPARK-42566
Introduce watermark propagation among operators SPARK-42376
Cleanup orphan sst and log files in RocksDB checkpoint directory SPARK-42353
Expand QueryTerminatedEvent to contain error class if it exists in exception SPARK-43482

ML

Support Distributed Training of Functions Using Deepspeed SPARK-44264
Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator SPARK-43516
Make MLv2 (ML on spark connect) supports pandas >= 2.0 SPARK-43783
Update MLv2 Transformer interfaces SPARK-43516
New pyspark ML logistic regression estimator implemented on top of distributor SPARK-43097
Add Classifier.getNumClasses back SPARK-42526
Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor SPARK-44264
Basic saving / loading implementation for ML on spark connect SPARK-43981
Improve logistic regression model saving SPARK-43097
Implement pipeline estimator for ML on spark connect SPARK-43982
Implement cross validator estimator SPARK-43983
Implement classification evaluator SPARK-44250
Make PyTorch Distributor compatible with Spark Connect SPARK-42993

UI

Add a Spark UI page for Spark Connect SPARK-44394
Support Heap Histogram column in Executors tab SPARK-44153
Show error message on UI for each failed query SPARK-44367
Display Add/Remove Time of Executors on Executors Tab SPARK-44309

Build and Others

Remove Python 3.7 Support SPARK-43347
Increase PyArrow minimum version to 4.0.0 SPARK-44183
Support R 4.3.1 SPARK-43447 SPARK-44192
Add JobTag APIs to SparkR SparkContext SPARK-44195
Add math functions to SparkR SPARK-44349
Upgrade Parquet to 1.13.1 SPARK-43519
Upgrade ASM to 9.5 SPARK-43537 SPARK-43588
Upgrade rocksdbjni to 8.3.2 SPARK-41569 SPARK-42718 SPARK-43007 SPARK-43436 SPARK-44256
Upgrade Netty to 4.1.93 SPARK-42218 SPARK-42417 SPARK-42487 SPARK-43609 SPARK-44128
Upgrade zstd-jni to 1.5.5-5 SPARK-42409 SPARK-42625 SPARK-43080 SPARK-43294 SPARK-43737 SPARK-43994 SPARK-44465
Upgrade dropwizard metrics 4.2.19 SPARK-42654 SPARK-43738 SPARK-44296
Upgrade gcs-connector to 2.2.14 SPARK-42888 SPARK-43842
Upgrade commons-crypto to 1.2.0 SPARK-42488
Upgrade scala-parser-combinators from 2.1.1 to 2.2.0 SPARK-42489
Upgrade protobuf-java to 3.23.4 SPARK-41711 SPARK-42490 SPARK-42798 SPARK-43899 SPARK-44382
Upgrade commons-codec to 1.16.0 SPARK-44151
Upgrade Apache Kafka to 3.4.1 SPARK-42396 SPARK-44181
Upgrade RoaringBitmap to 0.9.45 SPARK-42385 SPARK-43495 SPARK-44221
Update ORC to 1.9.0 SPARK-42820 SPARK-44053 SPARK-44231
Upgrade to Avro 1.11.2 SPARK-44277
Upgrade commons-compress to 1.23.0 SPARK-43102
Upgrade joda-time from 2.12.2 to 2.12.5 SPARK-43008
Upgrade snappy-java to 1.1.10.3 SPARK-42242 SPARK-43758 SPARK-44070 SPARK-44415 SPARK-44513
Upgrade mysql-connector-java from 8.0.31 to 8.0.32 SPARK-42717
Upgrade Apache Arrow to 12.0.1 SPARK-42161 SPARK-43446 SPARK-44094
Upgrade commons-io to 2.12.0 SPARK-43739
Upgrade Apache commons-io to 2.13.0 SPARK-43739 SPARK-44028
Upgrade FasterXML jackson to 2.15.2 SPARK-42354 SPARK-43774 SPARK-43904
Upgrade log4j2 to 2.20.0 SPARK-42536
Upgrade slf4j to 2.0.7 SPARK-42871
Upgrade numpy and pandas in the release Dockerfile SPARK-42524
Upgrade Jersey to 2.40 SPARK-44316
Upgrade H2 from 2.1.214 to 2.2.220 SPARK-44393
Upgrade optionator to ^0.9.3 SPARK-44279
Upgrade bcprov-jdk15on and bcpkix-jdk15on to 1.70 SPARK-44441
Upgrade mlflow to 2.3.1 SPARK-43344
Upgrade Tink to 1.9.0 SPARK-42780
Upgrade silencer to 1.7.13 SPARK-41787 SPARK-44031
Upgrade Ammonite to 2.5.9 SPARK-44041
Upgrade Scala to 2.12.18 SPARK-43832
Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7 SPARK-41587
Upgrade minimatch to 3.1.2 SPARK-41634
Upgrade sbt-assembly from 2.0.0 to 2.1.0 SPARK-41704
Update maven-checkstyle-plugin from 3.1.2 to 3.2.0 SPARK-41714
Upgrade dev.ludovic.netlib to 3.0.3 SPARK-41750
Upgrade hive-storage-api to 2.8.1 SPARK-41798
Upgrade Apache httpcore to 4.4.16 SPARK-41802
Upgrade jetty to 9.4.52.v20230823 SPARK-45052
Upgrade compress-lzf to 1.1.2 SPARK-42274

Removals, Behavior Changes and Deprecations

Upcoming Removal

The following features will be removed in the next major Spark release:

Support for Java 8 and Java 11. The minimum supported Java version will be Java 17.
Support for Scala 2.12. The minimum supported Scala version will be 2.13.

Migration Guides

Databricks ODBC/JDBC driver support

Databricks supports ODBC/JDBC drivers released in the past 2 years. Please download the recently released drivers and upgrade (download ODBC, download JDBC).

System environment

Operating System: Ubuntu 22.04.3 LTS
Java: Zulu 8.70.0.23-CA-linux64
Scala: 2.12.15
Python: 3.10.12
R: 4.3.1
Delta Lake: 2.4.0

Installed Python libraries

Library	Version	Library	Version	Library	Version
anyio	3.5.0	argon2-cffi	21.3.0	argon2-cffi-bindings	21.2.0
asttokens	2.0.5	attrs	22.1.0	backcall	0.2.0
beautifulsoup4	4.11.1	black	22.6.0	bleach	4.1.0
blinker	1.4	boto3	1.24.28	botocore	1.27.96
certifi	2022.12.7	cffi	1.15.1	chardet	4.0.0
charset-normalizer	2.0.4	click	8.0.4	comm	0.1.2
contourpy	1.0.5	cryptography	39.0.1	cycler	0.11.0
Cython	0.29.32	databricks-sdk	0.1.6	dbus-python	1.2.18
debugpy	1.6.7	decorator	5.1.1	defusedxml	0.7.1
distlib	0.3.7	docstring-to-markdown	0.11	entrypoints	0.4
executing	0.8.3	facets-overview	1.1.1	fastjsonschema	2.18.0
filelock	3.12.2	fonttools	4.25.0	GCC runtime library	1.10.0
googleapis-common-protos	1.60.0	grpcio	1.48.2	grpcio-status	1.48.1
httplib2	0.20.2	idna	3.4	importlib-metadata	4.6.4
ipykernel	6.25.0	ipython	8.14.0	ipython-genutils	0.2.0
ipywidgets	7.7.2	jedi	0.18.1	jeepney	0.7.1
Jinja2	3.1.2	jmespath	0.10.0	joblib	1.2.0
jsonschema	4.17.3	jupyter-client	7.3.4	jupyter-server	1.23.4
jupyter_core	5.2.0	jupyterlab-pygments	0.1.2	jupyterlab-widgets	1.0.0
keyring	23.5.0	kiwisolver	1.4.4	launchpadlib	1.10.16
lazr.restfulclient	0.14.4	lazr.uri	1.0.6	lxml	4.9.1
MarkupSafe	2.1.1	matplotlib	3.7.0	matplotlib-inline	0.1.6
mccabe	0.7.0	mistune	0.8.4	more-itertools	8.10.0
mypy-extensions	0.4.3	nbclassic	0.5.2	nbclient	0.5.13
nbconvert	6.5.4	nbformat	5.7.0	nest-asyncio	1.5.6
nodeenv	1.8.0	notebook	6.5.2	notebook_shim	0.2.2
numpy	1.23.5	oauthlib	3.2.0	packaging	22.0
pandas	1.5.3	pandocfilters	1.5.0	parso	0.8.3
pathspec	0.10.3	patsy	0.5.3	pexpect	4.8.0
pickleshare	0.7.5	Pillow	9.4.0	pip	22.3.1
platformdirs	2.5.2	plotly	5.9.0	pluggy	1.0.0
prometheus-client	0.14.1	prompt-toolkit	3.0.36	protobuf	4.24.0
psutil	5.9.0	psycopg2	2.9.3	ptyprocess	0.7.0
pure-eval	0.2.2	pyarrow	8.0.0	pycparser	2.21
pydantic	1.10.6	pyflakes	3.0.1	Pygments	2.11.2
PyGObject	3.42.1	PyJWT	2.3.0	pyodbc	4.0.32
pyparsing	3.0.9	pyright	1.1.294	pyrsistent	0.18.0
python-dateutil	2.8.2	python-lsp-jsonrpc	1.0.0	python-lsp-server	1.7.1
pytoolconfig	1.2.5	pytz	2022.7	pyzmq	23.2.0
requests	2.28.1	rope	1.7.0	s3transfer	0.6.1
scikit-learn	1.1.1	seaborn	0.12.2	SecretStorage	3.3.1
Send2Trash	1.8.0	setuptools	65.6.3	six	1.16.0
sniffio	1.2.0	soupsieve	2.3.2.post1	ssh-import-id	5.11
stack-data	0.2.0	statsmodels	0.13.5	tenacity	8.1.0
terminado	0.17.1	threadpoolctl	2.2.0	tinycss2	1.2.1
tokenize-rt	4.2.1	tomli	2.0.1	tornado	6.1
traitlets	5.7.1	typing_extensions	4.4.0	ujson	5.4.0
unattended-upgrades	0.1	urllib3	1.26.14	virtualenv	20.16.7
wadllib	1.3.6	wcwidth	0.2.5	webencodings	0.5.1
websocket-client	0.58.0	whatthepatch	1.0.2	wheel	0.38.4
widgetsnbextension	3.6.1	yapf	0.31.0	zipp	1.0.0

Installed R libraries

R libraries are installed from the Posit Package Manager CRAN snapshot on 2023-07-13.

Library	Version	Library	Version	Library	Version
arrow	12.0.1	askpass	1.1	assertthat	0.2.1
backports	1.4.1	base	4.3.1	base64enc	0.1-3
bit	4.0.5	bit64	4.0.5	blob	1.2.4
boot	1.3-28	brew	1.0-8	brio	1.1.3
broom	1.0.5	bslib	0.5.0	cachem	1.0.8
callr	3.7.3	caret	6.0-94	cellranger	1.1.0
chron	2.3-61	class	7.3-22	cli	3.6.1
clipr	0.8.0	clock	0.7.0	cluster	2.1.4
codetools	0.2-19	colorspace	2.1-0	commonmark	1.9.0
compiler	4.3.1	config	0.3.1	conflicted	1.2.0
cpp11	0.4.4	crayon	1.5.2	credentials	1.3.2
curl	5.0.1	data.table	1.14.8	datasets	4.3.1
DBI	1.1.3	dbplyr	2.3.3	desc	1.4.2
devtools	2.4.5	diagram	1.6.5	diffobj	0.3.5
digest	0.6.33	downlit	0.4.3	dplyr	1.1.2
dtplyr	1.3.1	e1071	1.7-13	ellipsis	0.3.2
evaluate	0.21	fansi	1.0.4	farver	2.1.1
fastmap	1.1.1	fontawesome	0.5.1	forcats	1.0.0
foreach	1.5.2	foreign	0.8-82	forge	0.2.0
fs	1.6.2	future	1.33.0	future.apply	1.11.0
gargle	1.5.1	generics	0.1.3	gert	1.9.2
ggplot2	3.4.2	gh	1.4.0	gitcreds	0.1.2
glmnet	4.1-7	globals	0.16.2	glue	1.6.2
googledrive	2.1.1	googlesheets4	1.1.1	gower	1.0.1
graphics	4.3.1	grDevices	4.3.1	grid	4.3.1
gridExtra	2.3	gsubfn	0.7	gtable	0.3.3
hardhat	1.3.0	haven	2.5.3	highr	0.10
hms	1.1.3	htmltools	0.5.5	htmlwidgets	1.6.2
httpuv	1.6.11	httr	1.4.6	httr2	0.2.3
ids	1.0.1	ini	0.3.1	ipred	0.9-14
isoband	0.2.7	iterators	1.0.14	jquerylib	0.1.4
jsonlite	1.8.7	KernSmooth	2.23-21	knitr	1.43
labeling	0.4.2	later	1.3.1	lattice	0.21-8
lava	1.7.2.1	lifecycle	1.0.3	listenv	0.9.0
lubridate	1.9.2	magrittr	2.0.3	markdown	1.7
MASS	7.3-60	Matrix	1.5-4.1	memoise	2.0.1
methods	4.3.1	mgcv	1.8-42	mime	0.12
miniUI	0.1.1.1	ModelMetrics	1.2.2.2	modelr	0.1.11
munsell	0.5.0	nlme	3.1-162	nnet	7.3-19
numDeriv	2016.8-1.1	openssl	2.0.6	parallel	4.3.1
parallelly	1.36.0	pillar	1.9.0	pkgbuild	1.4.2
pkgconfig	2.0.3	pkgdown	2.0.7	pkgload	1.3.2.1
plogr	0.2.0	plyr	1.8.8	praise	1.0.0
prettyunits	1.1.1	pROC	1.18.4	processx	3.8.2
prodlim	2023.03.31	profvis	0.3.8	progress	1.2.2
progressr	0.13.0	promises	1.2.0.1	proto	1.0.0
proxy	0.4-27	ps	1.7.5	purrr	1.0.1
r2d3	0.2.6	R6	2.5.1	ragg	1.2.5
randomForest	4.7-1.1	rappdirs	0.3.3	rcmdcheck	1.4.0
RColorBrewer	1.1-3	Rcpp	1.0.11	RcppEigen	0.3.3.9.3
readr	2.1.4	readxl	1.4.3	recipes	1.0.6
rematch	1.0.1	rematch2	2.1.2	remotes	2.4.2
reprex	2.0.2	reshape2	1.4.4	rlang	1.1.1
rmarkdown	2.23	RODBC	1.3-20	roxygen2	7.2.3
rpart	4.1.19	rprojroot	2.0.3	Rserve	1.8-11
RSQLite	2.3.1	rstudioapi	0.15.0	rversions	2.1.2
rvest	1.0.3	sass	0.4.6	scales	1.2.1
selectr	0.4-2	sessioninfo	1.2.2	shape	1.4.6
shiny	1.7.4.1	sourcetools	0.1.7-1	sparklyr	1.8.1
SparkR	3.5.0	spatial	7.3-15	splines	4.3.1
sqldf	0.4-11	SQUAREM	2021.1	stats	4.3.1
stats4	4.3.1	stringi	1.7.12	stringr	1.5.0
survival	3.5-5	sys	3.4.2	systemfonts	1.0.4
tcltk	4.3.1	testthat	3.1.10	textshaping	0.3.6
tibble	3.2.1	tidyr	1.3.0	tidyselect	1.2.0
tidyverse	2.0.0	timechange	0.2.0	timeDate	4022.108
tinytex	0.45	tools	4.3.1	tzdb	0.4.0
urlchecker	1.0.1	usethis	2.2.2	utf8	1.2.3
utils	4.3.1	uuid	1.1-0	vctrs	0.6.3
viridisLite	0.4.2	vroom	1.6.3	waldo	0.5.1
whisker	0.4.1	withr	2.5.0	xfun	0.39
xml2	1.3.5	xopen	1.0.0	xtable	1.8-4
yaml	2.3.7	zip	2.3.0

Installed Java and Scala libraries (Scala 2.12 cluster version)

Group ID	Artifact ID	Version
antlr	antlr	2.7.7
com.amazonaws	amazon-kinesis-client	1.12.0
com.amazonaws	aws-java-sdk-autoscaling	1.12.390
com.amazonaws	aws-java-sdk-cloudformation	1.12.390
com.amazonaws	aws-java-sdk-cloudfront	1.12.390
com.amazonaws	aws-java-sdk-cloudhsm	1.12.390
com.amazonaws	aws-java-sdk-cloudsearch	1.12.390
com.amazonaws	aws-java-sdk-cloudtrail	1.12.390
com.amazonaws	aws-java-sdk-cloudwatch	1.12.390
com.amazonaws	aws-java-sdk-cloudwatchmetrics	1.12.390
com.amazonaws	aws-java-sdk-codedeploy	1.12.390
com.amazonaws	aws-java-sdk-cognitoidentity	1.12.390
com.amazonaws	aws-java-sdk-cognitosync	1.12.390
com.amazonaws	aws-java-sdk-config	1.12.390
com.amazonaws	aws-java-sdk-core	1.12.390
com.amazonaws	aws-java-sdk-datapipeline	1.12.390
com.amazonaws	aws-java-sdk-directconnect	1.12.390
com.amazonaws	aws-java-sdk-directory	1.12.390
com.amazonaws	aws-java-sdk-dynamodb	1.12.390
com.amazonaws	aws-java-sdk-ec2	1.12.390
com.amazonaws	aws-java-sdk-ecs	1.12.390
com.amazonaws	aws-java-sdk-efs	1.12.390
com.amazonaws	aws-java-sdk-elasticache	1.12.390
com.amazonaws	aws-java-sdk-elasticbeanstalk	1.12.390
com.amazonaws	aws-java-sdk-elasticloadbalancing	1.12.390
com.amazonaws	aws-java-sdk-elastictranscoder	1.12.390
com.amazonaws	aws-java-sdk-emr	1.12.390
com.amazonaws	aws-java-sdk-glacier	1.12.390
com.amazonaws	aws-java-sdk-glue	1.12.390
com.amazonaws	aws-java-sdk-iam	1.12.390
com.amazonaws	aws-java-sdk-importexport	1.12.390
com.amazonaws	aws-java-sdk-kinesis	1.12.390
com.amazonaws	aws-java-sdk-kms	1.12.390
com.amazonaws	aws-java-sdk-lambda	1.12.390
com.amazonaws	aws-java-sdk-logs	1.12.390
com.amazonaws	aws-java-sdk-machinelearning	1.12.390
com.amazonaws	aws-java-sdk-opsworks	1.12.390
com.amazonaws	aws-java-sdk-rds	1.12.390
com.amazonaws	aws-java-sdk-redshift	1.12.390
com.amazonaws	aws-java-sdk-route53	1.12.390
com.amazonaws	aws-java-sdk-s3	1.12.390
com.amazonaws	aws-java-sdk-ses	1.12.390
com.amazonaws	aws-java-sdk-simpledb	1.12.390
com.amazonaws	aws-java-sdk-simpleworkflow	1.12.390
com.amazonaws	aws-java-sdk-sns	1.12.390
com.amazonaws	aws-java-sdk-sqs	1.12.390
com.amazonaws	aws-java-sdk-ssm	1.12.390
com.amazonaws	aws-java-sdk-storagegateway	1.12.390
com.amazonaws	aws-java-sdk-sts	1.12.390
com.amazonaws	aws-java-sdk-support	1.12.390
com.amazonaws	aws-java-sdk-swf-libraries	1.11.22
com.amazonaws	aws-java-sdk-workspaces	1.12.390
com.amazonaws	jmespath-java	1.12.390
com.clearspring.analytics	stream	2.9.6
com.databricks	Rserve	1.8-3
com.databricks	databricks-sdk-java	0.2.0
com.databricks	jets3t	0.7.1-0
com.databricks.scalapb	compilerplugin_2.12	0.4.15-10
com.databricks.scalapb	scalapb-runtime_2.12	0.4.15-10
com.esotericsoftware	kryo-shaded	4.0.2
com.esotericsoftware	minlog	1.3.0
com.fasterxml	classmate	1.3.4
com.fasterxml.jackson.core	jackson-annotations	2.15.2
com.fasterxml.jackson.core	jackson-core	2.15.2
com.fasterxml.jackson.core	jackson-databind	2.15.2
com.fasterxml.jackson.dataformat	jackson-dataformat-cbor	2.15.2
com.fasterxml.jackson.datatype	jackson-datatype-joda	2.15.2
com.fasterxml.jackson.datatype	jackson-datatype-jsr310	2.15.1
com.fasterxml.jackson.module	jackson-module-paranamer	2.15.2
com.fasterxml.jackson.module	jackson-module-scala_2.12	2.15.2
com.github.ben-manes.caffeine	caffeine	2.9.3
com.github.fommil	jniloader	1.1
com.github.fommil.netlib	native_ref-java	1.1
com.github.fommil.netlib	native_ref-java	1.1-natives
com.github.fommil.netlib	native_system-java	1.1
com.github.fommil.netlib	native_system-java	1.1-natives
com.github.fommil.netlib	netlib-native_ref-linux-x86_64	1.1-natives
com.github.fommil.netlib	netlib-native_system-linux-x86_64	1.1-natives
com.github.luben	zstd-jni	1.5.5-4
com.github.wendykierp	JTransforms	3.1
com.google.code.findbugs	jsr305	3.0.0
com.google.code.gson	gson	2.10.1
com.google.crypto.tink	tink	1.9.0
com.google.errorprone	error_prone_annotations	2.10.0
com.google.flatbuffers	flatbuffers-java	1.12.0
com.google.guava	guava	15.0
com.google.protobuf	protobuf-java	2.6.1
com.helger	profiler	1.1.1
com.jcraft	jsch	0.1.55
com.jolbox	bonecp	0.8.0.RELEASE
com.lihaoyi	sourcecode_2.12	0.1.9
com.microsoft.azure	azure-data-lake-store-sdk	2.3.9
com.microsoft.sqlserver	mssql-jdbc	11.2.2.jre8
com.ning	compress-lzf	1.1.2
com.sun.mail	javax.mail	1.5.2
com.sun.xml.bind	jaxb-core	2.2.11
com.sun.xml.bind	jaxb-impl	2.2.11
com.tdunning	json	1.8
com.thoughtworks.paranamer	paranamer	2.8
com.trueaccord.lenses	lenses_2.12	0.4.12
com.twitter	chill-java	0.10.0
com.twitter	chill_2.12	0.10.0
com.twitter	util-app_2.12	7.1.0
com.twitter	util-core_2.12	7.1.0
com.twitter	util-function_2.12	7.1.0
com.twitter	util-jvm_2.12	7.1.0
com.twitter	util-lint_2.12	7.1.0
com.twitter	util-registry_2.12	7.1.0
com.twitter	util-stats_2.12	7.1.0
com.typesafe	config	1.2.1
com.typesafe.scala-logging	scala-logging_2.12	3.7.2
com.uber	h3	3.7.0
com.univocity	univocity-parsers	2.9.1
com.zaxxer	HikariCP	4.0.3
commons-cli	commons-cli	1.5.0
commons-codec	commons-codec	1.16.0
commons-collections	commons-collections	3.2.2
commons-dbcp	commons-dbcp	1.4
commons-fileupload	commons-fileupload	1.5
commons-httpclient	commons-httpclient	3.1
commons-io	commons-io	2.13.0
commons-lang	commons-lang	2.6
commons-logging	commons-logging	1.1.3
commons-pool	commons-pool	1.5.4
dev.ludovic.netlib	arpack	3.0.3
dev.ludovic.netlib	blas	3.0.3
dev.ludovic.netlib	lapack	3.0.3
info.ganglia.gmetric4j	gmetric4j	1.0.10
io.airlift	aircompressor	0.24
io.delta	delta-sharing-spark_2.12	0.7.1
io.dropwizard.metrics	metrics-annotation	4.2.19
io.dropwizard.metrics	metrics-core	4.2.19
io.dropwizard.metrics	metrics-graphite	4.2.19
io.dropwizard.metrics	metrics-healthchecks	4.2.19
io.dropwizard.metrics	metrics-jetty9	4.2.19
io.dropwizard.metrics	metrics-jmx	4.2.19
io.dropwizard.metrics	metrics-json	4.2.19
io.dropwizard.metrics	metrics-jvm	4.2.19
io.dropwizard.metrics	metrics-servlets	4.2.19
io.netty	netty-all	4.1.93.Final
io.netty	netty-buffer	4.1.93.Final
io.netty	netty-codec	4.1.93.Final
io.netty	netty-codec-http	4.1.93.Final
io.netty	netty-codec-http2	4.1.93.Final
io.netty	netty-codec-socks	4.1.93.Final
io.netty	netty-common	4.1.93.Final
io.netty	netty-handler	4.1.93.Final
io.netty	netty-handler-proxy	4.1.93.Final
io.netty	netty-resolver	4.1.93.Final
io.netty	netty-transport	4.1.93.Final
io.netty	netty-transport-classes-epoll	4.1.93.Final
io.netty	netty-transport-classes-kqueue	4.1.93.Final
io.netty	netty-transport-native-epoll	4.1.93.Final
io.netty	netty-transport-native-epoll	4.1.93.Final-linux-aarch_64
io.netty	netty-transport-native-epoll	4.1.93.Final-linux-x86_64
io.netty	netty-transport-native-kqueue	4.1.93.Final-osx-aarch_64
io.netty	netty-transport-native-kqueue	4.1.93.Final-osx-x86_64
io.netty	netty-transport-native-unix-common	4.1.93.Final
io.prometheus	simpleclient	0.7.0
io.prometheus	simpleclient_common	0.7.0
io.prometheus	simpleclient_dropwizard	0.7.0
io.prometheus	simpleclient_pushgateway	0.7.0
io.prometheus	simpleclient_servlet	0.7.0
io.prometheus.jmx	collector	0.12.0
jakarta.annotation	jakarta.annotation-api	1.3.5
jakarta.servlet	jakarta.servlet-api	4.0.3
jakarta.validation	jakarta.validation-api	2.0.2
jakarta.ws.rs	jakarta.ws.rs-api	2.1.6
javax.activation	activation	1.1.1
javax.el	javax.el-api	2.2.4
javax.jdo	jdo-api	3.0.1
javax.transaction	jta	1.1
javax.transaction	transaction-api	1.1
javax.xml.bind	jaxb-api	2.2.11
javolution	javolution	5.5.1
jline	jline	2.14.6
joda-time	joda-time	2.12.1
net.java.dev.jna	jna	5.8.0
net.razorvine	pickle	1.3
net.sf.jpam	jpam	1.1
net.sf.opencsv	opencsv	2.3
net.sf.supercsv	super-csv	2.2.0
net.snowflake	snowflake-ingest-sdk	0.9.6
net.snowflake	snowflake-jdbc	3.13.29
net.sourceforge.f2j	arpack_combined_all	0.1
org.acplt.remotetea	remotetea-oncrpc	1.1.2
org.antlr	ST4	4.0.4
org.antlr	antlr-runtime	3.5.2
org.antlr	antlr4-runtime	4.9.3
org.antlr	stringtemplate	3.2.1
org.apache.ant	ant	1.9.16
org.apache.ant	ant-jsch	1.9.16
org.apache.ant	ant-launcher	1.9.16
org.apache.arrow	arrow-format	12.0.1
org.apache.arrow	arrow-memory-core	12.0.1
org.apache.arrow	arrow-memory-netty	12.0.1
org.apache.arrow	arrow-vector	12.0.1
org.apache.avro	avro	1.11.2
org.apache.avro	avro-ipc	1.11.2
org.apache.avro	avro-mapred	1.11.2
org.apache.commons	commons-collections4	4.4
org.apache.commons	commons-compress	1.23.0
org.apache.commons	commons-crypto	1.1.0
org.apache.commons	commons-lang3	3.12.0
org.apache.commons	commons-math3	3.6.1
org.apache.commons	commons-text	1.10.0
org.apache.curator	curator-client	2.13.0
org.apache.curator	curator-framework	2.13.0
org.apache.curator	curator-recipes	2.13.0
org.apache.datasketches	datasketches-java	3.1.0
org.apache.datasketches	datasketches-memory	2.0.0
org.apache.derby	derby	10.14.2.0
org.apache.hadoop	hadoop-client-runtime	3.3.6
org.apache.hive	hive-beeline	2.3.9
org.apache.hive	hive-cli	2.3.9
org.apache.hive	hive-jdbc	2.3.9
org.apache.hive	hive-llap-client	2.3.9
org.apache.hive	hive-llap-common	2.3.9
org.apache.hive	hive-serde	2.3.9
org.apache.hive	hive-shims	2.3.9
org.apache.hive	hive-storage-api	2.8.1
org.apache.hive.shims	hive-shims-0.23	2.3.9
org.apache.hive.shims	hive-shims-common	2.3.9
org.apache.hive.shims	hive-shims-scheduler	2.3.9
org.apache.httpcomponents	httpclient	4.5.14
org.apache.httpcomponents	httpcore	4.4.16
org.apache.ivy	ivy	2.5.1
org.apache.logging.log4j	log4j-1.2-api	2.20.0
org.apache.logging.log4j	log4j-api	2.20.0
org.apache.logging.log4j	log4j-core	2.20.0
org.apache.logging.log4j	log4j-slf4j2-impl	2.20.0
org.apache.mesos	mesos	1.11.0-shaded-protobuf
org.apache.orc	orc-core	1.9.0-shaded-protobuf
org.apache.orc	orc-mapreduce	1.9.0-shaded-protobuf
org.apache.orc	orc-shims	1.9.0
org.apache.thrift	libfb303	0.9.3
org.apache.thrift	libthrift	0.12.0
org.apache.xbean	xbean-asm9-shaded	4.23
org.apache.yetus	audience-annotations	0.13.0
org.apache.zookeeper	zookeeper	3.6.3
org.apache.zookeeper	zookeeper-jute	3.6.3
org.checkerframework	checker-qual	3.31.0
org.codehaus.jackson	jackson-core-asl	1.9.13
org.codehaus.jackson	jackson-mapper-asl	1.9.13
org.codehaus.janino	commons-compiler	3.0.16
org.codehaus.janino	janino	3.0.16
org.datanucleus	datanucleus-api-jdo	4.2.4
org.datanucleus	datanucleus-core	4.1.17
org.datanucleus	datanucleus-rdbms	4.1.19
org.datanucleus	javax.jdo	3.2.0-m3
org.eclipse.jetty	jetty-client	9.4.51.v20230217
org.eclipse.jetty	jetty-continuation	9.4.51.v20230217
org.eclipse.jetty	jetty-http	9.4.51.v20230217
org.eclipse.jetty	jetty-io	9.4.51.v20230217
org.eclipse.jetty	jetty-jndi	9.4.51.v20230217
org.eclipse.jetty	jetty-plus	9.4.51.v20230217
org.eclipse.jetty	jetty-proxy	9.4.51.v20230217
org.eclipse.jetty	jetty-security	9.4.51.v20230217
org.eclipse.jetty	jetty-server	9.4.51.v20230217
org.eclipse.jetty	jetty-servlet	9.4.51.v20230217
org.eclipse.jetty	jetty-servlets	9.4.51.v20230217
org.eclipse.jetty	jetty-util	9.4.51.v20230217
org.eclipse.jetty	jetty-util-ajax	9.4.51.v20230217
org.eclipse.jetty	jetty-webapp	9.4.51.v20230217
org.eclipse.jetty	jetty-xml	9.4.51.v20230217
org.eclipse.jetty.websocket	websocket-api	9.4.51.v20230217
org.eclipse.jetty.websocket	websocket-client	9.4.51.v20230217
org.eclipse.jetty.websocket	websocket-common	9.4.51.v20230217
org.eclipse.jetty.websocket	websocket-server	9.4.51.v20230217
org.eclipse.jetty.websocket	websocket-servlet	9.4.51.v20230217
org.fusesource.leveldbjni	leveldbjni-all	1.8
org.glassfish.hk2	hk2-api	2.6.1
org.glassfish.hk2	hk2-locator	2.6.1
org.glassfish.hk2	hk2-utils	2.6.1
org.glassfish.hk2	osgi-resource-locator	1.0.3
org.glassfish.hk2.external	aopalliance-repackaged	2.6.1
org.glassfish.hk2.external	jakarta.inject	2.6.1
org.glassfish.jersey.containers	jersey-container-servlet	2.40
org.glassfish.jersey.containers	jersey-container-servlet-core	2.40
org.glassfish.jersey.core	jersey-client	2.40
org.glassfish.jersey.core	jersey-common	2.40
org.glassfish.jersey.core	jersey-server	2.40
org.glassfish.jersey.inject	jersey-hk2	2.40
org.hibernate.validator	hibernate-validator	6.1.7.Final
org.ini4j	ini4j	0.5.4
org.javassist	javassist	3.29.2-GA
org.jboss.logging	jboss-logging	3.3.2.Final
org.jdbi	jdbi	2.63.1
org.jetbrains	annotations	17.0.0
org.joda	joda-convert	1.7
org.jodd	jodd-core	3.5.2
org.json4s	json4s-ast_2.12	3.7.0-M11
org.json4s	json4s-core_2.12	3.7.0-M11
org.json4s	json4s-jackson_2.12	3.7.0-M11
org.json4s	json4s-scalap_2.12	3.7.0-M11
org.lz4	lz4-java	1.8.0
org.mariadb.jdbc	mariadb-java-client	2.7.9
org.mlflow	mlflow-spark	2.2.0
org.objenesis	objenesis	2.5.1
org.postgresql	postgresql	42.6.0
org.roaringbitmap	RoaringBitmap	0.9.45
org.roaringbitmap	shims	0.9.45
org.rocksdb	rocksdbjni	8.3.2
org.rosuda.REngine	REngine	2.1.0
org.scala-lang	scala-compiler_2.12	2.12.15
org.scala-lang	scala-library_2.12	2.12.15
org.scala-lang	scala-reflect_2.12	2.12.15
org.scala-lang.modules	scala-collection-compat_2.12	2.9.0
org.scala-lang.modules	scala-parser-combinators_2.12	1.1.2
org.scala-lang.modules	scala-xml_2.12	1.2.0
org.scala-sbt	test-interface	1.0
org.scalacheck	scalacheck_2.12	1.14.2
org.scalactic	scalactic_2.12	3.2.15
org.scalanlp	breeze-macros_2.12	2.1.0
org.scalanlp	breeze_2.12	2.1.0
org.scalatest	scalatest-compatible	3.2.15
org.scalatest	scalatest-core_2.12	3.2.15
org.scalatest	scalatest-diagrams_2.12	3.2.15
org.scalatest	scalatest-featurespec_2.12	3.2.15
org.scalatest	scalatest-flatspec_2.12	3.2.15
org.scalatest	scalatest-freespec_2.12	3.2.15
org.scalatest	scalatest-funspec_2.12	3.2.15
org.scalatest	scalatest-funsuite_2.12	3.2.15
org.scalatest	scalatest-matchers-core_2.12	3.2.15
org.scalatest	scalatest-mustmatchers_2.12	3.2.15
org.scalatest	scalatest-propspec_2.12	3.2.15
org.scalatest	scalatest-refspec_2.12	3.2.15
org.scalatest	scalatest-shouldmatchers_2.12	3.2.15
org.scalatest	scalatest-wordspec_2.12	3.2.15
org.scalatest	scalatest_2.12	3.2.15
org.slf4j	jcl-over-slf4j	2.0.7
org.slf4j	jul-to-slf4j	2.0.7
org.slf4j	slf4j-api	2.0.7
org.threeten	threeten-extra	1.7.1
org.tukaani	xz	1.9
org.typelevel	algebra_2.12	2.0.1
org.typelevel	cats-kernel_2.12	2.1.1
org.typelevel	spire-macros_2.12	0.17.0
org.typelevel	spire-platform_2.12	0.17.0
org.typelevel	spire-util_2.12	0.17.0
org.typelevel	spire_2.12	0.17.0
org.wildfly.openssl	wildfly-openssl	1.1.3.Final
org.xerial	sqlite-jdbc	3.42.0.0
org.xerial.snappy	snappy-java	1.1.10.3
org.yaml	snakeyaml	2.0
oro	oro	2.0.8
pl.edu.icm	JLargeArrays	1.5
software.amazon.cryptools	AmazonCorrettoCryptoProvider	1.6.1-linux-x86_64
software.amazon.ion	ion-java	1.0.2
stax	stax-api	1.0.1
