Serverless compute release notes
This article explains the features and behaviors that are currently available and upcoming on serverless compute for notebooks and jobs.
For more information on serverless compute, see Connect to serverless compute.
Databricks periodically releases updates to serverless compute, automatically upgrading the serverless compute runtime to support enhancements and upgrades to the platform. All users get the same updates, rolled out over a short period of time.
Release notes
This section includes release notes for serverless compute. Release notes are organized by year and week of year. Serverless compute always runs using the most recently released version listed here.
Version 2024.43
October 28, 2024
This serverless compute release roughly corresponds to Databricks Runtime 15.4.
New features
- UTF-8 validation functions: This release introduces the following functions for validating UTF-8 strings:
  - is_valid_utf8 verifies whether a string is a valid UTF-8 string.
  - make_valid_utf8 converts a potentially invalid UTF-8 string to a valid UTF-8 string using substitution characters.
  - validate_utf8 raises an error if the input is not a valid UTF-8 string.
  - try_validate_utf8 returns NULL if the input is not a valid UTF-8 string.
- Enable UniForm Iceberg using ALTER TABLE: You can now enable UniForm Iceberg on existing tables without rewriting data files. See Enable by altering an existing table.
- try_url_decode function: This release introduces the try_url_decode function, which decodes a URL-encoded string. If the string is not in the correct format, the function returns NULL instead of raising an error.
- Optionally allow the optimizer to rely on unenforced foreign key constraints: To improve query performance, you can now specify the RELY keyword on FOREIGN KEY constraints when you CREATE or ALTER a table.
- Parallelized job runs for selective overwrites: Selective overwrites using replaceWhere now run jobs that delete data and insert new data in parallel, improving query performance and cluster utilization.
- Improved performance for change data feed with selective overwrites: Selective overwrites using replaceWhere on tables with change data feed no longer write separate change data files for inserted data. These operations use a hidden _change_type column present in the underlying Parquet data files to record changes without write amplification.
- Improved query latency for the COPY INTO command: This release includes a change that improves the query latency for the COPY INTO command. This improvement is implemented by making the loading of state by the RocksDB state store asynchronous. With this change, you should see an improvement in start times for queries with large states, such as queries with a large number of already ingested files.
- Support for dropping the check constraints table feature: You can now drop the checkConstraints table feature from a Delta table using ALTER TABLE table_name DROP FEATURE checkConstraints. See Disable check constraints.
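The difference between the raising and NULL-returning variants described above can be sketched in plain Python. This is only an analogy that runs outside Databricks: the Python functions below are hypothetical stand-ins that borrow the SQL function names, they operate on bytes rather than SQL strings, and the exact substitution and decoding rules of the real functions may differ.

```python
import re
from typing import Optional
from urllib.parse import unquote_plus


def is_valid_utf8(data: bytes) -> bool:
    """Return True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def make_valid_utf8(data: bytes) -> str:
    """Replace undecodable sequences with the U+FFFD substitution character."""
    return data.decode("utf-8", errors="replace")


def validate_utf8(data: bytes) -> str:
    """Raise UnicodeDecodeError when the input is not valid UTF-8."""
    return data.decode("utf-8")


def try_validate_utf8(data: bytes) -> Optional[str]:
    """Return None (standing in for NULL) instead of raising."""
    try:
        return validate_utf8(data)
    except UnicodeDecodeError:
        return None


# A '%' that is not followed by two hex digits marks a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def try_url_decode(text: str) -> Optional[str]:
    """Decode a URL-encoded string, returning None when it is malformed."""
    if _BAD_ESCAPE.search(text):
        return None
    try:
        # Assumption: '+' is treated as a space, as in form encoding.
        return unquote_plus(text, errors="strict")
    except UnicodeDecodeError:
        return None
```

The try_ variants are convenient in pipelines where one bad value should not fail the whole query; the raising variants are the better fit when invalid input indicates a bug upstream.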
Behavior changes
Schema binding change for views: When the data types in a view’s underlying query change from those used when the view was first created, Databricks no longer throws errors for references to the view when no safe cast can be performed.
Instead, the view compensates by using regular casting rules where possible. This change allows Databricks to tolerate table schema changes more readily.
Disallow undocumented ! syntax toleration for NOT outside boolean logic: Databricks will no longer tolerate the use of ! as a synonym for NOT outside of boolean logic. This change reduces confusion, aligns with the SQL standard, and makes SQL more portable. For example: CREATE ... IF ! EXISTS, IS ! NULL, ! NULL column or field property, ! IN and ! BETWEEN must be replaced with: CREATE ... IF NOT EXISTS, IS NOT NULL, NOT NULL column or field property, NOT IN and NOT BETWEEN.
The boolean prefix operator ! (e.g. !is_mgr or !(true AND false)) is unaffected by this change.
Disallow undocumented and unprocessed portions of column definition syntax in views: Databricks supports CREATE VIEW with named columns and column comments.
The specification of column types, NOT NULL constraints, or DEFAULT has been tolerated in the syntax without having any effect. Databricks will remove this syntax toleration. Doing so reduces confusion, aligns with the SQL standard, and allows for future enhancements.
Consistent error handling for Base64 decoding in Spark and Photon: This release changes how Photon handles Base64 decoding errors to match the Spark handling of these errors. Before these changes, the Photon and Spark code generation path sometimes failed to raise parsing exceptions, while the Spark interpreted execution correctly raised IllegalArgumentException or ConversionInvalidInputError. This update ensures that Photon consistently raises the same exceptions as Spark during Base64 decoding errors, providing more predictable and reliable error handling.
Adding a CHECK constraint on an invalid column now returns the UNRESOLVED_COLUMN.WITH_SUGGESTION error class: To provide more useful error messaging, in Databricks Runtime 15.3 and above, an ALTER TABLE ADD CONSTRAINT statement that includes a CHECK constraint referencing an invalid column name returns the UNRESOLVED_COLUMN.WITH_SUGGESTION error class. Previously, an INTERNAL_ERROR was returned.
The JDK is upgraded from JDK 8 to JDK 17
August 15, 2024
Serverless compute for notebooks and workflows has migrated from Java Development Kit (JDK) 8 to JDK 17 on the server side. This upgrade includes the following behavioral changes:
Bug fixes
Correct parsing of regex patterns with negation in nested character grouping: With this upgrade, Azure Databricks now supports the correct parsing of regex patterns with negation in nested character grouping. For example, [^[abc]] will be parsed as “any character that is NOT one of ‘abc’”.
Additionally, Photon behavior was inconsistent with Spark for nested character classes. Regex patterns containing nested character classes will no longer use Photon, and instead will use Spark. A nested character class is any pattern containing square brackets within square brackets, such as [[a-c][1-3]].
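To audit existing queries for patterns that fall under this definition, a rough heuristic like the following may help. It is a sketch only: the function name is hypothetical, it does not reflect how Databricks itself detects nested character classes, and it ignores details such as \Q...\E quoting.

```python
def has_nested_character_class(pattern: str) -> bool:
    """Heuristically report whether a regex has brackets inside brackets."""
    depth = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Skip the escaped character so \[ and \] are not counted.
            i += 2
            continue
        if ch == "[":
            depth += 1
            if depth > 1:
                return True
        elif ch == "]" and depth > 0:
            depth -= 1
        i += 1
    return False
```

Patterns flagged by a check like this are worth testing on the current release, since both their parsing and their execution engine may have changed.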
Version 2024.30
July 23, 2024
This serverless compute release roughly corresponds to Databricks Runtime 15.1.
New features
Support for star (*) syntax in the WHERE clause: You can now use the star (*) syntax in the WHERE clause to reference all columns from the SELECT list.
For example, SELECT * FROM VALUES(1, 2) AS T(a1, a2) WHERE 1 IN(T.*).
Behavior changes
Improved error recovery for JSON parsing: The JSON parser used for from_json() and JSON path expressions now recovers faster from malformed syntax, resulting in less data loss.
When encountering malformed JSON syntax in a struct field, an array value, a map key, or a map value, the JSON parser will now return NULL only for the unreadable field, key, or element. Subsequent fields, keys, or elements will be properly parsed. Prior to this change, the JSON parser abandoned parsing the array, struct, or map and returned NULL for the remaining content.
Version 2024.15
April 15, 2024
This is the initial serverless compute version. This version roughly corresponds to Databricks Runtime 14.3 with some modifications that remove support for some non-serverless and legacy features.
Supported Spark configuration parameters
To automate the configuration of Spark on serverless compute, Databricks has removed support for manually setting most Spark configurations. You can manually set only the following Spark configuration parameters:
- spark.sql.legacy.timeParserPolicy (Default value is EXCEPTION)
- spark.sql.session.timeZone (Default value is Etc/UTC)
- spark.sql.shuffle.partitions (Default value is auto)
- spark.sql.ansi.enabled (Default value is true)
Job runs on serverless compute will fail if you set a Spark configuration that is not in this list.
For more on configuring Spark properties, see Set Spark configuration properties on Azure Databricks.
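Because a job run fails when it sets a configuration outside this list, it can be useful to check job settings before submitting them. The following is a minimal sketch of such a pre-flight check; the function name and the dictionary-based input are hypothetical and are not part of any Databricks API.

```python
# Parameters that can be set manually, with the defaults listed above.
SUPPORTED_SPARK_CONFS = {
    "spark.sql.legacy.timeParserPolicy": "EXCEPTION",
    "spark.sql.session.timeZone": "Etc/UTC",
    "spark.sql.shuffle.partitions": "auto",
    "spark.sql.ansi.enabled": "true",
}


def unsupported_spark_confs(requested: dict) -> list:
    """Return the requested keys that serverless compute would reject."""
    return sorted(key for key in requested if key not in SUPPORTED_SPARK_CONFS)


# Example: one supported key and one unsupported key.
requested = {
    "spark.sql.session.timeZone": "America/New_York",
    "spark.executor.memory": "4g",
}
print(unsupported_spark_confs(requested))
```

An empty result means every requested key is in the supported list; anything else should be removed from the job definition before it runs on serverless compute.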
input_file functions are deprecated
The input_file_name(), input_file_block_length(), and input_file_block_start() functions have been deprecated. Using these functions is highly discouraged.
Instead, use the file metadata column to retrieve file metadata information.
Behavioral changes
Serverless compute version 2024.15 includes the following behavioral changes:
- unhex(hexStr) bug fix: When using the unhex(hexStr) function, hexStr is always padded left to a whole byte. Previously the unhex function ignored the first half-byte. For example: unhex('ABC') now produces x'0ABC' instead of x'BC'.
- Auto-generated column aliases are now stable: When the result of an expression is referenced without a user-specified column alias, this auto-generated alias will now be stable. The new algorithm may result in a change to the previously auto-generated names used in features like materialized views.
- Table scans with CHAR type fields are now always padded: Delta tables, certain JDBC tables, and external data sources store CHAR data in non-padded form. When reading, Databricks will now pad the data with spaces to the declared length to ensure correct semantics.
- Casts from BIGINT/DECIMAL to TIMESTAMP throw an exception for overflowed values: Databricks allows casting from BIGINT and DECIMAL to TIMESTAMP by treating the value as the number of seconds from the Unix epoch. Previously, Databricks would return overflowed values but now throws an exception in cases of overflow. Use try_cast to return NULL instead of an exception.
- PySpark UDF execution has been improved to match the exact behavior of UDF execution on single user compute: The following changes have been made:
  - UDFs with a string return type no longer implicitly convert non-string values into strings. Previously, UDFs with a return type of str would apply a str(..) wrapper to the result regardless of the actual data type of the returned value.
  - UDFs with timestamp return types no longer implicitly apply a timezone conversion to timestamps.
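The unhex padding change in the list above can be illustrated with a small Python sketch. Both functions are hypothetical stand-ins written for this comparison; they model only the handling of odd-length input as described, not the rest of the SQL function's behavior (for example, its handling of non-hexadecimal characters).

```python
def unhex_padded(hex_str: str) -> bytes:
    """Current behavior: left-pad odd-length input to a whole byte."""
    if len(hex_str) % 2 == 1:
        hex_str = "0" + hex_str
    return bytes.fromhex(hex_str)


def unhex_legacy(hex_str: str) -> bytes:
    """Previous behavior: the leading half-byte is ignored."""
    if len(hex_str) % 2 == 1:
        hex_str = hex_str[1:]
    return bytes.fromhex(hex_str)


print(unhex_padded("ABC").hex().upper())  # 0ABC
print(unhex_legacy("ABC").hex().upper())  # BC
```

Even-length input is unaffected, so only queries that pass odd-length hex strings see different results.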
System environment
Serverless compute includes the following system environment:
- Operating System: Ubuntu 22.04.3 LTS
- Python: 3.10.12
- Delta Lake: 3.1.0
Installed Python libraries
The following Python libraries are installed on serverless compute by default. Additional dependencies can be installed using the Environment side panel. See Install notebook dependencies.
Library | Version | Library | Version | Library | Version |
---|---|---|---|---|---|
anyio | 3.5.0 | argon2-cffi | 21.3.0 | argon2-cffi-bindings | 21.2.0 |
asttokens | 2.0.5 | astunparse | 1.6.3 | attrs | 22.1.0 |
backcall | 0.2.0 | beautifulsoup4 | 4.11.1 | black | 22.6.0 |
bleach | 4.1.0 | blinker | 1.4 | boto3 | 1.24.28 |
botocore | 1.27.96 | cachetools | 5.3.2 | certifi | 2022.12.7 |
cffi | 1.15.1 | chardet | 4.0.0 | charset-normalizer | 2.0.4 |
click | 8.0.4 | comm | 0.1.2 | contourpy | 1.0.5 |
cryptography | 39.0.1 | cycler | 0.11.0 | Cython | 0.29.32 |
databricks-connect | 14.3.1 | databricks-sdk | 0.20.0 | dbus-python | 1.2.18 |
debugpy | 1.6.7 | decorator | 5.1.1 | defusedxml | 0.7.1 |
distlib | 0.3.8 | docstring-to-markdown | 0.11 | entrypoints | 0.4 |
executing | 0.8.3 | facets-overview | 1.1.1 | fastjsonschema | 2.19.1 |
filelock | 3.13.1 | fonttools | 4.25.0 | google-auth | 2.28.1 |
googleapis-common-protos | 1.62.0 | grpcio | 1.62.0 | grpcio-status | 1.62.0 |
httplib2 | 0.20.2 | idna | 3.4 | importlib-metadata | 4.6.4 |
ipyflow-core | 0.0.198 | ipykernel | 6.25.0 | ipython | 8.14.0 |
ipython-genutils | 0.2.0 | ipywidgets | 7.7.2 | jedi | 0.18.1 |
jeepney | 0.7.1 | Jinja2 | 3.1.2 | jmespath | 0.10.0 |
joblib | 1.2.0 | jsonschema | 4.17.3 | jupyter-client | 7.3.4 |
jupyter-server | 1.23.4 | jupyter_core | 5.2.0 | jupyterlab-pygments | 0.1.2 |
jupyterlab-widgets | 1.0.0 | keyring | 23.5.0 | kiwisolver | 1.4.4 |
launchpadlib | 1.10.16 | lazr.restfulclient | 0.14.4 | lazr.uri | 1.0.6 |
lxml | 4.9.1 | MarkupSafe | 2.1.1 | matplotlib | 3.7.0 |
matplotlib-inline | 0.1.6 | mccabe | 0.7.0 | mistune | 0.8.4 |
more-itertools | 8.10.0 | mypy-extensions | 0.4.3 | nbclassic | 0.5.2 |
nbclient | 0.5.13 | nbconvert | 6.5.4 | nbformat | 5.7.0 |
nest-asyncio | 1.5.6 | nodeenv | 1.8.0 | notebook | 6.5.2 |
notebook_shim | 0.2.2 | numpy | 1.23.5 | oauthlib | 3.2.0 |
packaging | 23.2 | pandas | 1.5.3 | pandocfilters | 1.5.0 |
parso | 0.8.3 | pathspec | 0.10.3 | patsy | 0.5.3 |
pexpect | 4.8.0 | pickleshare | 0.7.5 | Pillow | 9.4.0 |
pip | 22.3.1 | platformdirs | 2.5.2 | plotly | 5.9.0 |
pluggy | 1.0.0 | prometheus-client | 0.14.1 | prompt-toolkit | 3.0.36 |
protobuf | 4.25.3 | psutil | 5.9.0 | psycopg2 | 2.9.3 |
ptyprocess | 0.7.0 | pure-eval | 0.2.2 | py4j | 0.10.9.7 |
pyarrow | 8.0.0 | pyarrow-hotfix | 0.5 | pyasn1 | 0.5.1 |
pyasn1-modules | 0.3.0 | pyccolo | 0.0.52 | pycparser | 2.21 |
pydantic | 1.10.6 | pyflakes | 3.1.0 | Pygments | 2.11.2 |
PyGObject | 3.42.1 | PyJWT | 2.3.0 | pyodbc | 4.0.32 |
pyparsing | 3.0.9 | pyright | 1.1.294 | pyrsistent | 0.18.0 |
python-dateutil | 2.8.2 | python-lsp-jsonrpc | 1.1.1 | python-lsp-server | 1.8.0 |
pytoolconfig | 1.2.5 | pytz | 2022.7 | pyzmq | 23.2.0 |
requests | 2.28.1 | rope | 1.7.0 | rsa | 4.9 |
s3transfer | 0.6.2 | scikit-learn | 1.1.1 | scipy | 1.10.0 |
seaborn | 0.12.2 | SecretStorage | 3.3.1 | Send2Trash | 1.8.0 |
setuptools | 65.6.3 | six | 1.16.0 | sniffio | 1.2.0 |
soupsieve | 2.3.2.post1 | ssh-import-id | 5.11 | stack-data | 0.2.0 |
statsmodels | 0.13.5 | tenacity | 8.1.0 | terminado | 0.17.1 |
threadpoolctl | 2.2.0 | tinycss2 | 1.2.1 | tokenize-rt | 4.2.1 |
tomli | 2.0.1 | tornado | 6.1 | traitlets | 5.7.1 |
typing_extensions | 4.4.0 | ujson | 5.4.0 | unattended-upgrades | 0.1 |
urllib3 | 1.26.14 | virtualenv | 20.16.7 | wadllib | 1.3.6 |
wcwidth | 0.2.5 | webencodings | 0.5.1 | websocket-client | 0.58.0 |
whatthepatch | 1.0.2 | wheel | 0.38.4 | widgetsnbextension | 3.6.1 |
yapf | 0.33.0 | Zipp | 1.0.0 |