Serverless compute limitations
This article explains the current limitations of serverless compute for notebooks and jobs. It starts with an overview of the most important considerations, then provides a comprehensive reference list of limitations.
Limitations overview
Before creating new workloads or migrating workloads to serverless compute, first consider the following limitations:
- Python and SQL are the only supported languages.
- Only Spark Connect APIs are supported. Spark RDD APIs are not supported.
- JAR libraries are not supported. For workarounds, see Best practices for serverless compute.
- Serverless compute has unrestricted access for all workspace users.
- Notebook tags are not supported.
- For streaming, only incremental batch logic can be used. There is no support for default or time-based trigger intervals. See Streaming limitations.
Limitations reference list
The following sections list the current limitations of serverless compute.
Serverless compute is based on the shared compute architecture. The most relevant limitations inherited from shared compute are listed below, along with additional serverless-specific limitations. For a full list of shared compute limitations, see Compute access mode limitations for Unity Catalog.
General limitations
Scala and R are not supported.
ANSI SQL is the default when writing SQL. Opt out of ANSI mode by setting `spark.sql.ansi.enabled` to `false`.
Spark RDD APIs are not supported.
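The ANSI opt-out mentioned above is a per-session configuration; a minimal sketch, assuming an active `spark` session as provided in serverless notebooks:

```python
# Disable ANSI SQL mode for the current session (enabled by default on serverless).
# Assumes the `spark` session object that Databricks notebooks provide.
spark.conf.set("spark.sql.ansi.enabled", "false")

# With ANSI mode off, invalid casts and overflows return NULL
# instead of raising a runtime error, e.g.:
#   SELECT CAST('not a number' AS INT)
```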
Spark Context (`sc`), `spark.sparkContext`, and `sqlContext` are not supported.
Databricks Container Services are not supported.
The web terminal is not supported.
No query can run longer than 48 hours.
You must use Unity Catalog to connect to external data sources. Use external locations to access cloud storage.
Support for data sources is limited to AVRO, BINARYFILE, CSV, DELTA, JSON, KAFKA, ORC, PARQUET, TEXT, and XML.
User-defined functions (UDFs) cannot access the internet. Because of this, the CREATE FUNCTION (External) command is not supported. Databricks recommends using CREATE FUNCTION (SQL and Python) to create UDFs.
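As a sketch of the recommended approach, a Unity Catalog SQL or Python UDF can be created without external access (the function names and the `main.default` schema here are hypothetical):

```sql
-- SQL UDF: a pure SQL expression, no network access needed
CREATE FUNCTION main.default.fahrenheit_to_celsius(f DOUBLE)
RETURNS DOUBLE
RETURN (f - 32) * 5 / 9;

-- Python UDF: runs in a sandboxed environment without internet access
CREATE FUNCTION main.default.normalize_name(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
return s.strip().title()
$$;
```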
Individual rows must not exceed the maximum size of 128MB.
The Spark UI is not available. Instead, use the query profile to view information about your Spark queries. See Query profile.
Python clients that use Databricks endpoints may encounter SSL verification errors such as `CERTIFICATE_VERIFY_FAILED`. To work around these errors, configure the client to trust the CA file located at `/etc/ssl/certs/ca-certificates.crt`. For example, run the following command at the beginning of a serverless notebook or job: `import os; os.environ['SSL_CERT_FILE'] = '/etc/ssl/certs/ca-certificates.crt'`
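This workaround can also be written as a short snippet placed at the top of the notebook or job, before any HTTPS clients are created:

```python
import os

# Point Python's ssl module at the trusted CA bundle. Set this before
# creating any HTTPS clients; clients that already exist may have cached
# the default certificate store.
os.environ["SSL_CERT_FILE"] = "/etc/ssl/certs/ca-certificates.crt"
```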
Cross-workspace API requests are not supported.
Global temporary views are not supported. Databricks recommends using session temporary views or creating tables where cross-session data passing is required.
Streaming limitations
- There is no support for default or time-based trigger intervals. Only `Trigger.AvailableNow` is supported. See Configure Structured Streaming trigger intervals.
- All limitations for streaming on shared access mode also apply. See Streaming limitations and requirements for Unity Catalog shared access mode.
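A minimal sketch of an incremental batch stream using the supported trigger (the table and checkpoint names are hypothetical; assumes an active `spark` session):

```python
# Process all data available at start, then stop (Trigger.AvailableNow).
# Time-based triggers such as processingTime are not supported on serverless.
(spark.readStream
    .table("main.default.events")                # hypothetical source table
    .writeStream
    .option("checkpointLocation",
            "/Volumes/main/default/ckpt/events")  # hypothetical checkpoint path
    .trigger(availableNow=True)                   # the only supported trigger
    .toTable("main.default.events_silver"))       # hypothetical target table
```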
Machine learning limitations
- Databricks Runtime for Machine Learning and Apache Spark MLlib are not supported.
- GPUs are not supported.
Notebooks limitations
- Notebooks have access to 8 GB of memory, which cannot be configured.
- Notebook-scoped libraries are not cached across development sessions.
- TEMP tables and views are not shared when a notebook is shared among users.
- Autocomplete and Variable Explorer for DataFrames in notebooks are not supported.
Workflow limitations
- The driver size for serverless compute for jobs is currently fixed and cannot be changed.
- Task logs are not isolated per task run. Logs will contain the output from multiple tasks.
- Task libraries are not supported for notebook tasks. Use notebook-scoped libraries instead. See Notebook-scoped Python libraries.
Compute-specific limitations
The following compute-specific features are not supported:
- Compute policies
- Compute-scoped init scripts
- Compute-scoped libraries, including custom data sources and Spark extensions. Use notebook-scoped libraries instead.
- Compute-level data access configurations, including instance profiles. As a consequence, accessing tables and files via the Hive metastore (HMS) on cloud paths, or with DBFS mounts that have no embedded credentials, will not work.
- Instance pools
- Compute event logs
- Most Apache Spark compute configurations. For a list of supported configurations, see Supported Spark configuration parameters.
- Environment variables. Instead, Databricks recommends using widgets to create job and task parameters.
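Since environment variables are unavailable, per-run values can instead be passed as job or task parameters and read with widgets; a sketch (the parameter name `env` is hypothetical):

```python
# In the notebook task: declare a widget with a default value, then read it.
# When the job runs, a job/task parameter named "env" overrides the default.
dbutils.widgets.text("env", "dev")
env = dbutils.widgets.get("env")
```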
Caching limitations
DataFrame and SQL cache APIs are not supported on serverless compute. Using any of these APIs or SQL commands will result in an exception:
- `df.cache()`, `df.persist()`
- `df.unpersist()`
- `spark.catalog.cacheTable()`
- `spark.catalog.uncacheTable()`
- `spark.catalog.clearCache()`
- `CACHE TABLE`
- `UNCACHE TABLE`
- `REFRESH TABLE`
- `CLEAR CACHE`
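One possible substitute for caching is to materialize intermediate results to a table and read them back; a sketch under the assumption of an existing DataFrame `df` and a hypothetical table name:

```python
# df.cache() raises an exception on serverless; persist results to a
# table instead and reuse them across queries.
intermediate = df.filter("amount > 0")   # hypothetical transformation
intermediate.write.mode("overwrite").saveAsTable("main.default.tmp_amounts")

reused = spark.table("main.default.tmp_amounts")
```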
Hive limitations
Hive SerDe tables are not supported. Additionally, the corresponding LOAD DATA command, which loads data into a Hive SerDe table, is not supported. Using the command will result in an exception.
Support for data sources is limited to AVRO, BINARYFILE, CSV, DELTA, JSON, KAFKA, ORC, PARQUET, TEXT, and XML.
Hive variables (for example, `${env:var}`, `${configName}`, `${system:var}`, and `spark.sql.variable`) and config variable references using the `${var}` syntax are not supported. Using Hive variables will result in an exception.

Instead, use DECLARE VARIABLE, SET VARIABLE, SQL session variable references, and parameter markers (`?` or `:var`) to declare, modify, and reference session state. You can also use the IDENTIFIER clause to parameterize object names in many cases.