Follow these recommendations to maximize productivity, reduce costs, and improve reliability when using serverless compute for notebooks, jobs, and pipelines on Azure Databricks.
Migrating workloads to serverless compute
To ensure the isolation of user code in the shared serverless compute environment, Azure Databricks uses Lakeguard to isolate user code from the Spark engine and from other users.
Because of this, some workloads require code changes to continue working on serverless compute. For a list of limitations, see Serverless compute limitations.
Certain workloads are easier to migrate than others. Workloads that meet the following requirements will be the easiest to migrate:
- The data being accessed must be stored in Unity Catalog.
- The workload should be compatible with standard compute.
- The workload should be compatible with Databricks Runtime 14.3 or above.
To test if a workload will work on serverless compute, run it on a classic compute resource with Standard access mode and a Databricks Runtime of 14.3 or above. If the run is successful, the workload is ready for migration.
Azure Databricks recommends prioritizing serverless compute compatibility when creating new workloads. For existing workloads that require code changes, migrate them incrementally as part of your regular development and maintenance cycle.
Specify Python package versions
When migrating to serverless compute, pin your Python packages to specific versions to ensure reproducible environments. If you don't specify a version, the package may resolve to a different version based on the serverless environment version, which can increase latency as new packages need to be installed.
For example, your requirements.txt file should include specific package versions, like this:
numpy==2.2.2
pandas==2.2.3
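To catch unpinned dependencies before they reach a serverless environment, a small check like the following can be run in CI. This is an illustrative sketch, not a Databricks feature; the function name and the simple `==` heuristic are assumptions (it does not handle extras, markers, or other pinning styles such as hashes).

```python
def unpinned(requirements_text: str) -> list:
    """Return requirements.txt lines that do not pin an exact version with ==."""
    bad = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "==" not in line:
            bad.append(line)
    return bad

# Example: pandas is unpinned, so it is flagged.
print(unpinned("numpy==2.2.2\npandas"))
```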
Use unique names for temporary views
Serverless compute uses Spark Connect, a client-server architecture that evaluates temporary views lazily. This behavior differs from the classic Spark architecture and can cause errors when code reuses the same temporary view name, such as in a loop.
To avoid errors, use unique names for all temporary views in your code.
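One simple way to guarantee uniqueness is to suffix each view name with a short random token. The helper below is a sketch (the function name and suffix length are arbitrary choices); the Spark calls are shown as comments because they require a notebook session.

```python
import uuid

def unique_view_name(prefix: str) -> str:
    """Append a short random suffix so each loop iteration gets its own view."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"

# In a notebook with a SparkSession, each iteration then registers a fresh
# view instead of redefining one name that Spark Connect evaluates lazily:
#
# for batch_df in batches:
#     name = unique_view_name("staging")
#     batch_df.createOrReplaceTempView(name)
#     spark.sql(f"INSERT INTO main.default.results SELECT * FROM {name}")
```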
Networking and connectivity
Serverless compute does not support VPC peering, which is a common way to connect classic Databricks compute to data sources in your cloud account. As an alternative, use network connectivity configurations to manage endpoints, firewalls, and connectivity to external services.
For example, you can add a set of stable egress IPs in external VPCs to an allowlist to enable connectivity to and from Azure Databricks serverless compute. To connect to enterprise applications (such as Salesforce) or managed databases (such as MySQL), use Lakeflow Connect.
To restrict and monitor outbound traffic from serverless compute, configure egress controls for your workspace. See Manage network policies for serverless egress control.
Serverless environment versions
Serverless compute uses environment versions instead of traditional Databricks Runtime versions. This represents a shift in how you manage workload compatibility:
- Databricks Runtime approach: You select a specific Databricks Runtime version for your workload and manage upgrades manually to maintain compatibility.
- Serverless approach: You write code against an environment version, and Azure Databricks independently upgrades the underlying server.
Environment versions provide a stable client API that ensures your workload remains compatible while Azure Databricks independently delivers performance improvements, security enhancements, and bug fixes without requiring code changes to your workloads.
Each environment version includes updated system libraries, features, and bug fixes, while maintaining backward compatibility for workloads. Azure Databricks supports each environment version for three years from its release date, providing a predictable lifecycle for planning upgrades.
To select an environment version for your serverless workload, see Select a base environment. For details about available environment versions and their features, see Serverless environment versions.
Manage dependencies
Serverless compute does not support init scripts. Instead, use serverless environments to install and manage libraries for your serverless workloads. Environments cache installed packages, which reduces startup latency for subsequent runs.
To use libraries from a private repository, configure pre-signed URLs for authenticated repository access in your environment settings.
Choose a performance mode
Azure Databricks serverless compute offers two performance modes that let you balance speed and cost based on your workload type as follows:
- Performance-optimized mode (default): Best for interactive workloads that require fast startup times. Azure Databricks keeps a pool of warm compute resources ready to minimize wait time.
- Standard mode: Best for automated batch jobs and pipelines that can tolerate longer startup times of 4 to 6 minutes. Standard mode can reduce costs by up to 70% compared to performance-optimized mode. Standard mode is available for Lakeflow Jobs and Lakeflow Spark Declarative Pipelines, but not for notebooks.
Choose the mode that best matches your workload requirements. For scheduled jobs where startup latency is not critical, Standard mode typically offers the best value. For current pricing details, see the Databricks pricing page.
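For jobs managed through the Jobs API, the performance mode is expressed as a field on the job settings payload. The sketch below assumes the `performance_target` field with values `STANDARD` and `PERFORMANCE_OPTIMIZED`; verify the field name and accepted values against the Jobs API version you use. The job name and notebook path are placeholders.

```python
# Hypothetical Jobs API settings payload requesting standard performance mode.
# performance_target and its values are assumptions -- confirm against your
# Jobs API reference before relying on them.
job_settings = {
    "name": "nightly-etl",
    "performance_target": "STANDARD",  # default is performance-optimized
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
            # No cluster spec: tasks without one run on serverless compute.
        }
    ],
}
```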
Optimize streaming workloads
Serverless compute supports structured streaming with the following considerations:
The Trigger.AvailableNow trigger mode is supported for all serverless jobs and pipelines. Time-based trigger intervals are not supported.

When using Trigger.AvailableNow, each trigger processes all available data in the source, which can result in larger micro-batches than a time-based trigger would produce. To prevent out-of-memory errors and maintain predictable performance, limit the amount of data processed per micro-batch by setting maxFilesPerTrigger or maxBytesPerTrigger.
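The sketch below shows how those limits plug into a streaming read. The option dictionary is plain Python; the Spark calls are shown as comments because they need a notebook session, and the source path, checkpoint location, and table name are hypothetical placeholders.

```python
# Reader options that cap how much data each micro-batch pulls in.
# For a Delta source, when both are set, whichever limit is hit first applies.
batch_limits = {
    "maxFilesPerTrigger": "1000",  # at most 1,000 files per micro-batch
    "maxBytesPerTrigger": "10g",   # and/or cap by total bytes
}

# With a SparkSession available (e.g. in a Databricks notebook):
#
# stream = (
#     spark.readStream.format("delta")
#     .options(**batch_limits)
#     .load("/Volumes/main/default/raw_events")  # placeholder path
# )
# (
#     stream.writeStream
#     .trigger(availableNow=True)  # process all available data, then stop
#     .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
#     .toTable("main.default.events")
# )
```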
Debug serverless workloads
The Spark UI is not available in serverless compute. Instead, use the query profile to analyze query performance and troubleshoot workloads. The query profile provides detailed execution information and is accessible from the query history in the Azure Databricks UI.
Ingesting data from external systems
Alternative strategies for ingesting data from external systems include:
- SQL-based building blocks like COPY INTO and streaming tables.
- Auto Loader to incrementally and efficiently process new data files as they arrive in cloud storage. See What is Auto Loader?.
- Data ingestion partner solutions. See Connect to ingestion partners using Partner Connect.
- The add data UI to directly upload files. See Upload files to Azure Databricks.
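As one example from the list above, Auto Loader reads files incrementally through the cloudFiles source. The sketch below keeps the options as plain Python and shows the Spark calls as comments, since they require a notebook session; the source path, schema location, checkpoint, and table name are hypothetical placeholders.

```python
# Auto Loader (cloudFiles) reader options; cloudFiles.format tells the
# source how to parse incoming files, and schemaLocation stores the
# inferred schema so it persists across runs.
loader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/Volumes/main/default/schemas/events",
}

# With a SparkSession available (e.g. in a Databricks notebook):
#
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**loader_options)
#     .load("/Volumes/main/default/landing/events")  # placeholder path
# )
# (
#     df.writeStream
#     .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
#     .trigger(availableNow=True)
#     .toTable("main.default.events")
# )
```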
Ingestion alternatives
When using serverless compute, you can also use the following features to query your data without moving it.
- If you want to limit data duplication or guarantee that you are querying the freshest possible data, Databricks recommends using Delta Sharing. See What is Delta Sharing?.
- For ad hoc reporting and proof-of-concept work, Lakehouse Federation enables you to query external databases directly from Azure Databricks without moving data, governed by Unity Catalog. See What is Lakehouse Federation?.
Try one or both of these features and see whether they satisfy your query performance requirements.
Supported Spark configurations
To automate the configuration of Spark on serverless compute, Azure Databricks has removed support for manually setting most Spark configurations. To view a list of supported Spark configuration parameters, see Configure Spark properties for serverless notebooks and jobs.
Job runs on serverless compute will fail if you set an unsupported Spark configuration.
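Because an unsupported configuration fails the run, it can help to validate configurations before submitting a job. The sketch below is illustrative, not a Databricks feature: the allowlist contains only two entries for demonstration, so populate it from the supported parameters documented in Configure Spark properties for serverless notebooks and jobs.

```python
# Illustrative allowlist -- NOT the authoritative set of supported
# parameters; fill this in from the Databricks documentation.
SUPPORTED_CONFS = {
    "spark.sql.session.timeZone",
    "spark.sql.ansi.enabled",
}

def unsupported_confs(confs: dict) -> list:
    """Return the config keys that serverless compute would reject."""
    return sorted(k for k in confs if k not in SUPPORTED_CONFS)

# Example: a classic cluster-sizing config is flagged before the run fails.
print(unsupported_confs({
    "spark.sql.session.timeZone": "UTC",
    "spark.executor.memory": "8g",
}))
```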
Monitor the cost of serverless compute
There are multiple features you can use to help you monitor the cost of serverless compute:
- Use serverless budget policies to attribute your serverless compute usage.
- Use system tables to create dashboards, set up alerts, and perform ad hoc queries. See Monitor the cost of serverless compute.
- Set up budget alerts in your account. See Create and monitor budgets.
- Import a pre-configured usage dashboard. See Import a usage dashboard.