Follow these recommendations to maximize productivity, reduce costs, and improve reliability when using serverless compute for notebooks, jobs, and pipelines on Azure Databricks.
Migrating workloads to serverless compute
To ensure the isolation of user code in the shared serverless compute environment, Azure Databricks uses Lakeguard to isolate user code from the Spark engine and from other users.
Because of this, some workloads require code changes to continue working on serverless compute. For a list of limitations, see Serverless compute limitations.
Certain workloads are easier to migrate than others. Workloads that meet the following requirements will be the easiest to migrate:
- The data being accessed must be stored in Unity Catalog.
- The workload should be compatible with standard compute.
- The workload should be compatible with Databricks Runtime 14.3 or above.
To test if a workload will work on serverless compute, run it on a classic compute resource with Standard access mode and a Databricks Runtime of 14.3 or above. If the run is successful, the workload is ready for migration.
Azure Databricks recommends prioritizing serverless compute compatibility when creating new workloads. For existing workloads that require code changes, migrate them incrementally as part of your regular development and maintenance cycle.
Specify Python package versions
When migrating to serverless compute, pin your Python packages to specific versions to ensure reproducible environments. If you don't specify a version, the package may resolve to a different version based on the serverless environment version, which can increase latency as new packages need to be installed.
For example, your requirements.txt file should include specific package versions, like this:
numpy==2.2.2
pandas==2.2.3
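To catch unpinned dependencies before they reach a serverless environment, a small check like the following can be run in CI. This is an illustrative sketch, not a Databricks feature; the function name and the simple `==` heuristic are assumptions (it does not handle extras, markers, or other pinning styles such as hashes).

```python
def unpinned(requirements_text: str) -> list:
    """Return requirements.txt lines that do not pin an exact version with ==."""
    bad = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "==" not in line:
            bad.append(line)
    return bad

# Example: pandas is unpinned, so it is flagged.
print(unpinned("numpy==2.2.2\npandas"))
```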
Use unique names for temporary views
Serverless compute uses Spark Connect, a client-server architecture that evaluates temporary views lazily. This behavior differs from the classic Spark architecture and can cause errors when code reuses the same temporary view name, such as in a loop.
To avoid errors, use unique names for all temporary views in your code.
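One simple way to guarantee uniqueness is to suffix each view name with a short random token. The helper below is a sketch (the function name and suffix length are arbitrary choices); the Spark calls are shown as comments because they require a notebook session.

```python
import uuid

def unique_view_name(prefix: str) -> str:
    """Append a short random suffix so each loop iteration gets its own view."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"

# In a notebook with a SparkSession, each iteration then registers a fresh
# view instead of redefining one name that Spark Connect evaluates lazily:
#
# for batch_df in batches:
#     name = unique_view_name("staging")
#     batch_df.createOrReplaceTempView(name)
#     spark.sql(f"INSERT INTO main.default.results SELECT * FROM {name}")
```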
Networking and connectivity
Serverless compute does not support VPC peering, which is a common way to connect classic Databricks compute to data sources in your cloud account. As an alternative, use network connectivity configurations to manage endpoints, firewalls, and connectivity to external services.
For example, you can add a set of stable egress IPs in external VPCs to an allowlist to enable connectivity to and from Azure Databricks serverless compute. To connect to enterprise applications (such as Salesforce) or managed databases (such as MySQL), use Lakeflow Connect.
To restrict and monitor outbound traffic from serverless compute, configure egress controls for your workspace. See Manage network policies for serverless egress control.
Serverless environment versions
Serverless compute uses environment versions instead of traditional Databricks Runtime versions. This represents a shift in how you manage workload compatibility:
- Databricks Runtime approach: You select a specific Databricks Runtime version for your workload and manage upgrades manually to maintain compatibility.
- Serverless approach: You write code against an environment version, and Azure Databricks independently upgrades the underlying server.
Environment versions provide a stable client API that ensures your workload remains compatible while Azure Databricks independently delivers performance improvements, security enhancements, and bug fixes without requiring code changes to your workloads.
Each environment version includes updated system libraries, features, and bug fixes, while maintaining backward compatibility for workloads. Azure Databricks supports each environment version for three years from its release date, providing a predictable lifecycle for planning upgrades.
To select an environment version for your serverless workload, see Select a base environment. For details about available environment versions and their features, see Serverless environment versions.
Manage dependencies
Serverless compute does not support init scripts. Instead, use serverless environments to install and manage libraries for your serverless workloads. Environments cache installed packages, which reduces startup latency for subsequent runs.
To use libraries from a private repository, configure pre-signed URLs for authenticated repository access in your environment settings.
Choose a performance mode
Azure Databricks serverless compute offers two performance modes that let you balance speed and cost based on your workload type as follows:
- Performance-optimized mode (default): Best for interactive workloads that require fast startup times. Azure Databricks keeps a pool of warm compute resources ready to minimize wait time.
- Standard mode: Best for automated batch jobs and pipelines that can tolerate longer startup times of 4 to 6 minutes. Standard mode can reduce costs by up to 70% compared to performance-optimized mode. Standard mode is available for Lakeflow Jobs and Lakeflow Spark Declarative Pipelines, but not for notebooks.
Choose the mode that best matches your workload requirements. For scheduled jobs where startup latency is not critical, Standard mode typically offers the best value. For current pricing details, see the Databricks pricing page.
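For jobs managed through the Jobs API, the performance mode is expressed as a field on the job settings payload. The sketch below assumes the `performance_target` field with values `STANDARD` and `PERFORMANCE_OPTIMIZED`; verify the field name and accepted values against the Jobs API version you use. The job name and notebook path are placeholders.

```python
# Hypothetical Jobs API settings payload requesting standard performance mode.
# performance_target and its values are assumptions -- confirm against your
# Jobs API reference before relying on them.
job_settings = {
    "name": "nightly-etl",
    "performance_target": "STANDARD",  # default is performance-optimized
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
            # No cluster spec: tasks without one run on serverless compute.
        }
    ],
}
```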
Optimize streaming workloads
Serverless compute supports structured streaming with the following considerations:
The Trigger.AvailableNow trigger mode is supported for all serverless jobs and pipelines. Time-based trigger intervals are not supported.

When using Trigger.AvailableNow, each trigger processes all available data in the source, which can result in larger micro-batches than a time-based trigger would produce. To prevent out-of-memory errors and maintain predictable performance, limit the amount of data processed per micro-batch by setting maxFilesPerTrigger or maxBytesPerTrigger.
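The sketch below shows how those limits plug into a streaming read. The option dictionary is plain Python; the Spark calls are shown as comments because they need a notebook session, and the source path, checkpoint location, and table name are hypothetical placeholders.

```python
# Reader options that cap how much data each micro-batch pulls in.
# For a Delta source, when both are set, whichever limit is hit first applies.
batch_limits = {
    "maxFilesPerTrigger": "1000",  # at most 1,000 files per micro-batch
    "maxBytesPerTrigger": "10g",   # and/or cap by total bytes
}

# With a SparkSession available (e.g. in a Databricks notebook):
#
# stream = (
#     spark.readStream.format("delta")
#     .options(**batch_limits)
#     .load("/Volumes/main/default/raw_events")  # placeholder path
# )
# (
#     stream.writeStream
#     .trigger(availableNow=True)  # process all available data, then stop
#     .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
#     .toTable("main.default.events")
# )
```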
Debug serverless workloads
The Spark UI is not available in serverless compute. Instead, use the query profile to analyze query performance and troubleshoot workloads. The query profile provides detailed execution information and is accessible from the query history in the Azure Databricks UI.
Ingesting data from external systems
Alternative strategies for ingesting data from external systems include:
- SQL-based building blocks like COPY INTO and streaming tables.
- Auto Loader to incrementally and efficiently process new data files as they arrive in cloud storage. See What is Auto Loader?.
- Data ingestion partner solutions. See Connect to ingestion partners using Partner Connect.
- The add data UI to directly upload files. See Upload files to Azure Databricks.
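As one example from the list above, Auto Loader reads files incrementally through the cloudFiles source. The sketch below keeps the options as plain Python and shows the Spark calls as comments, since they require a notebook session; the source path, schema location, checkpoint, and table name are hypothetical placeholders.

```python
# Auto Loader (cloudFiles) reader options; cloudFiles.format tells the
# source how to parse incoming files, and schemaLocation stores the
# inferred schema so it persists across runs.
loader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/Volumes/main/default/schemas/events",
}

# With a SparkSession available (e.g. in a Databricks notebook):
#
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**loader_options)
#     .load("/Volumes/main/default/landing/events")  # placeholder path
# )
# (
#     df.writeStream
#     .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
#     .trigger(availableNow=True)
#     .toTable("main.default.events")
# )
```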
Ingestion alternatives
When using serverless compute, you can also use the following features to query your data without moving it.
- If you want to limit data duplication or guarantee that you are querying the freshest possible data, Databricks recommends using Delta Sharing. See What is Delta Sharing?.
- For ad hoc reporting and proof-of-concept work, Lakehouse Federation enables you to query external databases directly from Azure Databricks without moving data, governed by Unity Catalog. See What is Lakehouse Federation?.
Try one or both of these features and see whether they satisfy your query performance requirements.
Supported Spark configurations
To automate the configuration of Spark on serverless compute, Azure Databricks has removed support for manually setting most Spark configurations. To view a list of supported Spark configuration parameters, see Configure Spark properties for serverless notebooks and jobs.
Job runs on serverless compute will fail if you set an unsupported Spark configuration.
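Because an unsupported configuration fails the run, it can help to validate configurations before submitting a job. The sketch below is illustrative, not a Databricks feature: the allowlist contains only two entries for demonstration, so populate it from the supported parameters documented in Configure Spark properties for serverless notebooks and jobs.

```python
# Illustrative allowlist -- NOT the authoritative set of supported
# parameters; fill this in from the Databricks documentation.
SUPPORTED_CONFS = {
    "spark.sql.session.timeZone",
    "spark.sql.ansi.enabled",
}

def unsupported_confs(confs: dict) -> list:
    """Return the config keys that serverless compute would reject."""
    return sorted(k for k in confs if k not in SUPPORTED_CONFS)

# Example: a classic cluster-sizing config is flagged before the run fails.
print(unsupported_confs({
    "spark.sql.session.timeZone": "UTC",
    "spark.executor.memory": "8g",
}))
```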
Monitor the cost of serverless compute
There are multiple features you can use to help you monitor the cost of serverless compute:
- Use serverless budget policies to attribute your serverless compute usage.
- Use system tables to create dashboards, set up alerts, and perform ad hoc queries. See Monitor the cost of serverless compute.
- Set up budget alerts in your account. See Create and monitor budgets.
- Import a pre-configured usage dashboard. See Import a usage dashboard.