Migrate your workloads from classic compute to serverless compute. Serverless compute handles provisioning, scaling, runtime upgrades, and optimization automatically.
Most classic workloads can migrate with minimal or no code changes, and this page focuses on those workloads. Some features, such as df.cache, are not yet supported on serverless; once they become available, they will not require code changes. Workloads that depend on R or Scala notebooks require classic compute and cannot migrate to serverless. For a full list of current limitations, see Serverless compute limitations.
Migration steps
To migrate your workloads from classic compute to serverless compute, follow these steps:
- Check prerequisites: Verify that your workspace, networking, and cloud storage access meet the requirements. See Before you begin.
- Update code: Make any necessary code and configuration changes. See Update your code.
- Test your workloads: Validate compatibility and correctness before cutting over. See Test your workloads.
- Choose a performance mode: Select the performance mode that best matches your workload requirements. See Choose a performance mode.
- Migrate in phases: Roll out serverless incrementally, starting with new and low-risk workloads. See Migrate in phases.
- Monitor costs: Track serverless DBU consumption and set up alerts. See Monitor costs.
Before you begin
Before you begin migrating, you might need to update some legacy configurations in your workspace.
| Prerequisite | Action | Details |
|---|---|---|
| Workspace is enabled for Unity Catalog | Migrate from Hive Metastore if needed | Upgrade an Azure Databricks workspace to Unity Catalog |
| Networking configured | Replace VPC peering with NCCs, Private Link, or firewall rules | Serverless compute plane networking |
| Cloud storage access | Replace legacy data access patterns with Unity Catalog external locations | Connect to cloud object storage using Unity Catalog |
Confirm your workspace is in a supported region.
Update your code
The following sections list the code and configuration changes required to make your workloads compatible with serverless.
Data access
Legacy data access patterns are not supported on serverless. Update your code to use Unity Catalog instead.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| DBFS paths (`dbfs:/...`) | Unity Catalog volumes | What are Unity Catalog volumes? |
| Hive Metastore tables | Unity Catalog tables (or HMS Federation) | Upgrade an Azure Databricks workspace to Unity Catalog |
| Storage account credentials | Unity Catalog external locations | Connect to cloud object storage using Unity Catalog |
| Custom JDBC JARs | Lakehouse Federation | What is query federation? |
Warning
DBFS access is limited on serverless. Update all dbfs:/ paths to Unity Catalog volumes before migrating. For more information, see Migrate files stored in DBFS.
Example: Replace DBFS paths and Hive Metastore references
```python
# Classic
df = spark.read.csv("dbfs:/mnt/datalake/data.csv", header=True)
df.write.parquet("dbfs:/mnt/output/results")
df = spark.table("my_database.my_table")

# Serverless
df = spark.read.csv("/Volumes/main/sales/raw_data/data.csv", header=True)
df.write.parquet("/Volumes/main/analytics/output/results")
df = spark.table("main.my_database.my_table")  # three-level namespace
```
APIs and code
Certain APIs and code patterns are not supported on serverless. Use the following table to determine whether your code needs updating.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| RDD APIs (`sc.parallelize`, `rdd.map`) | DataFrame APIs | Compare Spark Connect to Spark Classic |
| `df.cache()`, `df.persist()` | Remove caching calls | Serverless compute limitations |
| `spark.sparkContext`, `sqlContext` | Use `spark` (SparkSession) directly | Compare Spark Connect to Spark Classic |
| Hive variables (`${var}`) | SQL DECLARE VARIABLE or Python f-strings | DECLARE VARIABLE |
| Unsupported Spark configs | Remove unsupported configs. Serverless auto-tunes most settings. | Configure Spark properties for serverless notebooks and jobs |
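For Hive-style `${var}` substitution, one replacement is to build the SQL text with a Python f-string before submitting it. The following sketch uses hypothetical table and column names; the commented SQL shows the equivalent DECLARE VARIABLE form.

```python
# Classic Hive-style substitution (not supported on serverless):
#   SET var.region = 'EMEA';
#   SELECT * FROM orders WHERE region = '${var.region}';

# Python f-string replacement: render the value into the SQL text.
region = "EMEA"
query = f"SELECT * FROM main.sales.orders WHERE region = '{region}'"
# spark.sql(query) would run this on serverless; shown here as a string.

# SQL alternative using session variables:
#   DECLARE VARIABLE target_region STRING DEFAULT 'EMEA';
#   SELECT * FROM main.sales.orders WHERE region = target_region;
```

F-strings interpolate values as plain text, so reserve them for trusted inputs; prefer SQL variables or parameterized queries for user-supplied values.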
Example: Replace RDD operations with DataFrames
```python
import pandas as pd
from pyspark.sql import functions as F

# sc.parallelize + rdd.map
# Classic: rdd = sc.parallelize([1, 2, 3]); rdd.map(lambda x: x * 2).collect()
df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
result = df.select((F.col("value") * 2).alias("value")).collect()

# rdd.flatMap
# Classic: sc.parallelize(["hello world"]).flatMap(lambda l: l.split(" ")).collect()
df = spark.createDataFrame([("hello world",)], ["line"])
words = df.select(F.explode(F.split("line", " ")).alias("word")).collect()

# rdd.groupByKey
# Classic: rdd.groupByKey().mapValues(list).collect()
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
grouped = df.groupBy("key").agg(F.collect_list("value").alias("values")).collect()

# rdd.mapPartitions → applyInPandas
def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"total": [pdf["id"].sum()]})

result = (spark.range(100).repartition(4)
    .groupBy(F.spark_partition_id())
    .applyInPandas(process_group, schema="total long")
    .collect())

# sc.textFile → spark.read.text
df = spark.read.text("/Volumes/catalog/schema/volume/file.txt")
```
Example: Replace SparkContext and caching
```python
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

# sc.broadcast → broadcast join
result = main_df.join(broadcast(lookup_df), "key")

# sc.accumulator → DataFrame aggregation
total = df.agg(F.sum("amount")).collect()[0][0]

# sqlContext.sql → spark.sql
result = spark.sql("SELECT * FROM main.db.table")

# df.cache() → remove caching calls.
# Materialize expensive intermediate results to Delta as a workaround:
expensive_df = spark.read.parquet(path).filter("status = 'active'")
expensive_df.write.format("delta").mode("overwrite").saveAsTable("main.scratch.temp")
result = spark.table("main.scratch.temp")
```
Libraries and environments
You can manage libraries and environments at the workspace level using base environments and at the notebook level using the notebook's serverless environment.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| Init scripts | Serverless environments | Configure the serverless environment |
| Cluster-scoped libraries | Notebook-scoped or environment libraries | Configure the serverless environment |
| Maven/JAR libraries | JAR task support for jobs; PyPI for notebooks | JAR task for jobs |
| Docker containers | Serverless environments for library needs | Configure the serverless environment |
Pin Python packages in requirements.txt for reproducible environments. See Best practices for serverless compute.
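A pinned `requirements.txt` for a serverless environment might look like the following (package versions here are illustrative; pin the versions your workload was tested against):

```text
pandas==2.1.4
numpy==1.26.4
requests==2.31.0
```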
Streaming
Streaming workloads are supported on serverless, but certain triggers are not supported. Update your code to use the supported triggers.
| Spark trigger | Supported | Notes |
|---|---|---|
| `Trigger.AvailableNow()` | Yes | Recommended |
| `Trigger.Once()` | Yes | Deprecated. Use `Trigger.AvailableNow()` instead. |
| `Trigger.ProcessingTime(interval)` | No | Returns `INFINITE_STREAMING_TRIGGER_NOT_SUPPORTED` |
| `Trigger.Continuous(interval)` | No | Use Lakeflow Spark Declarative Pipelines continuous mode instead |
| Default (not setting `.trigger()`) | No | Omitting `.trigger()` defaults to `ProcessingTime("0 seconds")`, which is not supported on serverless. Always set `.trigger(availableNow=True)` explicitly. |
For continuous streaming, migrate to Spark Declarative Pipelines in continuous mode or use continuous-schedule jobs with AvailableNow. For large sources, set maxFilesPerTrigger or maxBytesPerTrigger to prevent out-of-memory errors.
Example: Fix streaming triggers
```python
# Classic (not supported on serverless — default trigger is ProcessingTime)
query = df.writeStream.format("delta").outputMode("append").start()

# Serverless (explicit AvailableNow trigger)
query = (df.writeStream.format("delta").outputMode("append")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
query.awaitTermination()

# With OOM prevention for large sources
query = (spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)
    .option("maxBytesPerTrigger", "10g")
    .load(input_path)
    .writeStream.format("delta")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
```
Test your workloads
- Quick compatibility test: Run the workload on classic compute with Standard access mode and Databricks Runtime 14.3 or above. If the run succeeds, the workload can migrate to serverless without any code changes.
- A/B comparison (recommended for production): Run the same workload on classic (control) and serverless (experiment). Diff output tables and verify correctness. Iterate until outputs match.
- Temporary configs: You can temporarily set supported Spark configs during testing. Remove them once stable.
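For the A/B comparison, in PySpark you would typically diff the two output tables with `df_classic.exceptAll(df_serverless)` in both directions. The multiset comparison at the heart of that check can be sketched in plain Python; the row data below is illustrative:

```python
from collections import Counter

def diff_rows(control_rows, experiment_rows):
    """Return rows whose counts differ between the two outputs.

    Multiset comparison, analogous to running DataFrame.exceptAll
    in both directions on collected rows.
    """
    control = Counter(control_rows)
    experiment = Counter(experiment_rows)
    only_in_control = list((control - experiment).elements())
    only_in_experiment = list((experiment - control).elements())
    return only_in_control, only_in_experiment

# Hypothetical collected rows from the classic (control) and
# serverless (experiment) runs of the same workload:
classic_out = [("a", 1), ("b", 2), ("b", 2)]
serverless_out = [("a", 1), ("b", 2)]
missing, extra = diff_rows(classic_out, serverless_out)
# missing holds rows present only in the classic output; extra, only in serverless.
```

Collecting rows to the driver only scales to small outputs; for large tables, keep the comparison in Spark with `exceptAll` or checksum aggregations.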
Choose a performance mode
Serverless jobs and pipelines support two performance modes: standard and performance-optimized. The performance mode you choose depends on your workload requirements.
| Mode | Availability | Startup | Best for |
|---|---|---|---|
| Standard | Jobs, Lakeflow Spark Declarative Pipelines | 4-6 minutes | Cost-sensitive batch |
| Performance-optimized | Notebooks, Jobs, Lakeflow Spark Declarative Pipelines | Seconds | Interactive, latency-sensitive |
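For serverless jobs, the performance mode is set in the job's settings; in the Jobs API this corresponds to a `performance_target` field. The field name and value below reflect the API at the time of writing and should be verified against the current Jobs API reference:

```json
{
  "name": "nightly-etl",
  "performance_target": "STANDARD"
}
```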
Migrate in phases
- New workloads: Start all new notebooks and jobs on serverless.
- Low-risk workloads: Migrate PySpark/SQL workloads already on standard access mode and Databricks Runtime 14.3 or above.
- Complex workloads: Migrate workloads needing code changes (RDD rewrites, DBFS updates, trigger fixes).
- Remaining workloads: Review periodically as capabilities expand.
Monitor costs
Serverless billing is based on DBU consumption, not cluster uptime. Validate cost expectations with representative workloads before migrating at scale.
- Serverless usage policies for cost attribution
- System tables for dashboards and alerts
- Account budget alerts
- Pre-configured usage dashboard for an overview of serverless spending
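As a starting point for tracking serverless DBU consumption, you can query the `system.billing.usage` system table. The column names below reflect the documented schema but should be checked against the current system tables reference; the `LIKE` filter on SKU name is a common heuristic for isolating serverless usage:

```sql
SELECT usage_date,
       sku_name,
       SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE sku_name LIKE '%SERVERLESS%'
GROUP BY usage_date, sku_name
ORDER BY usage_date DESC;
```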
Additional resources
- Best practices for serverless compute: Optimization tips for serverless workloads
- Serverless compute limitations: Full list of current limitations and unsupported features
- Configure the serverless environment: Manage libraries and dependencies
- Supported Spark configurations: Spark configs available on serverless
- Spark Connect vs. classic Spark: Behavioral differences in serverless architecture
- Serverless network security: NCCs, Private Link, and firewall configuration
- Serverless compute release notes: Track new capabilities as they ship
- Unity Catalog upgrade guide: Migrate from Hive Metastore to Unity Catalog
You can also refer to the following blog posts for more information:
- What is serverless computing?: Overview of serverless capabilities and customer results
- Evolution of data engineering: How serverless compute powers Lakeflow Jobs and Pipelines