Azure Databricks offers two ways to build materialized views and streaming tables: standalone pipelines, or full pipelines created with Lakeflow Spark Declarative Pipelines. Both run on the same declarative engine and produce Unity Catalog managed tables. The difference is how much of the pipeline you author and operate.
- A standalone materialized view or streaming table is a single dataset defined with SQL syntax. Azure Databricks creates and manages a pipeline behind the scenes to refresh it. You create and refresh standalone datasets from a Databricks SQL warehouse, or from a notebook on serverless general compute using `spark.sql()`. See Standalone pipelines.
- A Lakeflow Spark Declarative Pipelines pipeline is a pipeline that you author and operate as a unit. It can contain many datasets, in SQL and Python, with dependency orchestration, lineage, and pipeline-wide operational features. See What are pipelines?.
When you create a standalone materialized view or streaming table, the managed pipeline appears on the Jobs & Pipelines page with a pipeline type of MV/ST. Datasets defined in a Lakeflow Spark Declarative Pipelines pipeline have a pipeline type of ETL.
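As a rough illustration of the notebook path, the sketch below submits standalone DDL through `spark.sql()`. It assumes `spark` is the session that a Databricks notebook provides. The helper names and the table names in the usage note are hypothetical, and the statement forms should be checked against the SQL reference for your workspace.

```python
def create_mv_statement(name: str, query: str) -> str:
    """Return DDL for a standalone materialized view (names are illustrative)."""
    return f"CREATE MATERIALIZED VIEW IF NOT EXISTS {name} AS {query}"


def refresh_mv_statement(name: str) -> str:
    """Return the statement that requests a refresh of an existing materialized view."""
    return f"REFRESH MATERIALIZED VIEW {name}"


def create_standalone_mv(spark, name: str, query: str) -> str:
    """Submit the CREATE statement through spark.sql() and return what was sent.

    `spark` is assumed to be the SparkSession available in a Databricks notebook.
    """
    statement = create_mv_statement(name, query)
    spark.sql(statement)
    return statement


def refresh_standalone_mv(spark, name: str) -> str:
    """Submit the REFRESH statement through spark.sql() and return what was sent."""
    statement = refresh_mv_statement(name)
    spark.sql(statement)
    return statement
```

In a notebook, a call might look like `create_standalone_mv(spark, "main.sales.daily_totals", "SELECT order_date, SUM(amount) AS total FROM main.sales.orders GROUP BY order_date")`, where both table names are placeholders. The managed pipeline that Azure Databricks creates for the dataset is not referenced anywhere in this code.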
When to use a standalone pipeline
Use standalone materialized views and streaming tables when:
- You accelerate queries or transform data with a single materialized view or streaming table.
- You work from a Databricks SQL warehouse, the SQL editor, or a notebook on serverless general compute, and schedule refreshes with `SCHEDULE`, `TRIGGER ON UPDATE`, or a SQL task in a job.
- You don't need sinks, multi-stage orchestration, or other pipeline-only features.
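The refresh options named in the list above can be sketched as DDL strings. This is a minimal example that assumes the refresh clause sits between the view name and `AS`, and that `SCHEDULE EVERY 1 HOUR` is an accepted schedule form. The helper `with_refresh_policy` and both table names are hypothetical.

```python
NAME = "main.sales.daily_totals"  # hypothetical target name
QUERY = (
    "SELECT order_date, SUM(amount) AS total "
    "FROM main.sales.orders GROUP BY order_date"  # hypothetical source table
)


def with_refresh_policy(name: str, query: str, policy: str) -> str:
    """Build CREATE MATERIALIZED VIEW DDL with a refresh clause placed before AS."""
    return f"CREATE MATERIALIZED VIEW {name}\n{policy}\nAS {query}"


# Refresh on a fixed interval (the interval value is an assumption).
hourly = with_refresh_policy(NAME, QUERY, "SCHEDULE EVERY 1 HOUR")

# Refresh when upstream data changes.
on_update = with_refresh_policy(NAME, QUERY, "TRIGGER ON UPDATE")

print(hourly)
print(on_update)
```

Either string could be passed to `spark.sql()` from a notebook or run directly in the SQL editor. For the job-based option, a SQL task would presumably run a `REFRESH MATERIALIZED VIEW` statement instead of embedding a clause in the definition.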
When to use a Lakeflow Spark Declarative Pipelines pipeline
Use a Lakeflow Spark Declarative Pipelines pipeline when:
- You build a multi-stage pipeline with intermediate datasets, where Azure Databricks manages dependencies and lineage across the datasets. Intermediate datasets can be published to the catalog or kept private to the pipeline.
- You author tables and flows in Python.
- You write to external Delta tables or event streaming destinations using sinks (`create_sink()` or `foreach_batch_sink()`).
- You apply change data capture from a database snapshot using `create_auto_cdc_from_snapshot_flow()`.
- You want triggered or continuous execution across the whole pipeline.
Comparison
| Property | Standalone streaming table or materialized view | Pipeline streaming table or materialized view |
|---|---|---|
| Authoring interface | SQL syntax, from a Databricks SQL warehouse or with `spark.sql()` in a notebook on serverless general compute | SQL and Python |
| Scope | One dataset, in a pipeline that Azure Databricks manages for you | Many datasets in one pipeline, with dependency orchestration and lineage |
| Execution | Triggered, with `SCHEDULE`, `TRIGGER ON UPDATE`, or a SQL task | Triggered or continuous |
| Pipeline-only features | Not available | Sinks, `create_auto_cdc_from_snapshot_flow()`, private datasets |
| Pipeline type label | MV/ST | ETL |
| Move between pipelines | Not supported; recreate the table in the target pipeline | Supported |
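The selection criteria on this page can be condensed into a small decision helper. This is only a sketch of the guidance above, not an official rule: the `Requirements` fields and the `recommend` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist mirroring the criteria listed on this page."""
    multi_stage: bool = False           # intermediate datasets with managed dependencies
    authors_in_python: bool = False     # tables and flows written in Python
    needs_sinks: bool = False           # external Delta tables or event streaming destinations
    needs_snapshot_cdc: bool = False    # change data capture from a database snapshot
    needs_continuous: bool = False      # continuous execution across the pipeline
    needs_private_datasets: bool = False


def recommend(req: Requirements) -> str:
    """Return 'pipeline' if any pipeline-only criterion applies, else 'standalone'."""
    pipeline_only = (
        req.multi_stage
        or req.authors_in_python
        or req.needs_sinks
        or req.needs_snapshot_cdc
        or req.needs_continuous
        or req.needs_private_datasets
    )
    return "pipeline" if pipeline_only else "standalone"
```

With this helper, `recommend(Requirements())` returns `"standalone"`, while `recommend(Requirements(needs_sinks=True))` returns `"pipeline"`. Because a table can't be moved out of a standalone pipeline without recreating it, it may be worth leaning toward a full pipeline when any of these needs is likely to appear later.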