Use materialized views in Databricks SQL

This article describes how to create and refresh materialized views in Databricks SQL to improve performance and reduce the cost of your data processing and analysis workloads.

What are materialized views?

In Databricks SQL, materialized views are Unity Catalog managed tables that physically store the results of a query. Unlike standard views, which compute results on demand, materialized views cache the results and update them as the underlying source tables change—either on a schedule or automatically.

Materialized views are well-suited for data processing workloads such as extract, transform, and load (ETL) processing. They provide a simple, declarative way to process data for compliance, corrections, aggregations, or general change data capture (CDC), and enable easy-to-use transformations by cleaning, enriching, and denormalizing base tables. By pre-computing expensive or frequently used queries, materialized views lower query latency and resource consumption. In many cases, they can incrementally compute changes from source tables, further improving efficiency and the end-user experience.

The following are common use cases for materialized views:

  • Keeping a BI dashboard up to date with minimal end-user query latency.
  • Reducing complex ETL orchestration with simple SQL logic.
  • Building complex, layered transformations.
  • Any use case that demands consistent performance with up-to-date insights.

When you create a materialized view in a Databricks SQL warehouse, a serverless pipeline is created to process the creation of, and refreshes to, the materialized view. You can monitor the status of refresh operations in Catalog Explorer. See View details with DESCRIBE EXTENDED.

Requirements

Materialized views created in Databricks SQL are backed by a serverless pipeline. Your workspace must support serverless pipelines to use this functionality.

Requirements to create or refresh materialized views:

  • You must use a Unity Catalog-enabled pro or serverless SQL warehouse.

  • To refresh a materialized view, you must be in the workspace that created it.

  • To incrementally refresh a materialized view from Delta tables, the source tables must have row tracking enabled.

  • The owner (the user who creates the materialized view) must have the following permissions:

    • SELECT privilege on the base tables referenced by the materialized view.
    • USE CATALOG and USE SCHEMA privileges on the catalog and schema containing the source tables for the materialized view.
    • USE CATALOG and USE SCHEMA privileges on the target catalog and schema for the materialized view.
    • CREATE TABLE and CREATE MATERIALIZED VIEW privileges on the schema containing the materialized view.
  • To refresh a materialized view, you must have the REFRESH privilege on the materialized view.

Requirements to query materialized views:

  • You must be the owner of the materialized view, or have SELECT on the materialized view, along with USE SCHEMA and USE CATALOG on its parents.
  • You must use one of the following compute resources:
    • SQL warehouse

    • Lakeflow Spark Declarative Pipelines interfaces

    • Standard access mode compute (formerly shared access mode)

    • Dedicated access mode (formerly single user access mode) on Databricks Runtime 15.4 and above, as long as the workspace is enabled for serverless compute. See Fine-grained access control on dedicated compute.

      If you are the materialized view owner, you can use a Dedicated access mode compute resource running Databricks Runtime 14.3 or above.

To learn about other restrictions on using materialized views, see Limitations.

Create a materialized view

Databricks SQL materialized view CREATE operations use a Databricks SQL warehouse to create and load data in the materialized view. Creating a materialized view is a synchronous operation, which means that the CREATE MATERIALIZED VIEW command blocks until the materialized view is created and the initial data load finishes. A serverless pipeline is automatically created for every Databricks SQL materialized view. When the materialized view is refreshed, the pipeline processes the refresh.

To create a materialized view, use the CREATE MATERIALIZED VIEW statement. To submit a create statement, use the SQL editor in the Azure Databricks UI, the Databricks SQL CLI, or the Databricks SQL API.

The user who creates a materialized view is the materialized view owner.

The following example creates the materialized view mv1 from the base table base_table1:

-- This query defines the materialized view:
CREATE OR REPLACE MATERIALIZED VIEW mv1
AS SELECT
  date,
  sum(sales) AS sum_of_sales
FROM
  base_table1
GROUP BY
  date;

When you create a materialized view using the CREATE OR REPLACE MATERIALIZED VIEW statement, the initial data refresh and population begin immediately. This does not consume SQL warehouse compute. Instead, a serverless pipeline is used for creation and subsequent refreshes.

Column comments on a base table are automatically propagated to the new materialized view only when it is created. To add a schedule, table constraints, or other properties, modify the materialized view definition (the SQL query).

Running the same SQL statement again, whether manually or on a schedule, refreshes the materialized view. A refresh performed this way behaves like any other refresh. For details, see Refresh a materialized view.

To learn more about configuring a materialized view, see Configure materialized views in Databricks SQL. To learn about the complete syntax for creating a materialized view, see CREATE MATERIALIZED VIEW. To learn about loading data in different formats and from different places, see Load data in pipelines.

Load data from external systems

Materialized views can be created on external data using Lakehouse Federation for supported data sources. For information on loading data from sources not supported by Lakehouse Federation, see Data format options. For general information about loading data, including examples, see Load data in pipelines.

Hide sensitive data

You can use materialized views to hide sensitive data from users accessing the table. One way to do this is to create the query so it doesn't include that data in the first place. But you can also mask columns or filter rows based on the permissions of the querying user. For example, you could hide the tax_id column for users that are not in the group HumanResourcesDept. To do this, use the ROW FILTER and MASK syntax during the creation of the materialized view. For more information, see Row filters and column masks.
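
The following is a minimal sketch of the column mask approach. The function, catalog, schema, and table names are placeholders, and the masking logic is an assumption for illustration; see Row filters and column masks for the authoritative syntax.

-- Hypothetical masking function: reveals tax_id only to members of HumanResourcesDept.
CREATE OR REPLACE FUNCTION catalog.schema.tax_id_mask(tax_id STRING)
RETURNS STRING
RETURN CASE WHEN is_member('HumanResourcesDept') THEN tax_id ELSE '***-**-****' END;

-- Apply the mask to the tax_id column when creating the materialized view.
CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.employees_mv (
  employee_id INT,
  name STRING,
  tax_id STRING MASK catalog.schema.tax_id_mask
)
AS SELECT
  employee_id,
  name,
  tax_id
FROM catalog.schema.employees;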

Refresh a materialized view

Refreshing a materialized view updates the view to reflect the latest changes to the base table at the time of the refresh.

When you define a materialized view, the CREATE OR REPLACE MATERIALIZED VIEW statement is used both to create the view and to refresh it, including for scheduled refreshes. You can also use the REFRESH MATERIALIZED VIEW statement to refresh the materialized view without needing to supply the query again. See REFRESH (MATERIALIZED VIEW or STREAMING TABLE) for details on the SQL syntax and parameters for this command. To learn more about the types of materialized views that can be incrementally refreshed, see Incremental refresh for materialized views.

To submit a refresh statement, use the SQL editor in the Azure Databricks UI, a notebook attached to a SQL warehouse, the Databricks SQL CLI, or the Databricks SQL API.

The owner, and any user who has been granted the REFRESH privilege on the table, can refresh the materialized view.

The following example refreshes the mv1 materialized view:

REFRESH MATERIALIZED VIEW mv1;

The operation is synchronous by default, meaning that the command blocks until the refresh operation is complete. To refresh asynchronously, you can add the ASYNC keyword:

REFRESH MATERIALIZED VIEW mv1 ASYNC;

How are Databricks SQL materialized views refreshed?

Materialized views automatically create and use serverless pipelines to process refresh operations. The refresh is managed by the pipeline, and the update is monitored by the Databricks SQL warehouse used to create the materialized view. Materialized views can be updated using a pipeline that runs on a schedule. Materialized views created in Databricks SQL always run in triggered mode. See Triggered vs. continuous pipeline mode.

Materialized views are refreshed using one of two methods:

  • Incremental refresh - The system evaluates the view's query to identify changes that happened after the last update and merges only the new or modified data.
  • Full refresh - If an incremental refresh can't be performed or is not cost-effective, the system runs the entire query and replaces the existing data in the materialized view with the new results.

The structure of the query and the type of source data determine whether incremental refresh is supported. To support incremental refresh, source data should be stored in Delta tables, with row tracking and change data feed enabled. To see if a query is incrementalizable, use the Databricks SQL EXPLAIN CREATE MATERIALIZED VIEW statement. After you create a materialized view, you can monitor its refresh behavior to verify whether it is updated incrementally or through a full refresh.
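
For example, the following sketch checks the mv1 definition from earlier without creating the view; the output indicates the expected refresh technique (the exact output format may vary):

-- Explain whether this definition supports incremental refresh.
EXPLAIN CREATE OR REPLACE MATERIALIZED VIEW mv1
AS SELECT
  date,
  sum(sales) AS sum_of_sales
FROM
  base_table1
GROUP BY
  date;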

By default, Azure Databricks uses a cost model to choose the more cost-effective option between full and incremental refresh. You can override this behavior to prefer either incremental or full refreshes by setting a REFRESH POLICY in your SQL definition of the materialized view.
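
The following is a hedged sketch of setting a refresh policy. The FULL keyword is an assumption for illustration; confirm the supported policy values in Incremental refresh for materialized views:

-- Assumed policy keyword (FULL): always run full refreshes instead of the cost model's choice.
CREATE OR REPLACE MATERIALIZED VIEW mv1
  REFRESH POLICY FULL
AS SELECT
  date,
  sum(sales) AS sum_of_sales
FROM
  base_table1
GROUP BY
  date;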

For details on refresh types, and how to optimize for incremental refreshes, see Incremental refresh for materialized views.

Asynchronous refreshes

By default, refresh operations are performed synchronously. You can also set a refresh operation to occur asynchronously. This can be set using the refresh command with the ASYNC keyword. See REFRESH (MATERIALIZED VIEW or STREAMING TABLE). The behavior associated with each approach is as follows:

  • Synchronous: A synchronous refresh prevents other operations from proceeding until the refresh is complete. If the result is needed for the next step, such as when sequencing refresh operations in orchestration tools like Lakeflow Jobs, use a synchronous refresh. To orchestrate materialized views with a job, use the SQL task type. See Lakeflow Jobs.
  • Asynchronous: An asynchronous refresh starts a background job on serverless compute when a materialized view refresh begins, allowing the command to return before the data load completes. This refresh type can save on cost because the operation doesn't necessarily hold compute capacity in the warehouse where the command is initiated. If the refresh becomes idle and no other tasks are running, the warehouse can shut down while the refresh uses other available compute. Additionally, asynchronous refreshes support starting multiple operations in parallel, as shown in the sketch after this list.
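
For example, the following sketch starts refreshes of two materialized views in parallel; both commands return before the data loads complete (the view names are placeholders):

REFRESH MATERIALIZED VIEW catalog.schema.mv_sales ASYNC;
REFRESH MATERIALIZED VIEW catalog.schema.mv_orders ASYNC;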

Schedule materialized view refreshes

You can configure a Databricks SQL materialized view to refresh automatically based on a defined schedule, or to trigger when upstream data is changed. The following are the options for scheduling refreshes:

  • Manual: On-demand refresh using a SQL REFRESH statement, or through the workspace UI. Example use case: development, testing, and ad-hoc updates.
  • TRIGGER ON UPDATE: Define the materialized view to automatically refresh when the upstream data changes. Example use case: production workloads with data freshness SLAs or unpredictable refresh periods.
  • SCHEDULE: Define the materialized view to refresh at defined time intervals. Example use case: predictable, time-based refresh requirements.
  • SQL task in a job: Refresh is orchestrated through Lakeflow Jobs. Example use case: complex pipelines with cross-system dependencies.

Manual refresh

To manually refresh a materialized view, you can call a refresh from Databricks SQL, or use the workspace UI.

REFRESH statement

To refresh a materialized view using Databricks SQL:

  1. In the SQL editor, run the following statement:

    REFRESH MATERIALIZED VIEW <mv-name>;
    

For more information, see REFRESH (MATERIALIZED VIEW or STREAMING TABLE).

Workspace UI

To refresh a pipeline in the workspace UI:

  1. In the Azure Databricks workspace, click Jobs & Pipelines.
  2. Select the pipeline you wish to refresh from the list.
  3. Click the Start button.

As the pipeline refreshes, you will see updates in the UI.

Trigger on update

The TRIGGER ON UPDATE clause automatically refreshes a materialized view when upstream source data changes. This eliminates the need to coordinate schedules across pipelines. The materialized view stays fresh without requiring the user to know when upstream jobs finish or maintain complex scheduling logic.

This is the recommended approach for production workloads, especially when upstream dependencies don't run on predictable schedules. Once configured, the materialized view monitors its source tables and refreshes automatically when changes in any of the upstream sources are detected.

Limitations

  • Upstream dependency limits: A materialized view can monitor a maximum of 10 upstream tables and 30 upstream views. For more dependencies, split the logic across multiple materialized views.
  • Workspace limits: A maximum of 1,000 materialized views with TRIGGER ON UPDATE can exist per workspace. Please contact Databricks support if more than 1,000 materialized views are needed.
  • Minimum interval: The minimum trigger interval is 1 minute.

The following examples show how to set a trigger on update when defining a materialized view.

Create a materialized view with trigger on update

To create a materialized view that refreshes automatically when source data changes, include the TRIGGER ON UPDATE clause in the CREATE MATERIALIZED VIEW statement.

The following example creates a materialized view that aggregates customer orders and refreshes whenever the source customers or orders tables are updated:

CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.customer_orders
  TRIGGER ON UPDATE
AS SELECT
    c.customer_id,
    c.name,
    count(o.order_id) AS order_count
FROM catalog.schema.customers c
JOIN catalog.schema.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Throttle refresh frequency

If upstream data refreshes frequently, use AT MOST EVERY to cap how often the view refreshes and limit compute costs. This is useful when source tables update frequently but downstream consumers don't need real-time data. The INTERVAL keyword is required before the time value.

The following example limits the materialized view to refresh at most every 5 minutes, even if source data changes more frequently:

CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.customer_orders
  TRIGGER ON UPDATE AT MOST EVERY INTERVAL 5 MINUTES
AS SELECT
    c.customer_id,
    c.name,
    count(o.order_id) AS order_count
FROM catalog.schema.customers c
JOIN catalog.schema.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Scheduled refresh

Refresh schedules can be defined directly in the materialized view definition to refresh the view at fixed time intervals. This approach is useful when the data update cadence is known and predictable refresh timing is desired.

When there is a refresh schedule, you can still run a manual refresh at any time if you need updated data.

Databricks supports two scheduling syntaxes: SCHEDULE EVERY for simple intervals and SCHEDULE CRON for precise scheduling. The SCHEDULE and SCHEDULE REFRESH keywords are semantically equivalent.

For details about the syntax and use of the SCHEDULE clause, see CREATE MATERIALIZED VIEW SCHEDULE clause.

When a schedule is created, a new Databricks job is automatically configured to process the update.

To view the schedule, do one of the following:

  • Run the DESCRIBE EXTENDED statement from the SQL editor in the Azure Databricks UI, as shown in the example after this list. See DESCRIBE TABLE.
  • Use Catalog Explorer to view the materialized view. The schedule is listed on the Overview tab, under Refresh status. See What is Catalog Explorer?.
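
For example, the following statement returns detailed information for a materialized view, including its refresh schedule (replace <mv-name> with the name of your materialized view):

DESCRIBE EXTENDED <mv-name>;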

The following examples show how to create a materialized view with a schedule:

Schedule every time interval

This example schedules a refresh once every hour:

CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.hourly_metrics
  SCHEDULE EVERY 1 HOUR
AS SELECT
    date_trunc('hour', event_time) AS hour,
    count(*) AS events
FROM catalog.schema.raw_events
GROUP BY 1;

Schedule using cron

This example schedules a refresh every 15 minutes, at the quarter hour of the UTC time zone:

CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.regular_metrics
  SCHEDULE CRON '0 */15 * * * ?' AT TIME ZONE 'UTC'
AS SELECT
    date_trunc('minute', event_time) AS minute,
    count(*) AS events
FROM catalog.schema.raw_events
WHERE event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY 1;

SQL task in a job

Materialized view refreshes can be orchestrated through Databricks Jobs by creating SQL tasks that include REFRESH MATERIALIZED VIEW commands. This approach integrates materialized view refreshes into existing job orchestration workflows.

There are two ways to create a job for refreshing materialized views:

  • From the SQL Editor: Write the REFRESH MATERIALIZED VIEW command and click the Schedule button to create a job directly from the query.
  • From the Jobs UI: Create a new job, add a SQL task type, and attach a SQL Query or Notebook with the REFRESH MATERIALIZED VIEW command.

The following example shows the SQL statement within a SQL task that refreshes a materialized view:

REFRESH MATERIALIZED VIEW catalog.schema.daily_sales_summary;

This approach is appropriate when:

  • Complex multi-step pipelines have dependencies across systems.
  • Integration with existing job orchestration is required.
  • Job-level alerting and monitoring is needed.

SQL tasks use both the SQL warehouse attached to the job and the serverless compute that executes the refresh. If definition-based scheduling meets your requirements, switching to TRIGGER ON UPDATE or SCHEDULE can simplify the workflow.

Add a schedule or trigger to an existing materialized view

To add a schedule or trigger after creation, use the ALTER MATERIALIZED VIEW statement:

-- Alters the schedule to refresh the materialized view when its upstream data
-- gets updated.
ALTER MATERIALIZED VIEW sales
  ADD TRIGGER ON UPDATE;
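
Similarly, to add a time-based schedule instead of a trigger (a sketch; the daily interval is illustrative):

-- Alters the materialized view to refresh on a daily schedule.
ALTER MATERIALIZED VIEW sales
  ADD SCHEDULE EVERY 1 DAY;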

Modify an existing schedule or trigger

If a materialized view already has a schedule or trigger associated, use ALTER SCHEDULE or ALTER TRIGGER ON UPDATE to change the refresh configuration. This applies whether changing from one schedule to another, one trigger to another, or switching between a schedule and a trigger.

The following example changes an existing schedule to refresh every 4 hours:

ALTER MATERIALIZED VIEW catalog.schema.my_view
  ALTER SCHEDULE EVERY 4 HOURS;

Drop a schedule or trigger

To remove a schedule or trigger, use ALTER ... DROP:

ALTER MATERIALIZED VIEW catalog.schema.my_view
  DROP SCHEDULE;
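
To remove a trigger instead, the following sketch assumes the analogous DROP TRIGGER ON UPDATE form; confirm the syntax in the ALTER MATERIALIZED VIEW documentation:

-- Assumed counterpart to ADD TRIGGER ON UPDATE.
ALTER MATERIALIZED VIEW catalog.schema.my_view
  DROP TRIGGER ON UPDATE;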

Stop an active refresh

To stop an active refresh in the Azure Databricks UI, on the Pipeline details page, click Stop to stop the pipeline update. You can also stop the refresh with the Databricks CLI or the POST /api/2.0/pipelines/{pipeline_id}/stop operation in the Pipelines API.

Timeouts for refreshes

Long-running refreshes can time out. Materialized views created or refreshed after August 14, 2025 use the timeout associated with the SQL warehouse used to run the refresh. If the warehouse does not have a timeout set, the default of 2 days is used.

Note

The materialized view only synchronizes the timeout when you manually run a CREATE OR REFRESH statement. Scheduled updates retain the timeout from the most recent CREATE OR REFRESH.

You can explicitly set the timeout with a STATEMENT_TIMEOUT configuration in your SQL for the refresh. See STATEMENT_TIMEOUT.
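
For example, the following sketch sets a session-level timeout of six hours (21,600 seconds) before running a refresh; the view name is a placeholder:

-- The timeout value is in seconds and applies to statements in this session.
SET STATEMENT_TIMEOUT = 21600;
REFRESH MATERIALIZED VIEW catalog.schema.my_view;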

Permanently delete records from a materialized view with deletion vectors enabled

Important

Support for the REORG statement with materialized views is in Public Preview.

Note

  • Using a REORG statement with a materialized view requires Databricks Runtime 15.4 and above.
  • Although you can use the REORG statement with any materialized view, it's only required when deleting records from a materialized view with deletion vectors enabled. The command has no effect when used with a materialized view without deletion vectors enabled.

To physically delete records from the underlying storage for a materialized view with deletion vectors enabled, such as for GDPR compliance, additional steps must be taken to ensure that a VACUUM operation runs on the materialized view's data.

To physically delete records (a condensed SQL sketch follows these steps):

  1. Run a REORG statement against the materialized view, specifying the APPLY (PURGE) parameter. For example, REORG TABLE <materialized-view-name> APPLY (PURGE);. See REORG TABLE.
  2. Wait for the materialized view's data retention period to pass. The default data retention period is seven days, but it can be configured with the delta.deletedFileRetentionDuration table property. See Configure data retention for time travel queries.
  3. REFRESH the materialized view. See Refresh a materialized view. Within 24 hours of the REFRESH operation, pipeline maintenance tasks, including the VACUUM operation which is required to ensure records are permanently deleted, are run automatically.
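
Condensed as a SQL sketch (the materialized view name is a placeholder; the retention wait in step 2 happens outside SQL):

-- Step 1: purge records that are soft-deleted by deletion vectors.
REORG TABLE catalog.schema.my_mv APPLY (PURGE);

-- Step 2: wait for the data retention period (default: seven days) to pass.

-- Step 3: refresh so that automatic maintenance can run VACUUM on the purged files.
REFRESH MATERIALIZED VIEW catalog.schema.my_mv;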

Drop a materialized view

Note

To submit the command to drop a materialized view, you must be the owner of that materialized view or have the MANAGE privilege on the materialized view.

To drop a materialized view, use the DROP MATERIALIZED VIEW statement. To submit a DROP statement, you can use the SQL editor in the Azure Databricks UI, the Databricks SQL CLI, or the Databricks SQL API. The following example drops the mv1 materialized view:

DROP MATERIALIZED VIEW mv1;

You can also use the Catalog Explorer to drop a materialized view.

  1. Click Catalog in the sidebar.
  2. In the Catalog Explorer tree at the left, open the catalog and select the schema where your materialized view is located.
  3. Open the Tables item under the schema you selected, and click the materialized view.
  4. On the kebab menu, select Delete.

Understand the costs of a materialized view

Because a materialized view runs on serverless compute, outside of the compute that you set up for a notebook or job, you might wonder how to understand its associated costs. Materialized view usage is tracked by DBU consumption. To learn more, see What is the DBU consumption of a materialized view or streaming table?

Enable row tracking

In order to support incremental refreshes from Delta tables, row tracking must be enabled for those source tables. If you recreate a source table, you must re-enable row tracking.

The following example shows how to enable row tracking on a table:

ALTER TABLE source_table SET TBLPROPERTIES (delta.enableRowTracking = true);
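
To enable row tracking when creating a table instead, set the property in the CREATE statement. The table name and columns below are placeholders:

CREATE TABLE source_table (id INT, sales DOUBLE, date DATE)
TBLPROPERTIES (delta.enableRowTracking = true);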

For more details, see Row tracking in Databricks.

Limitations

  • For compute and workspace requirements, see Requirements.
  • For incremental refresh requirements, see Incremental refresh for materialized views.
  • Materialized views do not support identity columns or surrogate keys.
  • If a materialized view uses a sum aggregate over a NULL-able column and only NULL values remain in that column, the materialized view's resulting aggregate value is zero instead of NULL.
  • You cannot read a change data feed from a materialized view.
  • Time travel queries are not supported on materialized views.
  • The underlying files supporting materialized views might include data from upstream tables (including possible personally identifiable information) that do not appear in the materialized view definition. This data is automatically added to the underlying storage to support incremental refreshing of materialized views. Because the underlying files of a materialized view might risk exposing data from upstream tables not part of the materialized view schema, Databricks recommends not sharing the underlying storage with untrusted downstream consumers. For example, suppose the definition of a materialized view includes a COUNT(DISTINCT field_a) clause. Even though the materialized view definition only includes the aggregate COUNT DISTINCT clause, the underlying files will contain a list of the actual values of field_a.
  • You may incur some serverless compute charges, even when using these features on dedicated compute.
  • If you need to use an Azure Private Link connection with your materialized view, contact your Databricks representative.

Access materialized views from external clients

To access materialized views from external Delta Lake or Iceberg clients that don't support open APIs, you can use Compatibility Mode. Compatibility Mode creates a read-only version of your materialized view that can be accessed by any Delta Lake or Iceberg client.