Use streaming tables in Databricks SQL

Databricks recommends using streaming tables to ingest data using Databricks SQL. A streaming table is a table registered to Unity Catalog with extra support for streaming or incremental data processing. A pipeline is automatically created for each streaming table. You can use streaming tables for incremental data loading from Kafka and cloud object storage.

Note

To learn how to use Delta Lake tables as streaming sources and sinks, see Delta table streaming reads and writes.

Requirements

To use streaming tables, you must meet the following requirements.

Workspace requirements:

Streaming tables created in Databricks SQL are backed by serverless pipelines. Your workspace must support serverless pipelines to use this functionality.

An Azure Databricks account with serverless enabled. For more information, see Set up serverless SQL warehouses.
A workspace with Unity Catalog enabled. For more information, see Get started with Unity Catalog.

Compute requirements:

You must use one of the following:

A SQL warehouse that uses the Current channel.
Compute with standard access mode (formerly shared access mode) on Databricks Runtime 13.3 LTS or above.

Compute with dedicated access mode (formerly single user access mode) on Databricks Runtime 15.4 LTS or above.

On Databricks Runtime 15.3 and below, you cannot use dedicated compute to query streaming tables that are owned by other users. You can use dedicated compute on Databricks Runtime 15.3 and below only if you own the streaming table. The creator of the table is the owner.

Databricks Runtime 15.4 LTS and above support querying pipeline-generated tables on dedicated compute, even if you aren't the table owner. You might be charged for serverless compute resources when you use dedicated compute to run data filtering operations. See Fine-grained access control on dedicated compute.

Permissions requirements:

USE CATALOG and USE SCHEMA privileges on the catalog and schema in which you create the streaming table.
The CREATE TABLE privilege on the schema in which you create the streaming table.
Privileges for accessing the tables or locations providing the source data for your streaming table.
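
As a rough sketch, a user who is allowed to grant privileges on these objects could set up the permissions above as follows. The catalog `main`, the schema `main.sales`, the source table `main.sales.raw_data`, and the group `data_engineers` are hypothetical names used only for illustration.

```sql
-- Hypothetical names: catalog `main`, schema `main.sales`, group `data_engineers`.
-- Allow the group to reach the catalog and schema that will hold the streaming table.
GRANT USE CATALOG ON CATALOG main TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `data_engineers`;

-- Allow the group to create tables (including streaming tables) in the schema.
GRANT CREATE TABLE ON SCHEMA main.sales TO `data_engineers`;

-- Allow the group to read the (assumed) source table that feeds the streaming table.
GRANT SELECT ON TABLE main.sales.raw_data TO `data_engineers`;
```

If the source is a volume or an external location rather than a table, the last grant would target that object instead.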

Create streaming tables

A streaming table is defined by a SQL query in Databricks SQL. When you create a streaming table, the data currently in the source tables is used to build the streaming table. After that, you refresh the table, usually on a schedule, to append any data that has been added to the source tables since the last refresh.

When you create a streaming table, you are considered the owner of the table.

To create a streaming table from an existing table, use the CREATE STREAMING TABLE statement, as in the following example:

CREATE OR REFRESH STREAMING TABLE sales
  SCHEDULE EVERY 1 hour
  AS SELECT product, price FROM STREAM raw_data;

In this case, the streaming table sales is created from specific columns of the raw_data table, with a schedule to refresh every hour. The query used must be a streaming query. Use the STREAM keyword to use streaming semantics to read from the source.

Compute used for refresh

When you create a streaming table using the CREATE OR REFRESH STREAMING TABLE statement, the initial data refresh and population begin immediately. These operations do not consume Databricks SQL warehouse compute. Instead, streaming tables rely on serverless pipelines for both creation and refresh. A dedicated serverless pipeline is automatically created and managed by the system for each streaming table.

Load files with Auto Loader

To create a streaming table from files in a volume, you use Auto Loader. Use Auto Loader for most data ingestion tasks from cloud object storage. Auto Loader and pipelines are designed to incrementally and idempotently load ever-growing data as it arrives in cloud storage.

To use Auto Loader in Databricks SQL, use the read_files function. The following example shows using Auto Loader to read a volume of JSON files into a streaming table:

CREATE OR REFRESH STREAMING TABLE sales
  SCHEDULE EVERY 1 hour
  AS SELECT * FROM STREAM read_files(
    "/Volumes/my_catalog/my_schema/my_volume/path/to/data",
    format => "json"
  );

To read data from cloud storage, you can also use Auto Loader:

CREATE OR REFRESH STREAMING TABLE sales
  SCHEDULE EVERY 1 hour
  AS SELECT *
  FROM STREAM read_files(
    'abfss://myContainer@myStorageAccount.dfs.core.windows.net/analysis/*/*/*.json',
    format => "json"
  );

To learn about Auto Loader, see What is Auto Loader?. To learn more about using Auto Loader in SQL, with examples, see Load data from object storage.

Streaming ingestion from other sources

For examples of ingestion from other sources, including Kafka, see Load data in pipelines.
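
As a minimal sketch of Kafka ingestion, the following statement reads a topic with the read_kafka function. The broker address, topic name, and target table name are hypothetical, and authentication options are omitted, so treat this as a starting point rather than a complete configuration.

```sql
-- Hypothetical broker address and topic name; replace them with your own.
CREATE OR REFRESH STREAMING TABLE kafka_events
  SCHEDULE EVERY 1 hour
  AS SELECT
    -- Kafka keys and values arrive as binary, so cast them before use.
    CAST(key AS STRING) AS event_key,
    CAST(value AS STRING) AS event_payload,
    timestamp
  FROM STREAM read_kafka(
    bootstrapServers => 'kafka-broker.example.com:9092',
    subscribe => 'events_topic'
  );
```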

Apply change data capture (CDC) with Auto CDC flows

Important

This feature is in Beta. Available in Databricks SQL using the Current channel.

Use the FLOW AUTO CDC clause to process change data capture (CDC) records from a source into a streaming table. Previously, the MERGE INTO statement was commonly used for processing CDC records on Azure Databricks. However, MERGE INTO can produce incorrect results when records arrive out of sequence, unless you add complex logic to re-order them. See Change data capture and snapshots.

AUTO CDC simplifies CDC by automatically handling out-of-order records. You specify keys to identify records, a sequence column for ordering, and whether to store results as SCD type 1 (direct updates) or SCD type 2 (history tracking).

The following example creates a streaming table that applies CDC changes using SCD type 1:

CREATE OR REFRESH STREAMING TABLE target
  FLOW AUTO CDC
  FROM stream(cdc_data.users)
  KEYS (userId)
  SEQUENCE BY sequenceNum
  STORED AS SCD TYPE 1;

The following example uses SCD type 2 to retain a history of changes:

CREATE OR REFRESH STREAMING TABLE target
  FLOW AUTO CDC
  FROM stream(cdc_data.users)
  KEYS (userId)
  APPLY AS DELETE WHEN operation = "DELETE"
  SEQUENCE BY sequenceNum
  COLUMNS * EXCEPT (operation, sequenceNum)
  STORED AS SCD TYPE 2;

For full details on Auto CDC options and behavior, see The AUTO CDC APIs: Simplify change data capture with pipelines. For the complete syntax reference, see CREATE STREAMING TABLE.

Ingest new data only

By default, the read_files function reads all existing data in the source folder during table creation, and then processes newly arriving records with each refresh.

To avoid ingesting data that already exists in the source folder at the time of table creation, set the includeExistingFiles option to false. This means that only data that arrives in the folder after table creation is processed. For example:

CREATE OR REFRESH STREAMING TABLE sales
  SCHEDULE EVERY 1 hour
  AS SELECT *
  FROM STREAM read_files(
    '/path/to/files',
    includeExistingFiles => false
  );

Set the runtime channel

Streaming tables created using SQL warehouses are automatically refreshed using a pipeline. Pipelines use the runtime in the current channel by default. See Lakeflow Spark Declarative Pipelines release notes and the release upgrade process to learn about the release process.

Databricks recommends using the current channel for production workloads. New features are first released to the preview channel. You can set a pipeline to the preview channel to test new features by specifying preview as a table property using a CREATE OR REFRESH STREAMING TABLE statement. To update the channel for an existing streaming table, you must run CREATE OR REFRESH STREAMING TABLE with the updated TBLPROPERTIES.

The following code example shows how to set the channel to preview:

CREATE OR REFRESH STREAMING TABLE sales
  TBLPROPERTIES ('pipelines.channel' = 'preview')
  SCHEDULE EVERY 1 hour
  AS SELECT *
  FROM STREAM raw_data;

Hide sensitive data

You can use streaming tables to hide sensitive data from users accessing the table. One approach is to define the query so that it excludes sensitive columns or rows entirely. Alternatively, you can apply column masks or row filters based on the permissions of the querying user. For example, you might hide the tax_id column for users who are not in the group HumanResourcesDept. To do this, use the ROW FILTER and MASK syntax during the creation of the streaming table. For more information, see Row filters and column masks.
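
The tax_id scenario above might look like the following sketch. The function names `tax_id_mask` and `us_only_filter`, the source table `raw_employees`, and the column names are hypothetical. The group name HumanResourcesDept comes from the example in the text.

```sql
-- Hypothetical mask function: only members of HumanResourcesDept see the real value.
CREATE OR REPLACE FUNCTION tax_id_mask(tax_id STRING)
  RETURN CASE
    WHEN is_account_group_member('HumanResourcesDept') THEN tax_id
    ELSE '***-**-****'
  END;

-- Hypothetical row filter: everyone else only sees rows for the US region.
CREATE OR REPLACE FUNCTION us_only_filter(region STRING)
  RETURN is_account_group_member('HumanResourcesDept') OR region = 'US';

-- Attach the mask to the column and the filter to the table at creation time.
CREATE OR REFRESH STREAMING TABLE employees (
    employee_id INT,
    name STRING,
    region STRING,
    tax_id STRING MASK tax_id_mask
  )
  WITH ROW FILTER us_only_filter ON (region)
  AS SELECT employee_id, name, region, tax_id
  FROM STREAM raw_employees;
```

The mask and filter are evaluated for the querying user, so the same table returns different results depending on group membership.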

Refresh a streaming table

Streaming tables automatically create and use serverless pipelines to process refresh operations. The refresh is managed by the pipeline and the update is monitored by the Databricks SQL warehouse used to create the streaming table. Streaming tables can be updated using a pipeline that runs on a schedule.

Even if you have a scheduled refresh, you can call a manual refresh at any time. Refreshes are handled by the same pipeline that was automatically created along with the streaming table.

To refresh a streaming table:

REFRESH STREAMING TABLE sales;

You can check the status of the latest refresh with DESCRIBE TABLE EXTENDED.

Note

You might need to refresh your streaming table before using time travel queries.

To learn how to schedule a refresh, see Schedule refreshes in Databricks SQL. Scheduled refreshes can have update notifications, and you can set the performance mode for the refresh.

How refresh works

A streaming table refresh only evaluates new rows that have arrived after the last update, and appends only the new data.

Each refresh uses the current definition of the streaming table to process this new data. Modifying a streaming table definition does not automatically recalculate existing data. If a modification is incompatible with existing data (for example, changing a data type), the next refresh fails with an error.

The following examples explain how changes to a streaming table definition affect refresh behavior:

Removing a filter does not reprocess previously filtered rows.
Changing column projections won't affect how existing data was processed.
Joins with static snapshots use the snapshot state at the time of the initial processing. Late-arriving data that would have matched with the updated snapshot is ignored. This can lead to facts being dropped if dimensions are late.
Modifying the CAST of an existing column results in an error.

If your data changes in a way that cannot be supported in the existing streaming table, you can perform a full refresh.
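
To make the first case concrete, here is a minimal sketch that reuses the sales and raw_data tables from the earlier examples. The price filter is a hypothetical condition added for illustration.

```sql
-- Initial definition: rows with a non-positive price are filtered out.
CREATE OR REFRESH STREAMING TABLE sales
  AS SELECT product, price
  FROM STREAM raw_data
  WHERE price > 0;

-- Later, the definition is changed to remove the filter.
-- Only rows that arrive after this point are processed with the new definition.
-- Rows with price <= 0 that were already read and skipped are not added back.
CREATE OR REFRESH STREAMING TABLE sales
  AS SELECT product, price
  FROM STREAM raw_data;

-- To apply the new definition to all data still available in the source,
-- run a full refresh.
REFRESH STREAMING TABLE sales FULL;
```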

Fully refresh a streaming table

Full refreshes re-process all data available in the source with the latest definition. It is not recommended to call full refreshes on sources that don't keep the entire history of the data or have short retention periods, such as Kafka, because the full refresh truncates the existing data. You might not be able to recover old data if the data is no longer available in the source.

For example:

REFRESH STREAMING TABLE sales FULL;

Change the schedule for a streaming table

You can configure a Databricks SQL streaming table to refresh automatically based on a defined schedule, or to trigger when upstream data is changed. The following table shows the different options for scheduling refreshes.

Method	Description	Example use case
Manual	On-demand refresh using a SQL `REFRESH` statement, or through the workspace UI.	Development, testing, ad-hoc updates.
`TRIGGER ON UPDATE`	Define the streaming table to automatically refresh when the upstream data changes.	Production workloads with data freshness SLAs or unpredictable refresh periods.
`SCHEDULE`	Define the streaming table to refresh at defined time intervals.	Predictable, time-based refresh requirements.
SQL task in a job	Refresh is orchestrated through Lakeflow Jobs.	Complex pipelines with cross-system dependencies.

Manual refresh

To manually refresh a streaming table, you can call a refresh from Databricks SQL, or use the workspace UI.

REFRESH statement

To refresh a streaming table using Databricks SQL:

In the SQL Editor, run the following statement:
```
REFRESH STREAMING TABLE <table-name>;
```

For more information, see REFRESH (MATERIALIZED VIEW or STREAMING TABLE).

Workspace UI

To refresh a pipeline in the workspace UI:

In the Azure Databricks workspace, click Jobs & Pipelines.
Select the pipeline you wish to refresh from the list.
Click the Start button.

As the pipeline refreshes, you will see updates in the UI.

Trigger on update

The TRIGGER ON UPDATE clause automatically refreshes a streaming table when upstream source data changes. This eliminates the need to coordinate schedules across pipelines. The streaming table stays fresh without requiring the user to know when upstream jobs finish or maintain complex scheduling logic.

This is the recommended approach for production workloads, especially when upstream dependencies don't run on predictable schedules. Once configured, the streaming table monitors its source tables and refreshes automatically when changes in any of the upstream sources are detected.

Limitations

Upstream dependency limits: A streaming table can monitor a maximum of 10 upstream tables and 30 upstream views. For more dependencies, split the logic across multiple streaming tables.
Workspace limits: A maximum of 1,000 streaming tables with TRIGGER ON UPDATE can exist per workspace. Please contact Databricks support if more than 1,000 streaming tables are needed.
Minimum interval: The minimum trigger interval is 1 minute.

The following examples show how to set a trigger on update when defining a streaming table.

Create a streaming table with trigger on update

To create a streaming table that refreshes automatically when source data changes, include the TRIGGER ON UPDATE clause in the CREATE STREAMING TABLE statement.

The following example creates a streaming table that reads customer orders and refreshes whenever the source orders table is updated:

CREATE OR REFRESH STREAMING TABLE catalog.schema.customer_orders
  TRIGGER ON UPDATE
AS SELECT
    o.customer_id,
    o.name,
    o.order_id
FROM STREAM catalog.schema.orders o;

Throttle refresh frequency

If upstream data refreshes frequently, use AT MOST EVERY to cap how often the streaming table refreshes and limit compute costs. This is useful when source tables update frequently but downstream consumers don't need real-time data. The INTERVAL keyword is required before the time value.

The following example limits the streaming table to refresh at most every 5 minutes, even if source data changes more frequently:

CREATE OR REFRESH STREAMING TABLE catalog.schema.customer_orders
  TRIGGER ON UPDATE AT MOST EVERY INTERVAL 5 MINUTES
AS SELECT
    o.customer_id,
    o.name,
    o.order_id
FROM STREAM catalog.schema.orders o;

Scheduled refresh

Refresh schedules can be defined directly in the streaming table definition to refresh the streaming table at fixed time intervals. This approach is useful when the data update cadence is known and predictable refresh timing is desired.

When there is a refresh schedule, you can still run a manual refresh at any time if you need updated data.

Databricks supports two scheduling syntaxes: SCHEDULE EVERY for simple intervals and SCHEDULE CRON for precise scheduling. The SCHEDULE and SCHEDULE REFRESH keywords are semantically equivalent.

For details about the syntax and use of the SCHEDULE clause, see CREATE STREAMING TABLE SCHEDULE clause.

When a schedule is created, a new Databricks job is automatically configured to process the update.

To view the schedule, do one of the following:

Run the DESCRIBE EXTENDED statement from the SQL editor in the Azure Databricks UI. See DESCRIBE TABLE.
Use Catalog Explorer to view the streaming table. The schedule is listed on the Overview tab, under Refresh status. See What is Catalog Explorer?.

The following examples show how to create a streaming table with a schedule:

Schedule every time interval

This example schedules a refresh once every 5 minutes:

CREATE OR REFRESH STREAMING TABLE catalog.schema.hourly_metrics
  SCHEDULE EVERY 5 MINUTES
AS SELECT
    event_id,
    event_time,
    event_type
FROM STREAM catalog.schema.raw_events;

Schedule using cron

This example schedules a refresh every 15 minutes, on the quarter hour, in the UTC time zone:

CREATE OR REFRESH STREAMING TABLE catalog.schema.hourly_metrics
  SCHEDULE CRON '0 */15 * * * ?' AT TIME ZONE 'UTC'
AS SELECT
    event_id,
    event_time,
    event_type
FROM STREAM catalog.schema.raw_events;

SQL task in a job

Streaming table refreshes can be orchestrated through Databricks Jobs by creating SQL tasks that include REFRESH STREAMING TABLE commands. This approach integrates streaming table refreshes into existing job orchestration workflows.

There are two ways to create a job for refreshing streaming tables:

From the SQL Editor: Write the REFRESH STREAMING TABLE command and click the Schedule button to create a job directly from the query.
From the Jobs UI: Create a new job, add a SQL task type, and attach a SQL Query or Notebook with the REFRESH STREAMING TABLE command.

The following example shows the SQL statement within a SQL task that refreshes a streaming table:

REFRESH STREAMING TABLE catalog.schema.sales;

This approach is appropriate when:

Complex multi-step pipelines have dependencies across systems.
Integration with existing job orchestration is required.
Job-level alerting and monitoring is needed.

SQL tasks use both the SQL warehouse attached to the job and the serverless compute that executes the refresh. If using streaming table definition-based scheduling meets the requirements, switching to TRIGGER ON UPDATE or SCHEDULE can simplify the workflow.

Add a schedule to an existing streaming table

To set the schedule after creation, use the ALTER STREAMING TABLE statement:

-- Alters the schedule to refresh the streaming table when its upstream
-- data gets updated.
ALTER STREAMING TABLE sales
  ADD TRIGGER ON UPDATE;

Modify an existing schedule or trigger

If a streaming table already has a schedule or trigger associated, use ALTER SCHEDULE or ALTER TRIGGER ON UPDATE to change the refresh configuration. This applies whether changing from one schedule to another, one trigger to another, or switching between a schedule and a trigger.

The following example changes an existing schedule to refresh every 5 minutes:

ALTER STREAMING TABLE catalog.schema.my_table
  ALTER SCHEDULE EVERY 5 MINUTES;
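
The trigger form can be sketched the same way. This example assumes that ALTER TRIGGER ON UPDATE accepts the same AT MOST EVERY clause shown earlier for CREATE OR REFRESH STREAMING TABLE. The 10-minute interval is an arbitrary value chosen for illustration.

```sql
-- Assumed syntax: change the refresh configuration to a throttled trigger.
ALTER STREAMING TABLE catalog.schema.my_table
  ALTER TRIGGER ON UPDATE AT MOST EVERY INTERVAL 10 MINUTES;
```

Check the ALTER STREAMING TABLE reference for the exact grammar supported in your workspace.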

Drop a schedule or trigger

To remove a schedule, use ALTER ... DROP:

ALTER STREAMING TABLE catalog.schema.my_table
  DROP SCHEDULE;

Track the status of a refresh

You can view the status of a streaming table refresh by viewing the pipeline that manages the streaming table in the Pipelines UI or by viewing the Refresh Information returned by the DESCRIBE EXTENDED command for the streaming table.

DESCRIBE TABLE EXTENDED <table-name>;

Alternatively, you can view the streaming table in Catalog Explorer and see the refresh status there:

Click Catalog in the sidebar.
In the Catalog Explorer tree at the left, open the catalog and select the schema where your streaming table is located.
Open the Tables item under the schema you selected, and click the streaming table.

From here, you can use the tabs under the streaming table name to view and edit information about the streaming table, including:

Refresh status and history
The table schema
Sample data (requires an active compute)
Permissions
Lineage, including tables and pipelines that this streaming table depends on
Insights into usage
Monitors that you have created for this streaming table

Timeouts for refreshes

Streaming table refreshes are run with a timeout that limits how long they can run. For streaming tables created or updated on or after August 14, 2025, the timeout is captured when you update by running CREATE OR REFRESH:

If a STATEMENT_TIMEOUT is set, that value is used. See STATEMENT_TIMEOUT.
Otherwise, the timeout from the SQL warehouse used to run the command is used.
If the warehouse does not have a timeout configured, a default of 2 days applies.

The timeout applies to the initial creation and to the scheduled refreshes that follow.

For streaming tables that were last updated prior to August 14, 2025, the timeout is set to 2 days.

Example: Set a timeout for a streaming table refresh

You can explicitly control how long a streaming table refresh is allowed to run by setting a statement-level timeout when creating or updating the table:

SET STATEMENT_TIMEOUT = '6h';

CREATE OR REFRESH STREAMING TABLE my_catalog.my_schema.my_st
  SCHEDULE EVERY 12 HOURS
AS SELECT * FROM STREAM large_source_table;

This sets up the streaming table to be refreshed every 12 hours, and if a refresh takes more than 6 hours, it times out and waits for the next scheduled refresh.

How scheduled refreshes handle timeouts

Timeouts are synchronized only when you explicitly run CREATE OR REFRESH.

Scheduled refreshes continue using the timeout captured during the most recent CREATE OR REFRESH.
Changing the warehouse timeout alone does not affect existing scheduled refreshes.

Important

After changing a warehouse timeout, run CREATE OR REFRESH again to apply the new timeout to future scheduled refreshes.

Control access to streaming tables

Streaming tables support rich access controls to support data-sharing while avoiding exposing potentially private data. A streaming table owner or a user with the MANAGE privilege can grant SELECT privileges to other users. Users with SELECT access to the streaming table do not require SELECT access to the tables referenced by the streaming table. This access control enables data sharing while controlling access to the underlying data.

You can also modify the owner of a streaming table.

Grant privileges to a streaming table

To grant access to a streaming table, use the GRANT statement:

GRANT <privilege_type> ON <st_name> TO <principal>;

The privilege_type can be:

SELECT - the user can SELECT the streaming table.
REFRESH - the user can REFRESH the streaming table. Refreshes are run using the owner's permissions.

The following example creates a streaming table and grants select and refresh privileges to users:

CREATE OR REFRESH STREAMING TABLE st_name AS SELECT * FROM STREAM source_table;

-- Grant read-only access:
GRANT SELECT ON st_name TO read_only_user;

-- Grant read and refresh access:
GRANT SELECT ON st_name TO refresh_user;
GRANT REFRESH ON st_name TO refresh_user;

For more information about granting privileges on Unity Catalog securable objects, see Unity Catalog privileges reference.

Revoke privileges from a streaming table

To revoke access from a streaming table, use the REVOKE statement:

REVOKE <privilege_type> ON <st_name> FROM <principal>;

When SELECT privileges on a source table are revoked from the streaming table owner or any other user who has been granted MANAGE or SELECT privileges on the streaming table, or the source table is dropped, the streaming table owner or user granted access is still able to query the streaming table. However, the following behavior occurs:

The streaming table owner or others who have lost access to a source table can no longer REFRESH that streaming table, and the streaming table becomes stale over time.
If automated with a schedule, the next scheduled REFRESH fails or is not run.

The following example revokes the SELECT privilege from read_only_user:

REVOKE SELECT ON st_name FROM read_only_user;

Change the owner of a streaming table

A user with MANAGE permissions on a streaming table defined in Databricks SQL can set a new owner through the Catalog Explorer. The new owner can be themselves or a service principal on which they have the Service Principal User role.

From your Azure Databricks workspace, click Catalog to open the Catalog Explorer.
Select the streaming table that you want to update.
In the right sidebar, under About this streaming table, find the Owner, and click edit.

Note

If you get a message that tells you to update the owner by changing the Run as user in pipeline settings, then the streaming table is defined in Lakeflow Spark Declarative Pipelines, not Databricks SQL. The message includes a link to the pipeline settings, where you can change the Run as user.
Select a new owner for the streaming table.

Owners automatically have MANAGE and SELECT privileges on streaming tables that they own. If you are setting a service principal as the owner for a streaming table that you own, and you do not explicitly have SELECT or MANAGE privileges on the streaming table, then this change would cause you to lose all access to the streaming table. In this case, you are prompted to explicitly provide those privileges.

Select both Grant MANAGE and Grant SELECT privileges to provide those on Save.
Click Save to change the owner.

The owner of the streaming table is updated. All future updates are run using the new owner's identity.

When the owner loses privileges to source tables

If you change the owner, and the new owner does not have access to the source tables (or SELECT privileges are revoked on the underlying source tables), users can still query the streaming table. However:

They cannot REFRESH the streaming table.
The next scheduled refresh of the streaming table fails.

Losing access to the source data prevents updates, but doesn't immediately prevent the existing streaming table from being read.

Permanently delete records from a streaming table

Important

Support for the REORG statement with streaming tables is in Public Preview.

Note

Using a REORG statement with a streaming table requires Databricks Runtime 15.4 and above.
Although you can use the REORG statement with any streaming table, it's only required when deleting records from a streaming table with deletion vectors enabled. The command has no effect when used with a streaming table without deletion vectors enabled.

To physically delete records from the underlying storage for a streaming table with deletion vectors enabled, such as for GDPR compliance, additional steps must be taken to ensure that a VACUUM operation runs on the streaming table's data.

To physically delete records from underlying storage:

Update records or delete records from the streaming table.
Run a REORG statement against the streaming table, specifying the APPLY (PURGE) parameter. For example: REORG TABLE <streaming-table-name> APPLY (PURGE);
Wait for the streaming table's data retention period to pass. The default data retention period is seven days, but it can be configured with the delta.deletedFileRetentionDuration table property. See Configure data retention for time travel queries.
REFRESH the streaming table. See Refresh a streaming table. Within 24 hours of the REFRESH operation, pipeline maintenance tasks, including the VACUUM operation which is required to ensure records are permanently deleted, are run automatically.
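
The steps above can be sketched as the following sequence. The table name `catalog.schema.my_table`, the `user_id` column, and the predicate value are hypothetical. The waiting period in step 3 happens outside of SQL.

```sql
-- Step 1: delete the records (hypothetical column and value).
DELETE FROM catalog.schema.my_table
  WHERE user_id = 'user-123';

-- Step 2: rewrite the affected files so that soft-deleted rows are purged.
REORG TABLE catalog.schema.my_table APPLY (PURGE);

-- Step 3: wait for the data retention period to pass
-- (seven days by default, per delta.deletedFileRetentionDuration).

-- Step 4: refresh the table. Pipeline maintenance, including VACUUM,
-- then runs automatically within 24 hours.
REFRESH STREAMING TABLE catalog.schema.my_table;
```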

Monitor runs using query history

You can use the query history page to access query details and query profiles that can help you identify poorly performing queries and bottlenecks in the pipeline used to run your streaming table updates. For an overview of the kind of information available in query histories and query profiles, see Query history and Query profile.

Important

This feature is in Public Preview. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

All statements related to streaming tables appear in the query history. You can use the Statement drop-down filter to select any command and inspect the related queries. All CREATE statements are followed by a REFRESH statement that executes asynchronously on a pipeline. The REFRESH statements typically include detailed query plans that provide insights into optimizing performance.

To access REFRESH statements in the query history UI, use the following steps:

Click Query History in the left sidebar to open the Query History UI.
Select the REFRESH checkbox from the Statement drop-down filter.
Click the name of the query statement to view summary details like the duration of the query and aggregated metrics.
Click See query profile to open the query profile. See Query profile for details about navigating the query profile.
Optionally, you can use the links in the Query Source section to open the related query or pipeline.

You can also access query details using links in the SQL editor or from a notebook attached to a SQL warehouse.

Access streaming tables from external clients

To access streaming tables from external Delta Lake or Iceberg clients that don't support open APIs, you can use Compatibility Mode. Compatibility Mode creates a read-only version of your streaming table that can be accessed by any Delta Lake or Iceberg client.

Last updated on 2026-04-22