Create, run, and manage Delta Live Tables pipelines

You can create, run, manage, and monitor a Delta Live Tables pipeline using the UI or the Delta Live Tables API. You can also run your pipeline with an orchestration tool such as Azure Databricks jobs. This article focuses on performing Delta Live Tables tasks using the UI. To use the API, see the API guide, or automate the API with the Databricks Terraform provider and databricks_pipeline.
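
If you prefer to script pipeline creation rather than use the UI, the following is a minimal Python sketch against the Pipelines REST API. The workspace URL, token, notebook path, and database name are placeholder assumptions, not values from this article; see the API guide for the full request schema.

```python
import requests

# Placeholders (assumptions): replace with your workspace URL, a personal
# access token, and the path to a notebook containing your pipeline queries.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

payload = {
    "name": "example-pipeline",
    "libraries": [{"notebook": {"path": "/Users/you@example.com/dlt-notebook"}}],
    "target": "example_db",  # optional: database for published tables
    "continuous": False,     # triggered pipeline
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["pipeline_id"])
```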

To create and run your first pipeline, see the Delta Live Tables quickstart.

Create a pipeline

Create an example pipeline

To create an example pipeline using sample data included in your Azure Databricks workspace, perform the following steps:

  1. Click Jobs Icon Workflows in the sidebar and click the Delta Live Tables tab.
  2. Click Blue Down Caret next to Create Pipeline and select Create pipeline from sample data. The Create pipeline from sample data page appears.
  3. Enter a name for the pipeline in the Pipeline name field.
  4. (Optional) To select a cluster policy defining limits for the pipeline cluster configuration, select a policy from the Cluster policy dropdown menu. To learn more about using cluster policies with a Delta Live Tables pipeline, see Define limits on pipeline clusters.
  5. To select a preferred language for the pipeline, click the SQL or Python radio button.
  6. (Optional) To make your tables available for discovery and querying, enter a database name in the Target schema field. See Publish datasets.
  7. Click Create.

To run the example pipeline, see Start a pipeline update.

Create a new pipeline

To create a new pipeline, perform the following steps:

  1. Do one of the following:

    • Click Jobs Icon Workflows in the sidebar, click the Delta Live Tables tab, and click Delta Live Tables Create Icon. The Create Pipeline page appears.
    • In the sidebar, click New Icon New and select Pipeline.
  2. Enter a name for the pipeline in the Pipeline name field.

  3. Select the Delta Live Tables product edition for the pipeline from the Product edition dropdown menu.

    The product edition determines the features available to your pipeline; choose the edition that matches your pipeline's requirements. See Product editions.

  4. Select Triggered or Continuous for Pipeline mode. See Continuous and triggered pipelines.

  5. (Optional) To select a cluster policy defining limits for the pipeline cluster configuration, select a policy from the Cluster policy dropdown menu. To learn more about using cluster policies with a Delta Live Tables pipeline, see Define limits on pipeline clusters.

  6. Enter a path to a notebook containing your pipeline queries in the Notebook libraries field, or click File Picker Icon to browse to your notebook.

  7. (Optional) To add more notebooks to the pipeline, click the Add notebook library button.

    You can add notebooks in any order. Delta Live Tables automatically analyzes dataset dependencies to construct the processing graph for your pipeline; for a minimal example of such a dependency, see the sketch after these steps.

  8. (Optional) To configure a storage location for output data from the pipeline, enter a DBFS or cloud storage path in the Storage location field. The system uses a default location if you leave Storage location empty.

  9. (Optional) To make your tables available for discovery and querying, enter a database name in the Target schema field. See Publish datasets.

  10. Select the cluster mode in the Cluster mode dropdown menu.

  11. Set the cluster size in the Cluster text boxes.

    • Enter Min workers and Max workers for a Legacy autoscaling or Enhanced autoscaling cluster.
    • Enter the fixed number of workers for a Fixed size cluster.
  12. (Optional) To run this pipeline using the Photon runtime, select the Use Photon Acceleration checkbox.

  13. (Optional) To add Spark configuration settings to the cluster that will run the pipeline, click Advanced and click the Add configuration button.

  14. (Optional) To change the Delta Live Tables runtime version for this pipeline, click Advanced and select a channel in the Channel dropdown menu. See the channel field in the Delta Live Tables settings.

  15. Click Create.

Optionally, to view and edit the JSON configuration for your pipeline, click the JSON button on the Create Pipeline page.
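
As noted in step 7, notebook order doesn't matter because Delta Live Tables infers the processing graph from dataset dependencies. The following minimal Python notebook sketch, with hypothetical table names and source path, defines two tables; because clean_events reads raw_events with dlt.read, Delta Live Tables always processes raw_events first, whichever notebook defines it.

```python
import dlt
from pyspark.sql.functions import col

@dlt.table
def raw_events():
    # Hypothetical source path; replace with your data location.
    # In a pipeline notebook, spark is available implicitly.
    return spark.read.format("json").load("/data/events/")

@dlt.table
def clean_events():
    # Reading raw_events with dlt.read creates the edge in the pipeline graph.
    return dlt.read("raw_events").where(col("event_type").isNotNull())
```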

Start a pipeline update

To run the pipeline you created, start a pipeline update. You can start an update in the Delta Live Tables UI or directly from a pipeline notebook. To start an update in a notebook, click Delta Live Tables > Start in the notebook toolbar. To start an update in the Delta Live Tables UI:

  1. Click Jobs Icon Workflows in the sidebar and click the Delta Live Tables tab. The Pipelines list displays.
  2. Do one of the following:
    • To start a pipeline update immediately, click Right Arrow Icon in the Actions column. The system returns a message confirming that your pipeline is starting.
    • To view more options before starting the pipeline, click the pipeline name. The Pipeline details page displays.

The Pipeline details page provides the following options:

To start an update of your pipeline from the Pipeline details page, click the Delta Live Tables Start Icon button.

You might want to reprocess data that has already been ingested, for example, because you modified your queries based on new requirements or need to fix a bug in a column calculation. To reprocess previously ingested data, instruct the Delta Live Tables system to perform a full refresh from the UI: click Blue Down Caret next to the Start button and click Full refresh all.

After starting an update or a full refresh, the system returns a message confirming your pipeline is starting.
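
If you automate updates rather than use the UI, a rough Python equivalent of the Start button is the updates endpoint of the Pipelines REST API; the values below are placeholders.

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder
pipeline_id = "<pipeline-id>"  # shown on the Pipeline details page

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{pipeline_id}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": False},  # True is equivalent to Full refresh all
)
resp.raise_for_status()
print(resp.json()["update_id"])
```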

After successfully starting the update, the Delta Live Tables system:

  1. Starts a cluster using a cluster configuration created by the Delta Live Tables system. You can also specify a custom cluster configuration.
  2. Creates any tables that don’t exist and ensures that the schema is correct for any existing tables.
  3. Updates tables with the latest data available.
  4. Shuts down the cluster when the update is complete.

You can track the progress of the update by viewing the event log at the bottom of the Pipeline details page.

View pipeline event log

To view details for a log entry, click the entry. The Pipeline event log details pop-up appears. To view a JSON document containing the log details, click the JSON tab.

To learn how to query the event log, for example, to analyze performance or data quality metrics, see Monitor pipelines with the Delta Live Tables event log.
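
As a quick illustration rather than the full guidance in that article: assuming your pipeline's storage location is the hypothetical path /pipelines/example, the event log is stored as a Delta table under that path and can be read from a notebook.

```python
# Assumes the hypothetical storage location /pipelines/example;
# the event log is persisted as a Delta table under system/events.
events = spark.read.format("delta").load("/pipelines/example/system/events")

# For example, count events by type to spot errors quickly.
events.groupBy("event_type").count().show()
```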

When the pipeline update completes, you can also start an update to refresh only selected tables.

Start a pipeline update for selected tables

You might want to reprocess data for only selected tables in your pipeline. For example, during development, you change only a single table and want to reduce testing time, or a pipeline update fails and you want to refresh only the failed tables.

Note

You can use selective refresh only with triggered pipelines.

To start an update that refreshes selected tables only, on the Pipeline details page:

  1. Click Select tables for refresh. The Select tables for refresh dialog appears.

    If you do not see the Select tables for refresh button, make sure the Pipeline details page displays the most recent update, and the update is complete. If a DAG is not displayed for the most recent update, for example, because the update failed, the Select tables for refresh button is not displayed.

  2. To select the tables to refresh, click on each table. The selected tables are highlighted and labeled. To remove a table from the update, click on the table again.

  3. Click Refresh selection.

    Note

    The Refresh selection button displays the number of selected tables in parentheses.

To reprocess data that has already been ingested for the selected tables, click Blue Down Caret next to the Refresh selection button and click Full Refresh selection.
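
For scripted workflows, the updates endpoint accepts table lists that correspond to these buttons. This is a hedged sketch: the pipeline ID and table names are placeholders.

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder
pipeline_id = "<pipeline-id>"                                   # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{pipeline_id}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "refresh_selection": ["sales"],           # like Refresh selection
        "full_refresh_selection": ["customers"],  # like Full Refresh selection
    },
)
resp.raise_for_status()
```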

Start a pipeline update for failed tables

If a pipeline update fails because of errors in one or more tables in the pipeline graph, you can start an update of only failed tables and any downstream dependencies.

Note

Excluded tables are not refreshed, even if they depend on a failed table.

To update failed tables, on the Pipeline details page, click Refresh failed tables.

To update only selected failed tables:

  1. Click Button Down next to the Refresh failed tables button and click Select tables for refresh. The Select tables for refresh dialog appears.

  2. To select the tables to refresh, click on each table. The selected tables are highlighted and labeled. To remove a table from the update, click on the table again.

  3. Click Refresh selection.

    Note

    The Refresh selection button displays the number of selected tables in parentheses.

To reprocess data that has already been ingested for the selected tables, click Blue Down Caret next to the Refresh selection button and click Full Refresh selection.

View pipeline details

Pipeline graph

After the pipeline starts successfully, the pipeline graph displays. Use your mouse or the Delta Live Tables Graph Buttons Icon buttons in the corner of the graph panel to adjust the view.

To view tooltips for data quality metrics, hover over the data quality values for a dataset in the pipeline graph.

When running an update that refreshes only selected tables, any tables not part of the refresh are labeled Excluded in the pipeline graph.

Pipeline details

The Pipeline details panel displays information about the pipeline and the current or most recent update of the pipeline, including pipeline and update identifiers, update status, update type, and update runtime.

The Pipeline details panel also displays information about the pipeline compute cluster, including the compute cost, product edition, Databricks Runtime version, and the channel configured for the pipeline. To open the Spark UI for the cluster in a new tab, click the Spark UI button. To open the cluster logs in a new tab, click the Logs button. To open the cluster metrics in a new tab, click the Metrics button.

The Run as value displays the user that pipeline updates run as. The Run as user is the pipeline owner, and pipeline updates run with this user’s permissions. To change the run as user, click Permissions and change the pipeline owner.

Dataset details

To view details for a dataset, including the dataset schema and data quality metrics, click the dataset in the Graph view. The dataset details panel displays.

To open the pipeline notebook in a new window, click the Path value.

To close the dataset details view and return to the Pipeline details, click Delta Live Tables Close Dialog Button.

Stop a pipeline update

To stop a pipeline update, click Delta Live Tables Stop Icon.

Schedule a pipeline

You can start a triggered pipeline manually or run the pipeline on a schedule with an Azure Databricks job. You can create and schedule a job with a single pipeline task directly in the Delta Live Tables UI or add a pipeline task to a multi-task workflow in the jobs UI.

To create a single-task job and a schedule for the job in the Delta Live Tables UI:

  1. Click Schedule > Add a schedule. The Schedule button is updated to show the number of existing schedules if the pipeline is included in one or more scheduled jobs, for example, Schedule (5).
  2. Enter a name for the job in the Job name field.
  3. Set the Schedule to Scheduled.
  4. Specify the period, starting time, and time zone.
  5. Configure one or more email addresses to receive alerts on pipeline start, success, or failure.
  6. Click Create.

To create a multi-task workflow with an Azure Databricks job and add a pipeline task (a scripted sketch follows these steps):

  1. Create a job in the jobs UI and add your pipeline to the job workflow using a Pipeline task.
  2. Create a schedule for the job in the jobs UI.
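
The following Python sketch shows roughly how both steps could be scripted with the Jobs REST API instead of the jobs UI; the job name, pipeline ID, and cron expression are placeholder assumptions.

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder

payload = {
    "name": "nightly-pipeline-job",  # hypothetical job name
    "tasks": [
        {
            "task_key": "run_pipeline",
            "pipeline_task": {"pipeline_id": "<pipeline-id>"},
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # daily at 02:00
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```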

After creating the pipeline schedule, you can:

  • View a summary of the schedule in the Delta Live Tables UI, including the schedule name, whether it is paused, the last run time, and the status of the last run. To view the schedule summary, click the Schedule button.
  • Edit the job or the pipeline task.
  • Edit the schedule or pause and resume the schedule. The schedule is also paused if you selected Manual when creating the schedule.
  • Run the job manually and view details on job runs.

View pipelines

Click Jobs Icon Workflows in the sidebar and click the Delta Live Tables tab. The Pipelines page appears with a list of all defined pipelines, the status of the most recent pipeline updates, the pipeline identifier, and the pipeline creator.

You can filter pipelines in the list by:

  • Pipeline name.
  • A partial text match on one or more pipeline names.
  • Selecting only the pipelines you own.
  • Selecting all pipelines you have permissions to access.

Click the Name column header to sort pipelines by name in ascending order (A -> Z) or descending order (Z -> A).

Pipeline names render as links in the pipelines list, allowing you to right-click a pipeline name and access context menu options such as opening the pipeline details in a new tab or window.
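
If you need the same list programmatically, the Pipelines REST API can list and filter pipelines. The filter syntax below is an assumption based on the documented name LIKE pattern; the other values are placeholders.

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # Assumed filter syntax for a partial name match; adjust to your names.
    params={"filter": "name LIKE '%sales%'", "max_results": 25},
)
resp.raise_for_status()
for p in resp.json().get("statuses", []):
    print(p["pipeline_id"], p["name"], p.get("state"))
```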

Edit settings

On the Pipeline details page, click the Settings button to view and modify the pipeline settings. You can add, edit, or remove settings. For example, to make pipeline output available for querying after you’ve created a pipeline:

  1. Click the Settings button. The Pipeline settings page appears.
  2. Enter a database name in the Target field.
  3. Click Save.

To view and edit the JSON specification, click the JSON button.
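
As a scripted alternative, a hedged Python sketch of the same change follows. Note the assumption that, in the REST API, PUT replaces the entire pipeline specification, so the sketch fetches the current settings first; all values are placeholders.

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder
pipeline_id = "<pipeline-id>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Fetch the current specification; PUT replaces it entirely.
spec = requests.get(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{pipeline_id}", headers=headers
).json()["spec"]

spec["target"] = "example_db"  # hypothetical database for published tables

resp = requests.put(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{pipeline_id}",
    headers=headers,
    json=spec,
)
resp.raise_for_status()
```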

See Delta Live Tables settings for more information on configuration settings.

View update history

To view the history and status of pipeline updates, click the Update history dropdown menu.

To view the graph, details, and events for an update, select the update in the dropdown menu. To return to the latest update, click Show the latest update.

Publish datasets

When creating or editing a pipeline, you can configure the target setting to publish your table definitions to the Azure Databricks metastore and persist the records to Delta tables.

After your update completes, you can view the database and tables, query the data, or use the data in downstream applications.
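
For example, from any notebook you could query a published table with Spark SQL; the database and table names below are hypothetical.

```python
# Query a published table after the update completes; example_db and
# clean_events are hypothetical names from the earlier sketch.
df = spark.sql("SELECT * FROM example_db.clean_events LIMIT 10")
df.show()
```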

See Publish data from Delta Live Tables pipelines.

Delete a pipeline

You can delete a pipeline from the Pipelines list or the Pipeline details page:

  • In the Pipelines list, click Trash in the Actions column.
  • On the Pipeline details page for your pipeline, click the Delete button.

Deleting a pipeline removes the pipeline definition from the Delta Live Tables system and cannot be undone.
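
For completeness, a minimal Python sketch of deleting a pipeline through the Pipelines REST API, with placeholder values; remember that deletion cannot be undone.

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                               # placeholder
pipeline_id = "<pipeline-id>"

# Deleting a pipeline is irreversible.
resp = requests.delete(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{pipeline_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
```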