Delta Live Tables properties reference

10/14/2024

This article provides a reference for the Delta Live Tables JSON settings specification and table properties in Azure Databricks. For more details on using these properties and configurations, see the related Azure Databricks articles on configuring Delta Live Tables pipelines.

Delta Live Tables pipeline configurations

Fields
`id` Type: `string` A globally unique identifier for this pipeline. The identifier is assigned by the system and cannot be changed.
`name` Type: `string` A user-friendly name for this pipeline. The name can be used to identify pipeline jobs in the UI.
`storage` Type: `string` A location on DBFS or cloud storage where output data and metadata required for pipeline execution are stored. Tables and metadata are stored in subdirectories of this location. When the `storage` setting is not specified, the system will default to a location in `dbfs:/pipelines/`. The `storage` setting cannot be changed after a pipeline is created.
`configuration` Type: `object` An optional list of settings to add to the Spark configuration of the cluster that will run the pipeline. These settings are read by the Delta Live Tables runtime and available to pipeline queries through the Spark configuration. Elements must be formatted as `key:value` pairs.
`libraries` Type: `array of objects` An array of notebooks containing the pipeline code and required artifacts.
`clusters` Type: `array of objects` An array of specifications for the clusters to run the pipeline. If this is not specified, pipelines will automatically select a default cluster configuration for the pipeline.
`development` Type: `boolean` A flag indicating whether to run the pipeline in `development` or `production` mode. The default value is `true`.
`notifications` Type: `array of objects` An optional array of specifications for email notifications when a pipeline update completes, fails with a retryable error, fails with a non-retryable error, or a flow fails.
`continuous` Type: `boolean` A flag indicating whether to run the pipeline continuously. The default value is `false`.
`target` Type: `string` The name of a database for persisting pipeline output data. Configuring the `target` setting allows you to view and query the pipeline output data from the Azure Databricks UI.
`channel` Type: `string` The version of the Delta Live Tables runtime to use. The supported values are `preview`, to test your pipeline with upcoming changes to the runtime version, and `current`, to use the current runtime version. The `channel` field is optional. The default value is `current`. Databricks recommends using the current runtime version for production workloads.
`edition` Type: `string` The Delta Live Tables product edition used to run the pipeline. This setting lets you choose the product edition that best fits the requirements of your pipeline: `CORE` to run streaming ingest workloads; `PRO` to run streaming ingest and change data capture (CDC) workloads; `ADVANCED` to run streaming ingest workloads, CDC workloads, and workloads that require Delta Live Tables expectations to enforce data quality constraints. The `edition` field is optional. The default value is `ADVANCED`.
`photon` Type: `boolean` A flag indicating whether to use Photon to run the pipeline. Photon is the Azure Databricks high-performance Spark engine. Photon-enabled pipelines are billed at a different rate than non-Photon pipelines. The `photon` field is optional. The default value is `false`.
`pipelines.maxFlowRetryAttempts` Type: `int` The maximum number of times to retry a flow before failing a pipeline update when a retryable failure occurs. The default value is `2`. With this default, when a retryable failure occurs, the Delta Live Tables runtime attempts to run the flow up to three times, including the original attempt.
`pipelines.numUpdateRetryAttempts` Type: `int` The maximum number of times to retry an update before failing the update when a retryable failure occurs. Each retry runs as a full update. The default value is `5`. This parameter applies only to triggered updates run in production mode. Updates are not retried when the pipeline runs in development mode.
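As an illustration of how the fields above fit together, the following Python sketch assembles a pipeline settings object and serializes it to JSON. It only checks the documented `channel` and `edition` values and fills in the documented defaults. The helper name `build_pipeline_settings`, the pipeline name, notebook path, and target database are hypothetical. Two details are assumptions: the `libraries` entry shape (`{"notebook": {"path": ...}}`) and placing `pipelines.*` retry settings inside `configuration`.

```python
import json

# Allowed values documented for the `channel` and `edition` fields.
VALID_CHANNELS = {"current", "preview"}
VALID_EDITIONS = {"CORE", "PRO", "ADVANCED"}


def build_pipeline_settings(name, notebook_paths, *, target=None, storage=None,
                            channel="current", edition="ADVANCED",
                            development=True, continuous=False, photon=False,
                            configuration=None):
    """Return a pipeline settings dict (hypothetical helper).

    `id` is omitted because the system assigns it.
    """
    if channel not in VALID_CHANNELS:
        raise ValueError(f"unsupported channel: {channel!r}")
    if edition not in VALID_EDITIONS:
        raise ValueError(f"unsupported edition: {edition!r}")
    settings = {
        "name": name,
        # Assumed entry shape for a notebook library.
        "libraries": [{"notebook": {"path": p}} for p in notebook_paths],
        "development": development,
        "continuous": continuous,
        "channel": channel,
        "edition": edition,
        "photon": photon,
        # key:value pairs added to the Spark configuration; values stored as strings.
        "configuration": {k: str(v) for k, v in (configuration or {}).items()},
    }
    # Optional fields are emitted only when set; `storage` cannot be changed later.
    if target is not None:
        settings["target"] = target
    if storage is not None:
        settings["storage"] = storage
    return settings


settings = build_pipeline_settings(
    "sales-ingest",                         # hypothetical pipeline name
    ["/Users/someone@example.com/ingest"],  # hypothetical notebook path
    target="sales_db",                      # hypothetical database
    development=False,                      # production mode, so update retries apply
    configuration={"pipelines.numUpdateRetryAttempts": 3},
)
print(json.dumps(settings, indent=2))
```

`pipelines.maxFlowRetryAttempts` counts retries, not attempts. The default of `2` therefore allows up to three runs of a flow, including the original one.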

Delta Live Tables table properties

In addition to the table properties supported by Delta Lake, you can set the following table properties.

Table properties
`pipelines.autoOptimize.managed` Default: `true` Enables or disables automatically scheduled optimization of this table.
`pipelines.autoOptimize.zOrderCols` Default: None. An optional string containing a comma-separated list of column names to z-order this table by. For example, `pipelines.autoOptimize.zOrderCols = "year,month"`.
`pipelines.reset.allowed` Default: `true` Controls whether a full refresh is allowed for this table.
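The Python sketch below builds a dictionary of the Delta Live Tables table properties listed above, rendering values as strings. The helper name `dlt_table_properties` is hypothetical. The comment about passing the result to the `table_properties` argument of `@dlt.table` describes an assumed usage and is not exercised by the code.

```python
def dlt_table_properties(zorder_cols=None, auto_optimize=True, reset_allowed=True):
    """Build a table-properties dict for the DLT-specific properties (hypothetical helper).

    Boolean values are rendered as the strings "true" / "false".
    In a pipeline notebook, a dict like this would presumably be passed as
    @dlt.table(table_properties=...) (assumed usage, not run here).
    """
    props = {
        "pipelines.autoOptimize.managed": str(auto_optimize).lower(),
        "pipelines.reset.allowed": str(reset_allowed).lower(),
    }
    if zorder_cols:
        # Comma-separated column list, e.g. "year,month".
        props["pipelines.autoOptimize.zOrderCols"] = ",".join(zorder_cols)
    return props


# Z-order by year and month, and block full refreshes of this table.
props = dlt_table_properties(zorder_cols=["year", "month"], reset_allowed=False)
print(props)
```

Setting `pipelines.reset.allowed` to `false` is the choice to make for a table whose history should survive a full refresh of the pipeline.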

Pipelines trigger interval

You can specify a pipeline trigger interval for the entire Delta Live Tables pipeline or as part of a dataset declaration. See "Set trigger interval for continuous pipelines."

`pipelines.trigger.interval`

The default is based on flow type:

- Five seconds for streaming queries.
- One minute for complete queries when all input data is from Delta sources.
- Ten minutes for complete queries when some data sources may be non-Delta.

The value is a number followed by a time unit. The following are the valid time units:

- `second`, `seconds`
- `minute`, `minutes`
- `hour`, `hours`
- `day`, `days`

You can use the singular or plural unit when defining the value, for example:

- `{"pipelines.trigger.interval" : "1 hour"}`
- `{"pipelines.trigger.interval" : "10 seconds"}`
- `{"pipelines.trigger.interval" : "30 second"}`
- `{"pipelines.trigger.interval" : "1 minute"}`
- `{"pipelines.trigger.interval" : "10 minutes"}`
- `{"pipelines.trigger.interval" : "10 minute"}`
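To make the "number plus time unit" format concrete, here is a minimal Python sketch that converts a trigger interval value to seconds, accepting the singular and plural units listed above. The function name `parse_trigger_interval` is hypothetical. The sketch assumes whole-number amounts and lowercase units; the Delta Live Tables runtime's own parser may accept other forms.

```python
import re

# Seconds per documented unit; singular and plural forms are both accepted.
_UNIT_SECONDS = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}

# Assumed format: whole number, whitespace, lowercase unit.
_INTERVAL_RE = re.compile(r"^\s*(\d+)\s+([a-z]+)\s*$")


def parse_trigger_interval(value):
    """Convert a value such as "10 minutes" to a number of seconds (hypothetical helper)."""
    match = _INTERVAL_RE.match(value)
    if match is None:
        raise ValueError(f"expected '<number> <unit>', got {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return amount * _UNIT_SECONDS[unit]


for example in ["1 hour", "10 seconds", "30 second", "1 minute", "10 minutes", "10 minute"]:
    print(example, "->", parse_trigger_interval(example), "seconds")
```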

Cluster attributes that are not user settable

Because Delta Live Tables manages cluster lifecycles, many cluster settings are set by Delta Live Tables and cannot be manually configured by users, either in a pipeline configuration or in a cluster policy used by a pipeline. The following table lists these settings and why they cannot be manually set.

Fields
`cluster_name` Delta Live Tables sets the names of the clusters used to run pipeline updates. These names cannot be overridden.
`data_security_mode`, `access_mode` These values are automatically set by the system.
`spark_version` Delta Live Tables clusters run on a custom version of Databricks Runtime that is continually updated to include the latest features. The version of Spark is bundled with the Databricks Runtime version and cannot be overridden.
`autotermination_minutes` Because Delta Live Tables manages cluster auto-termination and reuse logic, the cluster auto-termination time cannot be overridden.
`runtime_engine` Although you can control this field by enabling Photon for your pipeline, you cannot set this value directly.
`effective_spark_version` This value is automatically set by the system.
`cluster_source` This field is set by the system and is read-only.
`docker_image` Because Delta Live Tables manages the cluster lifecycle, you cannot use a custom container with pipeline clusters.
`workload_type` This value is set by the system and cannot be overridden.
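A pipeline's `clusters` entries can be checked for the attributes above before the settings are submitted. The following Python sketch flags any keys that Delta Live Tables manages itself. The function name `find_managed_attributes` and the sample cluster entry are hypothetical. The check covers only the keys listed in the table above and does not validate any other cluster fields.

```python
# Cluster attributes that Delta Live Tables sets itself (from the table above).
NON_USER_SETTABLE = {
    "cluster_name", "data_security_mode", "access_mode", "spark_version",
    "autotermination_minutes", "runtime_engine", "effective_spark_version",
    "cluster_source", "docker_image", "workload_type",
}


def find_managed_attributes(cluster_spec):
    """Return, sorted, the keys of cluster_spec that users cannot set (hypothetical helper)."""
    return sorted(NON_USER_SETTABLE.intersection(cluster_spec))


cluster = {
    "label": "default",           # hypothetical cluster entry
    "num_workers": 2,
    "spark_version": "custom",    # hypothetical value; not user settable
    "autotermination_minutes": 30,
}
print(find_managed_attributes(cluster))
```

The same key list applies to cluster policies used by a pipeline, although this sketch only inspects a plain dictionary.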
