Configure compute for a Delta Live Tables pipeline
This article contains instructions and considerations for configuring custom compute settings for Delta Live Tables pipelines.
Serverless pipelines do not provide compute configuration options. See Configure a serverless Delta Live Tables pipeline.
Users must have permission to deploy compute to configure and update Delta Live Tables pipelines. Workspace admins can configure cluster policies to provide users with access to compute resources for Delta Live Tables. See Define limits on Delta Live Tables pipeline compute.
Note
Cluster policies are optional. Check with your workspace administrator if you lack the compute privileges required for Delta Live Tables.
To ensure that cluster policy default values are correctly applied, set apply_policy_default_values to true in the cluster configurations in your pipeline configuration:
```json
{
  "clusters": [
    {
      "label": "default",
      "policy_id": "<policy-id>",
      "apply_policy_default_values": true
    }
  ]
}
```
You can use cluster tags to monitor usage for your pipeline clusters. Add cluster tags in the Delta Live Tables UI when you create or edit a pipeline, or by editing the JSON settings for your pipeline clusters.
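As a sketch of the JSON approach, the following cluster configuration adds tags through the custom_tags attribute of a pipeline cluster. The tag keys (cost-center, team) and the angle-bracket values are hypothetical placeholders; replace them with the tags your organization tracks. The default label is used so that both pipeline clusters carry the tags.

```json
{
  "clusters": [
    {
      "label": "default",
      "custom_tags": {
        "cost-center": "<cost-center>",
        "team": "<team-name>"
      }
    }
  ]
}
```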
By default, Delta Live Tables selects the instance types for your pipeline’s driver and worker nodes. You can optionally configure the instance types.
For example, select instance types to improve pipeline performance or address memory issues when running your pipeline. You can configure instance types when you create or edit a pipeline with the REST API, or in the Delta Live Tables UI.
To configure instance types when you create or edit a pipeline in the Delta Live Tables UI:
- Click the Settings button.
- In the Advanced section of the pipeline settings, in the Worker type and Driver type drop-down menus, select the instance types for the pipeline.
Note
Because compute resources are fully managed for serverless DLT pipelines, compute settings are unavailable when you select Serverless for a pipeline.
Each Delta Live Tables pipeline has two associated clusters:
- The updates cluster processes pipeline updates.
- The maintenance cluster runs daily maintenance tasks.
Compute settings specified using the workspace pipeline configuration UI apply to both update and maintenance clusters. You must edit the JSON configuration to modify these settings independently.
The configuration these clusters use is determined by the clusters attribute specified in your pipeline settings.
Using cluster labels, you can add compute settings that apply to only a specific cluster type. There are three labels you can use when configuring pipeline clusters:
- The default label defines compute settings for both the updates and maintenance clusters. Applying the same settings to both clusters improves the reliability of maintenance runs by ensuring that required configurations, such as data access credentials for a storage location, are applied to the maintenance cluster.
- The maintenance label defines compute settings that apply to only the maintenance cluster. You can also use the maintenance label to override settings configured by the default label.
- The updates label defines settings that apply to only the updates cluster. Use it to configure settings that should not be applied to the maintenance cluster.
Note
The cluster label setting can be omitted if you define only one cluster configuration. The default label is applied to cluster configurations if no setting for the label is provided. The cluster label setting is required only if you need to customize settings for different cluster types.
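For example, a configuration along these lines sets a worker instance type for both clusters with the default label, then uses the maintenance label to override it for the maintenance cluster only. With these settings, the updates cluster would use Standard_D12_v2 workers and the maintenance cluster Standard_D3_v2 workers. The instance types are the ones used elsewhere in this article and are illustrative; choose types available in your workspace.

```json
{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_D12_v2"
    },
    {
      "label": "maintenance",
      "node_type_id": "Standard_D3_v2"
    }
  ]
}
```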
Settings defined using the default and updates labels are merged to create the final configuration for the updates cluster. If the same setting is defined using both the default and updates labels, the setting defined with the updates label overrides the setting defined with the default label.
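A minimal illustration of this override rule: both entries below set autoscale, so under the rule described here the updates cluster uses the range from the updates entry (2 to 8 workers), while the maintenance cluster, which does not read the updates entry, keeps the range from the default entry. Each entry spells out the complete autoscale object so the result does not depend on how nested objects are merged. The worker counts are arbitrary values chosen for illustration.

```json
{
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5
      }
    },
    {
      "label": "updates",
      "autoscale": {
        "min_workers": 2,
        "max_workers": 8
      }
    }
  ]
}
```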
The following example defines a Spark configuration parameter that is added only to the configuration for the updates cluster:
```json
{
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5,
        "mode": "ENHANCED"
      }
    },
    {
      "label": "updates",
      "spark_conf": {
        "key": "value"
      }
    }
  ]
}
```
Delta Live Tables supports cluster settings similar to those of other compute on Azure Databricks. As with other pipeline settings, you can modify the JSON configuration for clusters to specify options not present in the UI. See Compute.
Note
Because the Delta Live Tables runtime manages the lifecycle of pipeline clusters and runs a custom version of Databricks Runtime, you cannot manually set some cluster settings in a pipeline configuration, such as the Spark version or cluster names. See Cluster attributes that are not user settable.
To configure instance types in the pipeline’s JSON settings, click the JSON button and enter the instance type configurations in the cluster configuration.
Note
To avoid assigning unnecessary resources to the maintenance cluster, the following example uses the updates label to set the instance types for only the updates cluster. To assign the instance types to both the updates and maintenance clusters, use the default label or omit the setting for the label. The default label is applied to pipeline cluster configurations if no setting for the label is provided. See Advanced compute configurations.
```json
{
  "clusters": [
    {
      "label": "updates",
      "node_type_id": "Standard_D12_v2",
      "driver_node_type_id": "Standard_D3_v2",
      "..." : "..."
    }
  ]
}
```
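Following the alternative described in the note, a sketch that applies the same instance types to both the updates and maintenance clusters uses the default label instead (omitting the label entirely should have the same effect). The instance types are carried over from the example above and are illustrative only.

```json
{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_D12_v2",
      "driver_node_type_id": "Standard_D3_v2"
    }
  ]
}
```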
To control cluster shutdown behavior, you can use development or production mode, or use the pipelines.clusterShutdown.delay setting in the pipeline configuration. The following example sets the pipelines.clusterShutdown.delay value to 60 seconds:
```json
{
  "configuration": {
    "pipelines.clusterShutdown.delay": "60s"
  }
}
```
When production mode is enabled, the default value for pipelines.clusterShutdown.delay is 0 seconds. When development mode is enabled, the default value is 2 hours.
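For instance, a development pipeline that should release its cluster sooner than the two-hour default might combine development mode with an explicit delay. This sketch assumes development mode is enabled through the development field of the pipeline's JSON settings and that an explicit pipelines.clusterShutdown.delay value takes precedence over the mode's default; the 600-second value is an arbitrary choice.

```json
{
  "development": true,
  "configuration": {
    "pipelines.clusterShutdown.delay": "600s"
  }
}
```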
Note
Because a Delta Live Tables cluster automatically shuts down when not in use, referencing a cluster policy that sets autotermination_minutes in your cluster configuration results in an error.
If you set num_workers to 0 in cluster settings, the cluster is created as a Single Node cluster. Configuring an autoscaling cluster and setting both min_workers and max_workers to 0 also creates a Single Node cluster.
If you configure an autoscaling cluster and set only min_workers to 0, the cluster is not created as a Single Node cluster. The cluster has at least one active worker at all times until terminated.
The following example cluster configuration creates a Single Node cluster in Delta Live Tables:
```json
{
  "clusters": [
    {
      "num_workers": 0
    }
  ]
}
```
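Per the autoscaling rule described earlier, a configuration that sets both min_workers and max_workers to 0 should also produce a Single Node cluster. A minimal sketch, with no label so that the default label applies:

```json
{
  "clusters": [
    {
      "autoscale": {
        "min_workers": 0,
        "max_workers": 0
      }
    }
  ]
}
```

By contrast, setting only min_workers to 0 with a nonzero max_workers does not produce a Single Node cluster.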