Databricks Asset Bundle configuration
This article describes the syntax for Databricks Asset Bundle configuration files, which define Databricks Asset Bundles. See What are Databricks Asset Bundles?
A bundle configuration file must be expressed in YAML format and must contain at minimum the top-level bundle mapping. Each bundle must contain exactly one bundle configuration file named databricks.yml. If there are multiple bundle configuration files, they must be referenced by the databricks.yml file using the include mapping.
For more information about YAML, see the official YAML specification and tutorial.
To create and work with bundle configuration files, see Databricks Asset Bundles development.
Overview
This section provides a visual representation of the bundle configuration file schema. For details, see Mappings.
# This is the default bundle configuration if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
  name: string # Required.
  databricks_cli_version: string
  cluster_id: string
  git:
    origin_url: string
    branch: string

# These are for any custom variables for use throughout the bundle.
variables:
  <some-unique-variable-name>:
    description: string
    default: string or complex

# These are the default workspace settings if not otherwise overridden in
# the following "targets" top-level mapping.
workspace:
  artifact_path: string
  auth_type: string
  azure_client_id: string # For Azure Databricks only.
  azure_environment: string # For Azure Databricks only.
  azure_login_app_id: string # For Azure Databricks only. Non-operational and reserved for future use.
  azure_tenant_id: string # For Azure Databricks only.
  azure_use_msi: true | false # For Azure Databricks only.
  azure_workspace_resource_id: string # For Azure Databricks only.
  client_id: string # For Databricks on AWS only.
  file_path: string
  google_service_account: string # For Databricks on Google Cloud only.
  host: string
  profile: string
  root_path: string
  state_path: string

# These are the permissions to apply to experiments, jobs, models, and pipelines defined
# in the "resources" mapping.
permissions:
  - level: <permission-level>
    group_name: <unique-group-name>
  - level: <permission-level>
    user_name: <unique-user-name>
  - level: <permission-level>
    service_principal_name: <unique-principal-name>

# These are the default artifact settings if not otherwise overridden in
# the following "targets" top-level mapping.
artifacts:
  <some-unique-artifact-identifier>:
    build: string
    files:
      - source: string
    path: string
    type: string

# These are any additional configuration files to include.
include:
  - "<some-file-or-path-glob-to-include>"
  - "<another-file-or-path-glob-to-include>"

# This is the identity to use to run the bundle.
run_as:
  # Specify either user_name or service_principal_name, not both.
  user_name: <user-name>
  service_principal_name: <service-principal-name>

# These are the default job and pipeline settings if not otherwise overridden in
# the following "targets" top-level mapping.
resources:
  dashboards:
    <some-unique-programmatic-identifier-for-this-dashboard>:
      # See the REST API create request payload reference for dashboards.
  experiments:
    <some-unique-programmatic-identifier-for-this-experiment>:
      # See the REST API create request payload reference for experiments.
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # See the REST API create request payload reference for jobs.
  models:
    <some-unique-programmatic-identifier-for-this-model>:
      # See the REST API create request payload reference for models.
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # See the REST API create request payload reference for Delta Live Tables (pipelines).
  schemas:
    <some-unique-programmatic-identifier-for-this-schema>:
      # See the Unity Catalog schema request payload reference.

# These are any additional files or paths to include or exclude.
sync:
  include:
    - "<some-file-or-path-glob-to-include>"
    - "<another-file-or-path-glob-to-include>"
  exclude:
    - "<some-file-or-path-glob-to-exclude>"
    - "<another-file-or-path-glob-to-exclude>"
  paths:
    - "<some-file-or-path-to-synchronize>"

# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      # See the preceding "artifacts" syntax.
    bundle:
      # See the preceding "bundle" syntax.
    cluster_id: string
    default: true | false
    mode: development
    presets:
      <preset>: <value>
    resources:
      # See the preceding "resources" syntax.
    sync:
      # See the preceding "sync" syntax.
    variables:
      <preceding-unique-variable-name>: <non-default-value>
    workspace:
      # See the preceding "workspace" syntax.
    run_as:
      # See the preceding "run_as" syntax.
Examples
Note
For configuration examples that demonstrate bundle features and common bundle use cases, see Bundle configuration examples and the bundle examples repository in GitHub.
The following example bundle configuration specifies a local notebook file named hello.py that is in the same directory as the local bundle configuration file named databricks.yml. It runs this notebook as a job using the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller's local configuration profile named DEFAULT.

Databricks recommends that you use the host mapping instead of the default mapping wherever possible, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile's fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping to instruct the Databricks CLI about which specific profile to use. For an example, see the prod target declaration later in this section.

This technique enables you to reuse as well as override the job definitions and settings within the resources block:
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
While the following bundle configuration file is functionally equivalent, it is not modularized, which does not enable good reuse. Also, this declaration appends a task to the job rather than overriding the existing job:
bundle:
  name: hello-bundle

targets:
  dev:
    default: true
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 1234-567890-abcde123
              notebook_task:
                notebook_path: ./hello.py
The following example adds a target with the name prod that uses a different remote workspace URL and workspace authentication credentials, which are read from the caller's .databrickscfg file's matching host entry with the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID. Notice that you do not need to declare the notebook_task mapping within the prod mapping, because it falls back to the notebook_task mapping within the top-level resources mapping if the notebook_task mapping is not explicitly overridden within the prod mapping.
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
To validate, deploy, and run this job within the dev target:
# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello-job
# But you can still explicitly specify it, if you want or need to:
databricks bundle validate
databricks bundle deploy -t dev
databricks bundle run -t dev hello-job
To validate, deploy, and run this job within the prod target instead:
# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate
databricks bundle deploy -t prod
databricks bundle run -t prod hello-job
Following is the previous example but split up into component files for even more modularization and better reuse across multiple bundle configuration files. This technique enables you not only to reuse various definitions and settings, but also to swap out any of these files with other files that provide completely different declarations:

databricks.yml:
bundle:
  name: hello-bundle

include:
  - "bundle*.yml"
bundle.resources.yml:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
bundle.targets.yml:
targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
Mappings
The following sections describe the bundle configuration file syntax, by top-level mapping.
bundle
A bundle configuration file must contain only one top-level bundle mapping that associates the bundle's contents and Azure Databricks workspace settings.

This bundle mapping must contain a name mapping that specifies a programmatic (or logical) name for the bundle. The following example declares a bundle with the programmatic (or logical) name hello-bundle.
bundle:
  name: hello-bundle
A bundle mapping can also be a child of one or more of the targets in the top-level targets mapping. Each of these child bundle mappings specifies any non-default overrides at the target level. However, the top-level bundle mapping's name value cannot be overridden at the target level.
cluster_id
The bundle mapping can have a child cluster_id mapping. This mapping enables you to specify the ID of a cluster to use as an override for clusters defined elsewhere in the bundle configuration file. For information about how to retrieve the ID of a cluster, see Cluster URL and ID.

The cluster_id override is intended for development-only scenarios and is only supported for the target that has its mode mapping set to development. For more information about the target mapping, see targets.
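For example, the following sketch (the cluster ID is a placeholder) sets a development-only cluster override at the bundle level for a target whose mode is set to development:

bundle:
  name: my-bundle
  cluster_id: 1234-567890-abcde123 # placeholder cluster ID used only for development

targets:
  dev:
    mode: development
    default: true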
compute_id
Note
This setting is deprecated. Use cluster_id instead.
The bundle mapping can have a child compute_id mapping. This mapping enables you to specify the ID of a cluster to use as an override for clusters defined elsewhere in the bundle configuration file.
git
You can retrieve and override Git version control details that are associated with your bundle. This is useful for annotating your deployed resources. For example, you might want to include the origin URL of your repository within the description of a machine learning model that you deploy.
Whenever you run a bundle command such as validate, deploy, or run, the bundle command populates the command's configuration tree with the following default settings:
- bundle.git.origin_url, which represents the origin URL of the repo. This is the same value that you would get if you ran the command git config --get remote.origin.url from your cloned repo. You can use substitutions to refer to this value in your bundle configuration files, as ${bundle.git.origin_url}.
- bundle.git.branch, which represents the current branch within the repo. This is the same value that you would get if you ran the command git branch --show-current from your cloned repo. You can use substitutions to refer to this value in your bundle configuration files, as ${bundle.git.branch}.
- bundle.git.commit, which represents the HEAD commit within the repo. This is the same value that you would get if you ran the command git rev-parse HEAD from your cloned repo. You can use substitutions to refer to this value in your bundle configuration files, as ${bundle.git.commit}.
To retrieve or override Git settings, your bundle must be within a directory that is associated with a Git repository, for example a local directory that is initialized by running the git clone command. If the directory is not associated with a Git repository, these Git settings are empty.
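As an illustrative sketch, a hypothetical job could embed these values in its description through substitutions (the job name and description text are placeholders, not part of the required schema):

resources:
  jobs:
    my_job: # hypothetical job
      name: my_job
      description: "Deployed from ${bundle.git.origin_url}, branch ${bundle.git.branch}, commit ${bundle.git.commit}"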
You can override the origin_url and branch settings within the git mapping of your top-level bundle mapping if needed, as follows:
bundle:
  git:
    origin_url: <some-non-default-origin-url>
    branch: <some-non-current-branch-name>
databricks_cli_version
The bundle mapping can contain a databricks_cli_version mapping that constrains the Databricks CLI version required by the bundle. This can prevent issues caused by using mappings that are not supported in a certain version of the Databricks CLI.

The Databricks CLI version conforms to semantic versioning and the databricks_cli_version mapping supports specifying version constraints. If the current databricks --version value is not within the bounds specified in the bundle's databricks_cli_version mapping, an error occurs when databricks bundle validate is executed on the bundle. The following examples demonstrate some common version constraint syntax:
bundle:
  name: hello-bundle
  databricks_cli_version: "0.218.0" # require Databricks CLI 0.218.0

bundle:
  name: hello-bundle
  databricks_cli_version: "0.218.*" # allow all patch versions of Databricks CLI 0.218

bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.218.0" # allow any version of Databricks CLI 0.218.0 or higher

bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.218.0, <= 1.0.0" # allow any Databricks CLI version between 0.218.0 and 1.0.0, inclusive
variables
The bundle settings file can contain one top-level variables mapping where custom variables are defined. For each variable, set an optional description, default value, whether the custom variable is a complex type, or lookup to retrieve an ID value, using the following format:
variables:
  <variable-name>:
    description: <variable-description>
    default: <optional-default-value>
    type: <optional-type-value> # "complex" is the only valid value
    lookup:
      <optional-object-type>: <optional-object-name>
Note
Variables are assumed to be of type string, unless type is set to complex. See Define a complex variable.

To reference a custom variable within bundle configuration, use the substitution ${var.<variable_name>}.
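For example, the following sketch (the variable and job names are hypothetical) defines a my_cluster_id variable with a default value and references it from a job task:

variables:
  my_cluster_id:
    description: The ID of an existing cluster to run tasks on.
    default: 1234-567890-abcde123 # placeholder cluster ID

resources:
  jobs:
    my_job: # hypothetical job
      name: my_job
      tasks:
        - task_key: my_task
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ./hello.py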
For more information on custom variables and substitutions, see Substitutions and variables in Databricks Asset Bundles.
workspace
The bundle configuration file can contain only one top-level workspace mapping to specify any non-default Azure Databricks workspace settings to use.
Important
Valid Databricks workspace paths begin with either /Workspace or /Volumes. Custom workspace paths are automatically prefixed with /Workspace, so if you use any workspace path substitution in your custom path, such as ${workspace.file_path}, you do not need to prepend /Workspace to the path.
root_path
This workspace mapping can contain a root_path mapping to specify a non-default root path to use within the workspace for both deployments and workflow runs, for example:
workspace:
  root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}
By default, the Databricks CLI uses the path /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target} for root_path, which uses substitutions.
artifact_path
This workspace mapping can also contain an artifact_path mapping to specify a non-default artifact path to use within the workspace for both deployments and workflow runs, for example:
workspace:
  artifact_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/artifacts
By default, the Databricks CLI uses the path ${workspace.root}/artifacts for artifact_path, which uses substitutions.
Note
The artifact_path mapping does not support Databricks File System (DBFS) paths.
file_path
This workspace mapping can also contain a file_path mapping to specify a non-default file path to use within the workspace for both deployments and workflow runs, for example:
workspace:
  file_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/files
By default, the Databricks CLI uses the path ${workspace.root}/files for file_path, which uses substitutions.
state_path
The state_path mapping defaults to ${workspace.root}/state and represents the path within your workspace to store Terraform state information about deployments.
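For example, the following sketch stores deployment state under a custom, per-target folder (the my-envs path segment is a placeholder):

workspace:
  state_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/state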
Other workspace mappings
The workspace mapping can also contain the following optional mappings to specify the Azure Databricks authentication mechanism to use. If they are not specified within this workspace mapping, they must be specified in a workspace mapping as a child of one or more of the targets in the top-level targets mapping.
Important
You must hard-code values for the following workspace mappings for Azure Databricks authentication. For instance, you cannot specify custom variables for these mappings' values by using the ${var.*} syntax.
- The profile mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) specifies the name of a configuration profile to use with this workspace for Azure Databricks authentication. This configuration profile maps to the one that you created when you set up the Databricks CLI.

  Note: Databricks recommends that you use the host mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) instead of the profile mapping, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile's fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping (or the --profile or -p command-line options) to instruct the Databricks CLI about which profile to use. For an example, see the prod target declaration in the examples.

- The host mapping specifies the URL for your Azure Databricks workspace. See Per-workspace URL.

- For OAuth machine-to-machine (M2M) authentication, the mapping client_id is used. Alternatively, you can set this value in the local environment variable DATABRICKS_CLIENT_ID. Or you can create a configuration profile with the client_id value and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See Authenticate access to Azure Databricks with a service principal using OAuth (OAuth M2M).

  Note: You cannot specify an Azure Databricks OAuth secret value in the bundle configuration file. Instead, set the local environment variable DATABRICKS_CLIENT_SECRET. Or you can add the client_secret value to a configuration profile and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

- For Azure CLI authentication, the mapping azure_workspace_resource_id is used. Alternatively, you can set this value in the local environment variable DATABRICKS_AZURE_RESOURCE_ID. Or you can create a configuration profile with the azure_workspace_resource_id value and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See Azure CLI authentication.

- For Azure client secret authentication with service principals, the mappings azure_workspace_resource_id, azure_tenant_id, and azure_client_id are used. Alternatively, you can set these values in the local environment variables DATABRICKS_AZURE_RESOURCE_ID, ARM_TENANT_ID, and ARM_CLIENT_ID, respectively. Or you can create a configuration profile with the azure_workspace_resource_id, azure_tenant_id, and azure_client_id values and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See MS Entra service principal authentication.

  Note: You cannot specify an Azure client secret value in the bundle configuration file. Instead, set the local environment variable ARM_CLIENT_SECRET. Or you can add the azure_client_secret value to a configuration profile and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

- For Azure managed identities authentication, the mappings azure_use_msi, azure_client_id, and azure_workspace_resource_id are used. Alternatively, you can set these values in the local environment variables ARM_USE_MSI, ARM_CLIENT_ID, and DATABRICKS_AZURE_RESOURCE_ID, respectively. Or you can create a configuration profile with the azure_use_msi, azure_client_id, and azure_workspace_resource_id values and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See Azure managed identities authentication.

- The azure_environment mapping specifies the Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. The default value is PUBLIC. Alternatively, you can set this value in the local environment variable ARM_ENVIRONMENT. Or you can add the azure_environment value to a configuration profile and then specify the profile's name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

- The azure_login_app_id mapping is non-operational and is reserved for internal use.

- The auth_type mapping specifies the Azure Databricks authentication type to use, especially in cases where the Databricks CLI infers an unexpected authentication type. See Authenticate access to Azure Databricks resources.
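For example, the following sketch pins a workspace host per target and selects a named profile for production (the workspace URLs and the PROD profile name are placeholders):

targets:
  dev:
    default: true
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net # placeholder workspace URL
  prod:
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net # placeholder workspace URL
      profile: PROD # placeholder profile name from .databrickscfg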
permissions
The top-level permissions mapping specifies one or more permission levels to apply to all resources defined in the bundle. If you want to apply permissions to a specific resource, see Define permissions for a specific resource.

Allowed top-level permission levels are CAN_VIEW, CAN_MANAGE, and CAN_RUN.
The following example in a bundle configuration file defines permission levels for a user, group, and service principal, which are applied to all jobs, pipelines, experiments, and models defined in resources in the bundle:
permissions:
  - level: CAN_VIEW
    group_name: test-group
  - level: CAN_MANAGE
    user_name: someone@example.com
  - level: CAN_RUN
    service_principal_name: 123456-abcdef
artifacts
The top-level artifacts mapping specifies one or more artifacts that are automatically built during bundle deployments and can be used later in bundle runs. Each child artifact supports the following mappings:
- type is required. To build a Python wheel file before deploying, this mapping must be set to whl.
- path is an optional, relative path from the location of the bundle configuration file to the location of the Python wheel file's setup.py file. If path is not included, the Databricks CLI attempts to find the Python wheel file's setup.py file in the bundle's root.
- files is an optional mapping that includes a child source mapping, which you can use to specify non-default locations to include for complex build instructions. Locations are specified as relative paths from the location of the bundle configuration file.
- build is an optional set of non-default build commands that you want to run locally before deployment. For Python wheel builds, the Databricks CLI assumes that it can find a local install of the Python wheel package to run builds, and it runs the command python setup.py bdist_wheel by default during each bundle deployment. To specify multiple build commands, separate each command with double-ampersand (&&) characters.
For more information, including a sample bundle that uses artifacts, see Develop a Python wheel file using Databricks Asset Bundles.
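As a minimal sketch (the artifact identifier and paths are hypothetical), a Python wheel artifact might be declared as follows:

artifacts:
  my_wheel: # hypothetical artifact identifier
    type: whl
    path: ./my_package # folder containing setup.py, relative to databricks.yml
    build: pip install wheel && python setup.py bdist_wheel # two commands separated by &&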
Tip
You can define, combine, and override the settings for artifacts in bundles by using the techniques described in Define artifact settings dynamically in Databricks Asset Bundles.
include
The include array specifies a list of path globs that contain configuration files to include within the bundle. These path globs are relative to the location of the bundle configuration file in which the path globs are specified.

The Databricks CLI does not include any configuration files by default within the bundle. You must use the include array to specify any and all configuration files to include within the bundle, other than the databricks.yml file itself.

This include array can appear only as a top-level mapping.
The following example configuration includes three configuration files. These files are in the same folder as the bundle configuration file:
include:
- "bundle.artifacts.yml"
- "bundle.resources.yml"
- "bundle.targets.yml"
The following example configuration includes all files with filenames that begin with bundle and end with .yml. These files are in the same folder as the bundle configuration file:
include:
- "bundle*.yml"
resources
The resources mapping specifies information about the Azure Databricks resources used by the bundle.

This resources mapping can appear as a top-level mapping, or it can be a child of one or more of the targets in the top-level targets mapping, and it includes zero or one of each of the supported resource types. Each resource type mapping includes one or more individual resource declarations, which must each have a unique name. These individual resource declarations use the corresponding object's create operation request payload, expressed in YAML, to define the resource. Supported properties for a resource are the corresponding object's supported fields.
Create operation request payloads are documented in the Databricks REST API Reference, and the databricks bundle schema command outputs all supported object schemas. In addition, the databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.
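For example, you might write the schema output to a local file for reference and then check your configuration (the output filename is arbitrary):

# Write the supported bundle configuration schema to a local file:
databricks bundle schema > bundle_config_schema.json

# Warn about unknown resource properties in the bundle configuration:
databricks bundle validate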
The following table lists supported resource types for bundles and links to documentation on their corresponding payloads.
| Resource type | Resource mappings |
|---|---|
| cluster | Cluster mappings: POST /api/2.1/clusters/create |
| dashboard | Dashboard mappings: POST /api/2.0/preview/sql/dashboards |
| experiment | Experiment mappings: POST /api/2.0/mlflow/experiments/create |
| job | Job mappings: POST /api/2.1/jobs/create. For additional information, see job task types and override new job cluster settings. |
| pipeline | Pipeline mappings: POST /api/2.0/pipelines |
| model | Model mappings: POST /api/2.0/mlflow/registered-models/create |
| model_serving_endpoint | Model serving endpoint mappings: POST /api/2.0/serving-endpoints |
| registered_model (Unity Catalog) | Unity Catalog model mappings: POST /api/2.1/unity-catalog/models |
| schema (Unity Catalog) | Unity Catalog schema mappings: POST /api/2.1/unity-catalog/schemas |
All paths to folders and files referenced by resource declarations are relative to the location of the bundle configuration file in which these paths are specified.
cluster
The cluster resource allows you to create all-purpose clusters. The following example creates a cluster named my_cluster and sets that as the cluster to use to run the notebook in my_job:
bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.my_cluster.id}
          notebook_task:
            notebook_path: "./src/my_notebook.py"
dashboard
The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.
The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.
resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: "NYC Taxi Trip Analysis"
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}
If you use the UI to modify the dashboard, modifications made through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See Generate a bundle configuration file.

In addition, if you attempt to deploy a bundle that contains a dashboard JSON file that is different than the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See Deploy a bundle.
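For example, to overwrite the dashboard in the remote workspace with the local JSON file during deployment:

# Force the deployment, overwriting the remote dashboard with the local copy:
databricks bundle deploy --force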
job
The following example declares a job with the resource key of hello-job:
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
pipeline
The following example declares a pipeline with the resource key of hello-pipeline:
resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py
schema
The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. A schema, different from other resource types, has the following limitations:
- The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it is ignored by operations on the schema.
- Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported because it is only available on the update API.
The following example declares a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:
resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ./nb.sql
      development: true
      catalog: main
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by DABs.
A top-level grants mapping is not supported by Databricks Asset Bundles, so if you want to set grants for a schema, define the grants for the schema within the schemas mapping:
schemas:
  my_schema:
    name: test-schema
    grants:
      - principal: users
        privileges:
          - CAN_MANAGE
      - principal: my_team
        privileges:
          - CAN_READ
    catalog_name: main
    comment: "my schema with grants"
sync
The sync mapping allows you to configure which files are part of your bundle deployments.
include and exclude
The include and exclude mappings within the sync mapping specify a list of files or folders to include within, or exclude from, bundle deployments, depending on the following rules:

- Based on any list of file and path globs in a .gitignore file in the bundle's root, the include mapping can contain a list of file globs, path globs, or both, relative to the bundle's root, to explicitly include.
- Based on any list of file and path globs in a .gitignore file in the bundle's root, plus the list of file and path globs in the include mapping, the exclude mapping can contain a list of file globs, path globs, or both, relative to the bundle's root, to explicitly exclude.
All paths to specified files and folders are relative to the location of the bundle configuration file in which they are specified.
The syntax for include and exclude file and path patterns follows standard .gitignore pattern syntax. See gitignore Pattern Format.
For example, suppose the .gitignore file in the bundle's root contains the following entries:
.databricks
my_package/dist
And the bundle configuration file contains the following include mapping:
sync:
  include:
    - my_package/dist/*.whl
Then all of the files in the my_package/dist folder with a file extension of *.whl are included. Any other files in the my_package/dist folder are not included.
However, if the bundle configuration file also contains the following exclude mapping:
sync:
  include:
    - my_package/dist/*.whl
  exclude:
    - my_package/dist/delete-me.whl
Then all of the files in the my_package/dist folder with a file extension of *.whl, except for the file named delete-me.whl, are included. Any other files in the my_package/dist folder are also not included.
The sync mapping can also be declared in the targets mapping for a specific target. Any sync mapping declared in a target is merged with any top-level sync mapping declarations. For example, continuing with the preceding example, the following include mapping at the targets level merges with the include mapping in the top-level sync mapping:
targets:
  dev:
    sync:
      include:
        - my_package/dist/delete-me.whl
paths
The sync mapping can contain a paths mapping that specifies local paths to synchronize to the workspace. The paths mapping allows you to share common files across bundles and can be used to sync files located outside of the bundle root. (The bundle root is the location of the databricks.yml file.) This is especially useful when you have a single repository that hosts multiple bundles and want to share libraries, code files, or configuration.
Specified paths must be relative to files and directories anchored at the folder where the paths mapping is set. If one or more path values traverse up the directory tree to an ancestor of the bundle root, the root path is dynamically determined to ensure that the folder structure remains intact. For example, if the bundle root folder is named my_bundle, then this configuration in databricks.yml syncs the common folder located one level above the bundle root, as well as the bundle root itself:
sync:
  paths:
    - ../common
    - .
A deploy of this bundle results in the following folder structure in the workspace:
common/
  common_file.txt
my_bundle/
  databricks.yml
  src/
    ...
targets
The targets mapping specifies one or more contexts in which to run Azure Databricks workflows. Each target is a unique collection of artifacts, Azure Databricks workspace settings, and Azure Databricks job or pipeline details.

The targets mapping consists of one or more target mappings, which must each have a unique programmatic (or logical) name.

This targets mapping is optional but highly recommended. If it is specified, it can appear only as a top-level mapping.

The settings in the top-level workspace, artifacts, and resources mappings are used if they are not specified in a targets mapping, but any conflicting settings are overridden by the settings within a target.

A target can also override the values of any top-level variables.
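For example, the following sketch (the variable name and values are placeholders) overrides a top-level warehouse_id variable for the prod target:

variables:
  warehouse_id:
    description: The SQL warehouse to use.
    default: abcdef1234567890 # placeholder warehouse ID

targets:
  prod:
    variables:
      warehouse_id: 1234567890abcdef # placeholder production override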
default
To specify a target default for bundle commands, set the default mapping to true. For example, this target named dev is the default target:
targets:
  dev:
    default: true
If a default target is not configured, or if you want to validate, deploy, and run jobs or pipelines within a specific target, use the -t option of the bundle commands.
The following commands validate, deploy, and run my_job within the dev and prod targets:
databricks bundle validate
databricks bundle deploy -t dev
databricks bundle run -t dev my_job
databricks bundle validate
databricks bundle deploy -t prod
databricks bundle run -t prod my_job
The following example declares two targets. The first target has the name dev and is the default target used when no target is specified for bundle commands. The second target has the name prod and is used only when this target is specified for bundle commands.
targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
mode and presets
To facilitate easy development and CI/CD best practices, Databricks Asset Bundles provides deployment modes for targets that set default behaviors for pre-production and production workflows. Some behaviors are also configurable. For details, see Databricks Asset Bundle deployment modes.
Tip
To set run identities for bundles, you can specify run_as for each target, as described in Specify a run identity for a Databricks Asset Bundles workflow.
To specify that a target is treated as a development target, add the mode mapping set to development. To specify that a target is treated as a production target, add the mode mapping set to production. For example, this target named prod is treated as a production target:
targets:
  prod:
    mode: production
You can customize some of the behaviors using the presets mapping. For a list of available presets, see Custom presets. The following example shows a customized production target that prefixes and tags all production resources:
targets:
  prod:
    mode: production
    presets:
      name_prefix: "production_" # prefix all resource names with production_
      tags:
        prod: true
If both mode and presets are set, presets override the default mode behavior. Settings of individual resources override the presets. For example, if a schedule is set to UNPAUSED, but the trigger_pause_status preset is set to PAUSED, the schedule will be unpaused.
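As a sketch of that precedence (the job name and schedule values are hypothetical), the following development target pauses triggers by default while one job's own schedule setting overrides the preset:

targets:
  dev:
    mode: development
    presets:
      trigger_pause_status: PAUSED # pause schedules and triggers by default

resources:
  jobs:
    my_job: # hypothetical job
      name: my_job
      schedule:
        quartz_cron_expression: "0 0 * * * ?" # hourly, placeholder
        timezone_id: UTC
        pause_status: UNPAUSED # this resource-level setting overrides the preset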