Databricks Asset Bundle configuration

This article describes the syntax for Databricks Asset Bundle configuration files, which define Databricks Asset Bundles. See What are Databricks Asset Bundles?

A bundle configuration file must be expressed in YAML format and must contain at minimum the top-level bundle mapping. Each bundle must contain at minimum one (and only one) bundle configuration file named databricks.yml. If there are multiple bundle configuration files, they must be referenced by the databricks.yml file using the include mapping.

For more information about YAML, see the official YAML specification and tutorial.

To create and work with bundle configuration files, see Databricks Asset Bundles development.

Overview

This section provides a visual representation of the bundle configuration file schema. For details, see Mappings.

# This is the default bundle configuration if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
  name: string # Required.
  databricks_cli_version: string
  cluster_id: string
  git:
    origin_url: string
    branch: string

# These are for any custom variables for use throughout the bundle.
variables:
  <some-unique-variable-name>:
    description: string
    default: string or complex

# These are the default workspace settings if not otherwise overridden in
# the following "targets" top-level mapping.
workspace:
  artifact_path: string
  auth_type: string
  azure_client_id: string # For Azure Databricks only.
  azure_environment: string # For Azure Databricks only.
  azure_login_app_id: string # For Azure Databricks only. Non-operational and reserved for future use.
  azure_tenant_id: string # For Azure Databricks only.
  azure_use_msi: true | false # For Azure Databricks only.
  azure_workspace_resource_id: string # For Azure Databricks only.
  client_id: string # For Databricks on AWS only.
  file_path: string
  google_service_account: string # For Databricks on Google Cloud only.
  host: string
  profile: string
  root_path: string
  state_path: string

# These are the permissions to apply to experiments, jobs, models, and pipelines defined
# in the "resources" mapping.
permissions:
  - level: <permission-level>
    group_name: <unique-group-name>
  - level: <permission-level>
    user_name: <unique-user-name>
  - level: <permission-level>
    service_principal_name: <unique-principal-name>

# These are the default artifact settings if not otherwise overridden in
# the following "targets" top-level mapping.
artifacts:
  <some-unique-artifact-identifier>:
    build: string
    files:
      - source: string
    path: string
    type: string

# These are any additional configuration files to include.
include:
  - "<some-file-or-path-glob-to-include>"
  - "<another-file-or-path-glob-to-include>"

# This is the identity to use to run the bundle
run_as:
  - user_name: <user-name>
  - service_principal_name: <service-principal-name>

# These are the default job and pipeline settings if not otherwise overridden in
# the following "targets" top-level mapping.
resources:
  dashboards:
    <some-unique-programmatic-identifier-for-this-dashboard>:
      # See the REST API create request payload reference for dashboards.
  experiments:
    <some-unique-programmatic-identifier-for-this-experiment>:
      # See the REST API create request payload reference for experiments.
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # See the REST API create request payload reference for jobs.
  models:
    <some-unique-programmatic-identifier-for-this-model>:
      # See the REST API create request payload reference for models.
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # See the REST API create request payload reference for Delta Live Tables (pipelines).
  schemas:
    <some-unique-programmatic-identifier-for-this-schema>:
      # See the Unity Catalog schema request payload reference.

# These are any additional files or paths to include or exclude.
sync:
  include:
    - "<some-file-or-path-glob-to-include>"
    - "<another-file-or-path-glob-to-include>"
  exclude:
    - "<some-file-or-path-glob-to-exclude>"
    - "<another-file-or-path-glob-to-exclude>"
  paths:
    - "<some-file-or-path-to-synchronize>"

# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      # See the preceding "artifacts" syntax.
    bundle:
      # See the preceding "bundle" syntax.
    cluster_id: string
    default: true | false
    mode: development
    presets:
      <preset>: <value>
    resources:
      # See the preceding "resources" syntax.
    sync:
      # See the preceding "sync" syntax.
    variables:
      <preceding-unique-variable-name>: <non-default-value>
    workspace:
      # See the preceding "workspace" syntax.
    run_as:
      # See the preceding "run_as" syntax.

Examples

Note

For configuration examples that demonstrate bundle features and common bundle use cases, see Bundle configuration examples and the bundle examples repository in GitHub.

The following example bundle configuration specifies a local notebook named hello.py that is in the same directory as the bundle configuration file, databricks.yml. It runs this notebook as a job on the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller’s local configuration profile named DEFAULT.

Databricks recommends that you use the host mapping instead of the profile mapping wherever possible, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile’s fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping to tell the Databricks CLI which specific profile to use. For an example, see the prod target declaration later in this section.

This technique enables you to reuse as well as to override the job definitions and settings within the resources block:

bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true

While the following bundle configuration file is functionally equivalent, it is not modularized, which limits reuse. Also, this declaration appends a task to the job rather than overriding the existing job:

bundle:
  name: hello-bundle

targets:
  dev:
    default: true
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 1234-567890-abcde123
              notebook_task:
                notebook_path: ./hello.py

The following example adds a target with the name prod that uses a different remote workspace URL and workspace authentication credentials, which are read from the caller’s .databrickscfg file’s matching host entry with the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID. Notice that you do not need to declare the notebook_task mapping within the prod mapping, as it falls back to use the notebook_task mapping within the top-level resources mapping, if the notebook_task mapping is not explicitly overridden within the prod mapping.

bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456

To validate, deploy, and run this job within the dev target:

# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello-job

# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev hello-job

To validate, deploy, and run this job within the prod target instead:

# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod hello-job

Following is the previous example but split up into component files for even more modularization and better reuse across multiple bundle configuration files. This technique enables you to not only reuse various definitions and settings, but you can also swap out any of these files with other files that provide completely different declarations:

databricks.yml:

bundle:
  name: hello-bundle

include:
  - "bundle*.yml"

bundle.resources.yml:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

bundle.targets.yml:

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456

Mappings

The following sections describe the bundle configuration file syntax, by top-level mapping.

bundle

A bundle configuration file must contain only one top-level bundle mapping that associates the bundle’s contents and Azure Databricks workspace settings.

This bundle mapping must contain a name mapping that specifies a programmatic (or logical) name for the bundle. The following example declares a bundle with the programmatic (or logical) name hello-bundle.

bundle:
  name: hello-bundle

A bundle mapping can also be a child of one or more of the targets in the top-level targets mapping. Each of these child bundle mappings specifies any non-default overrides at the target level. However, the top-level bundle mapping’s name value cannot be overridden at the target level.
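
As a minimal sketch of a target-level override, the following configuration assumes a hypothetical target named staging that overrides only the Git branch recorded for the bundle. The target name and the branch name are illustrative and do not come from the other examples in this article:

```yaml
bundle:
  name: hello-bundle

targets:
  dev:
    default: true
  staging: # Hypothetical target name.
    bundle:
      git:
        branch: release-candidate # Hypothetical branch name.
```

The name value is declared only once, in the top-level bundle mapping, and is not repeated within the target.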

cluster_id

The bundle mapping can have a child cluster_id mapping. This mapping enables you to specify the ID of a cluster to use as an override for clusters defined elsewhere in the bundle configuration file. For information about how to retrieve the ID of a cluster, see Cluster URL and ID.

The cluster_id override is intended for development-only scenarios and is only supported for the target that has its mode mapping set to development. For more information about the target mapping, see targets.
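
For illustration, the following sketch sets cluster_id on a target whose mode mapping is set to development, following the targets syntax shown in the overview. The cluster ID is a placeholder value:

```yaml
bundle:
  name: hello-bundle

targets:
  dev:
    default: true
    mode: development
    cluster_id: 1234-567890-abcde123 # Placeholder cluster ID.
```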

compute_id

Note

This setting is deprecated. Use cluster_id instead.

The bundle mapping can have a child compute_id mapping. This mapping enables you to specify the ID of a cluster to use as an override for clusters defined elsewhere in the bundle configuration file.

git

You can retrieve and override Git version control details that are associated with your bundle. This is useful for annotating your deployed resources. For example, you might want to include the origin URL of your repository within the description of a machine learning model that you deploy.

Whenever you run a bundle command such as validate, deploy or run, the bundle command populates the command’s configuration tree with the following default settings:

  • bundle.git.origin_url, which represents the origin URL of the repo. This is the same value that you would get if you ran the command git config --get remote.origin.url from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.origin_url}.
  • bundle.git.branch, which represents the current branch within the repo. This is the same value that you would get if you ran the command git branch --show-current from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.branch}.
  • bundle.git.commit, which represents the HEAD commit within the repo. This is the same value that you would get if you ran the command git rev-parse HEAD from your cloned repo. You can use substitutions to refer to this value within your bundle configuration files, as ${bundle.git.commit}.

To retrieve or override Git settings, your bundle must be within a directory that is associated with a Git repository, for example a local directory that is initialized by running the git clone command. If the directory is not associated with a Git repository, these Git settings are empty.
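
For example, a sketch along the following lines annotates a deployed model with Git details. It assumes a hypothetical model resource with the key my-model, and it assumes that the model's description field accepts substitutions in the same way as other string values:

```yaml
resources:
  models:
    my-model: # Hypothetical resource key.
      name: my-model
      # The Git substitutions are resolved when a bundle command runs.
      description: "Deployed from ${bundle.git.origin_url} (branch ${bundle.git.branch}, commit ${bundle.git.commit})"
```

If the bundle is not within a Git repository, these substitutions resolve to empty values.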

You can override the origin_url and branch settings within the git mapping of your top-level bundle mapping if needed, as follows:

bundle:
  git:
    origin_url: <some-non-default-origin-url>
    branch: <some-non-current-branch-name>

databricks_cli_version

The bundle mapping can contain a databricks_cli_version mapping that constrains the Databricks CLI version required by the bundle. This can prevent issues caused by using mappings that are not supported in a certain version of the Databricks CLI.

The Databricks CLI version conforms to semantic versioning and the databricks_cli_version mapping supports specifying version constraints. If the current databricks --version value is not within the bounds specified in the bundle’s databricks_cli_version mapping, an error occurs when databricks bundle validate is executed on the bundle. The following examples demonstrate some common version constraint syntax:

bundle:
  name: hello-bundle
  databricks_cli_version: "0.218.0" # require Databricks CLI 0.218.0

bundle:
  name: hello-bundle
  databricks_cli_version: "0.218.*" # allow all patch versions of Databricks CLI 0.218

bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.218.0" # allow any version of Databricks CLI 0.218.0 or higher

bundle:
  name: my-bundle
  databricks_cli_version: ">= 0.218.0, <= 1.0.0" # allow any Databricks CLI version between 0.218.0 and 1.0.0, inclusive

variables

The bundle configuration file can contain one top-level variables mapping where custom variables are defined. For each variable, you can set an optional description, a default value, a type (to mark the variable as a complex type), or a lookup that retrieves an ID value, using the following format:

variables:
  <variable-name>:
    description: <variable-description>
    default: <optional-default-value>
    type: <optional-type-value> # "complex" is the only valid value
    lookup:
      <optional-object-type>: <optional-object-name>

Note

Variables are assumed to be of type string, unless type is set to complex. See Define a complex variable.

To reference a custom variable within bundle configuration, use the substitution ${var.<variable_name>}.
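
As a minimal sketch, the following configuration defines a string variable and references it from a job task. The variable name my_cluster_id and its default value are illustrative; the job definition is reused from the examples earlier in this article:

```yaml
variables:
  my_cluster_id: # Hypothetical variable name.
    description: The ID of an existing cluster to run the job on.
    default: 1234-567890-abcde123

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          # Resolves to the variable's value when a bundle command runs.
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ./hello.py
```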

For more information on custom variables and substitutions, see Substitutions and variables in Databricks Asset Bundles.

workspace

The bundle configuration file can contain only one top-level workspace mapping to specify any non-default Azure Databricks workspace settings to use.

Important

Valid Databricks workspace paths begin with either /Workspace or /Volumes. Custom workspace paths are automatically prefixed with /Workspace, so if you use any workspace path substitution in your custom path such as ${workspace.file_path}, you do not need to prepend /Workspace to the path.

root_path

This workspace mapping can contain a root_path mapping to specify a non-default root path to use within the workspace for both deployments and workflow runs, for example:

workspace:
  root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}

By default, for root_path the Databricks CLI uses the default path of /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}, which uses substitutions.

artifact_path

This workspace mapping can also contain an artifact_path mapping to specify a non-default artifact path to use within the workspace for both deployments and workflow runs, for example:

workspace:
  artifact_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/artifacts

By default, for artifact_path the Databricks CLI uses the default path of ${workspace.root_path}/artifacts, which uses substitutions.

Note

The artifact_path mapping does not support Databricks File System (DBFS) paths.

file_path

This workspace mapping can also contain a file_path mapping to specify a non-default file path to use within the workspace for both deployments and workflow runs, for example:

workspace:
  file_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}/files

By default, for file_path the Databricks CLI uses the default path of ${workspace.root_path}/files, which uses substitutions.

state_path

The state_path mapping defaults to ${workspace.root_path}/state and represents the path within your workspace to store Terraform state information about deployments.

Other workspace mappings

The workspace mapping can also contain the following optional mappings to specify the Azure Databricks authentication mechanism to use. If they are not specified within this workspace mapping, they must be specified in a workspace mapping as a child of one or more of the targets in the top-level targets mapping.

Important

You must hard-code values for the following workspace mappings for Azure Databricks authentication. For instance, you cannot specify custom variables for these mappings’ values by using the ${var.*} syntax.

  • The profile mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) specifies the name of a configuration profile to use with this workspace for Azure Databricks authentication. This configuration profile maps to the one that you created when you set up the Databricks CLI.

    Note

    Databricks recommends that you use the host mapping (or the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI) instead of the profile mapping, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile’s fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist within your .databrickscfg file, then you must use the profile mapping (or the --profile or -p command-line options) to instruct the Databricks CLI about which profile to use. For an example, see the prod target declaration in the examples.

  • The host mapping specifies the URL for your Azure Databricks workspace. See Per-workspace URL.

  • For OAuth machine-to-machine (M2M) authentication, the mapping client_id is used. Alternatively, you can set this value in the local environment variable DATABRICKS_CLIENT_ID. Or you can create a configuration profile with the client_id value and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See Authenticate access to Azure Databricks with a service principal using OAuth (OAuth M2M).

    Note

    You cannot specify an Azure Databricks OAuth secret value in the bundle configuration file. Instead, set the local environment variable DATABRICKS_CLIENT_SECRET. Or you can add the client_secret value to a configuration profile and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

  • For Azure CLI authentication, the mapping azure_workspace_resource_id is used. Alternatively, you can set this value in the local environment variable DATABRICKS_AZURE_RESOURCE_ID. Or you can create a configuration profile with the azure_workspace_resource_id value and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See Azure CLI authentication.

  • For Azure client secret authentication with service principals, the mappings azure_workspace_resource_id, azure_tenant_id, and azure_client_id are used. Alternatively, you can set these values in the local environment variables DATABRICKS_AZURE_RESOURCE_ID, ARM_TENANT_ID, and ARM_CLIENT_ID, respectively. Or you can create a configuration profile with the azure_workspace_resource_id, azure_tenant_id, and azure_client_id values and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See MS Entra service principal authentication.

    Note

    You cannot specify an Azure client secret value in the bundle configuration file. Instead, set the local environment variable ARM_CLIENT_SECRET. Or you can add the azure_client_secret value to a configuration profile and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

  • For Azure managed identities authentication, the mappings azure_use_msi, azure_client_id, and azure_workspace_resource_id are used. Alternatively, you can set these values in the local environment variables ARM_USE_MSI, ARM_CLIENT_ID, and DATABRICKS_AZURE_RESOURCE_ID, respectively. Or you can create a configuration profile with the azure_use_msi, azure_client_id, and azure_workspace_resource_id values and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI). See Azure managed identities authentication.

  • The azure_environment mapping specifies the Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. The default value is PUBLIC. Alternatively, you can set this value in the local environment variable ARM_ENVIRONMENT. Or you can add the azure_environment value to a configuration profile and then specify the profile’s name with the profile mapping (or by using the --profile or -p options when running the bundle validate, deploy, run, and destroy commands with the Databricks CLI).

  • The azure_login_app_id mapping is non-operational and is reserved for internal use.

  • The auth_type mapping specifies the Azure Databricks authentication type to use, especially in cases where the Databricks CLI infers an unexpected authentication type. See Authenticate access to Azure Databricks resources.
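
The following sketch shows how some of these mappings might be combined for a single target that uses Azure CLI authentication. The auth_type value azure-cli is an assumption about the identifier for that authentication type, and the angle-bracket values are placeholders that you must replace with hard-coded values:

```yaml
targets:
  prod:
    workspace:
      host: https://<production-workspace-url>
      auth_type: azure-cli # Assumed identifier for Azure CLI authentication.
      azure_workspace_resource_id: <azure-workspace-resource-id>
```

No secret values appear in this configuration. Secrets are supplied through environment variables or a configuration profile, as described in the preceding notes.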

permissions

The top-level permissions mapping specifies one or more permission levels to apply to all resources defined in the bundle. If you want to apply permissions to a specific resource, see Define permissions for a specific resource.

Allowed top-level permission levels are CAN_VIEW, CAN_MANAGE, and CAN_RUN.

The following example in a bundle configuration file defines permission levels for a user, group, and service principal, which are applied to all jobs, pipelines, experiments, and models defined in resources in the bundle:

permissions:
  - level: CAN_VIEW
    group_name: test-group
  - level: CAN_MANAGE
    user_name: someone@example.com
  - level: CAN_RUN
    service_principal_name: 123456-abcdef

artifacts

The top-level artifacts mapping specifies one or more artifacts that are automatically built during bundle deployments and can be used later in bundle runs. Each child artifact supports the following mappings:

  • type is required. To build a Python wheel file before deploying, this mapping must be set to whl.
  • path is an optional, relative path from the location of the bundle configuration file to the location of the Python wheel file’s setup.py file. If path is not included, the Databricks CLI will attempt to find the Python wheel file’s setup.py file in the bundle’s root.
  • files is an optional mapping that includes a child source mapping, which you can use to specify non-default locations to include for complex build instructions. Locations are specified as relative paths from the location of the bundle configuration file.
  • build is an optional set of non-default build commands that you want to run locally before deployment. For Python wheel builds, the Databricks CLI assumes that it can find a local install of the Python wheel package to run builds, and it runs the command python setup.py bdist_wheel by default during each bundle deployment. To specify multiple build commands, separate each command with double-ampersand (&&) characters.
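
As a minimal sketch of these mappings, the following configuration declares one Python wheel artifact. The artifact identifier my_wheel, the my_package folder, and the chained build commands are illustrative assumptions:

```yaml
artifacts:
  my_wheel: # Hypothetical artifact identifier.
    type: whl
    # Hypothetical folder, relative to this file, that contains setup.py.
    path: ./my_package
    # Two build commands chained with "&&".
    build: python setup.py clean --all && python setup.py bdist_wheel
```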

For more information, including a sample bundle that uses artifacts, see Develop a Python wheel file using Databricks Asset Bundles.

Tip

You can define, combine, and override the settings for artifacts in bundles by using the techniques described in Define artifact settings dynamically in Databricks Asset Bundles.

include

The include array specifies a list of path globs that contain configuration files to include within the bundle. These path globs are relative to the location of the bundle configuration file in which the path globs are specified.

The Databricks CLI does not include any configuration files by default within the bundle. You must use the include array to specify any and all configuration files to include within the bundle, other than the databricks.yml file itself.

This include array can appear only as a top-level mapping.

The following example configuration includes three configuration files. These files are in the same folder as the bundle configuration file:

include:
  - "bundle.artifacts.yml"
  - "bundle.resources.yml"
  - "bundle.targets.yml"

The following example configuration includes all files with filenames that begin with bundle and end with .yml. These files are in the same folder as the bundle configuration file:

include:
  - "bundle*.yml"

resources

The resources mapping specifies information about the Azure Databricks resources used by the bundle.

This resources mapping can appear as a top-level mapping, or it can be a child of one or more of the targets in the top-level targets mapping, and includes zero or one of each of the supported resource types. Each resource type mapping includes one or more individual resource declarations, which must each have a unique name. These individual resource declarations use the corresponding object’s create operation’s request payload, expressed in YAML, to define the resource. Supported properties for a resource are the corresponding object’s supported fields.

Create operation request payloads are documented in the Databricks REST API Reference, and the databricks bundle schema command outputs all supported object schemas. In addition, the databricks bundle validate command returns warnings if unknown resource properties are found in bundle configuration files.

The following table lists supported resource types for bundles and links to documentation on their corresponding payloads.

Resource type                     Resource mappings
--------------------------------  -----------------
cluster                           Cluster mappings: POST /api/2.1/clusters/create
dashboard                         Dashboard mappings: POST /api/2.0/preview/sql/dashboards
experiment                        Experiment mappings: POST /api/2.0/mlflow/experiments/create
job                               Job mappings: POST /api/2.1/jobs/create. For additional information, see job task types and override new job cluster settings.
pipeline                          Pipeline mappings: POST /api/2.0/pipelines
model                             Model mappings: POST /api/2.0/mlflow/registered-models/create
model_serving_endpoint            Model serving endpoint mappings: POST /api/2.0/serving-endpoints
registered_model (Unity Catalog)  Unity Catalog model mappings: POST /api/2.1/unity-catalog/models
schema (Unity Catalog)            Unity Catalog schema mappings: POST /api/2.1/unity-catalog/schemas

All paths to folders and files referenced by resource declarations are relative to the location of the bundle configuration file in which these paths are specified.

cluster

The cluster resource allows you to create all-purpose clusters. The following example creates a cluster named my_cluster and sets that as the cluster to use to run the notebook in my_job:

bundle:
  name: clusters

resources:
  clusters:
    my_cluster:
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    my_job:
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.my_cluster.id}
          notebook_task:
            notebook_path: "./src/my_notebook.py"

dashboard

The dashboard resource allows you to manage AI/BI dashboards in a bundle. For information about AI/BI dashboards, see Dashboards.

The following example includes and deploys the sample NYC Taxi Trip Analysis dashboard to the Databricks workspace.

resources:
  dashboards:
    nyc_taxi_trip_analysis:
      display_name: "NYC Taxi Trip Analysis"
      file_path: ../src/nyc_taxi_trip_analysis.lvdash.json
      warehouse_id: ${var.warehouse_id}

If you use the UI to modify the dashboard, modifications made through the UI are not applied to the dashboard JSON file in the local bundle unless you explicitly update it using bundle generate. You can use the --watch option to continuously poll and retrieve changes to the dashboard. See Generate a bundle configuration file.

In addition, if you attempt to deploy a bundle that contains a dashboard JSON file that is different than the one in the remote workspace, an error will occur. To force the deploy and overwrite the dashboard in the remote workspace with the local one, use the --force option. See Deploy a bundle.

job

The following example declares a job with the resource key of hello-job:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

pipeline

The following example declares a pipeline with the resource key of hello-pipeline:

resources:
  pipelines:
    hello-pipeline:
      name: hello-pipeline
      clusters:
        - label: default
          num_workers: 1
      development: true
      continuous: false
      channel: CURRENT
      edition: CORE
      photon: false
      libraries:
        - notebook:
            path: ./pipeline.py

schema

The schema resource type allows you to define Unity Catalog schemas for tables and other assets in your workflows and pipelines created as part of a bundle. Unlike other resource types, a schema has the following limitations:

  • The owner of a schema resource is always the deployment user, and cannot be changed. If run_as is specified in the bundle, it will be ignored by operations on the schema.
  • Only fields supported by the corresponding Schemas object create API are available for the schema resource. For example, enable_predictive_optimization is not supported as it is only available on the update API.

The following example declares a pipeline with the resource key my_pipeline that creates a Unity Catalog schema with the key my_schema as the target:

resources:
  pipelines:
    my_pipeline:
      name: test-pipeline-{{.unique_id}}
      libraries:
        - notebook:
            path: ./nb.sql
      development: true
      catalog: main
      target: ${resources.schemas.my_schema.id}

  schemas:
    my_schema:
      name: test-schema-{{.unique_id}}
      catalog_name: main
      comment: This schema was created by DABs.

Databricks Asset Bundles does not support a top-level grants mapping. To set grants for a schema, define them within the schemas mapping:

schemas:
  my_schema:
    name: test-schema
    grants:
      - principal: users
        privileges:
          - CAN_MANAGE
      - principal: my_team
        privileges:
          - CAN_READ
    catalog_name: main
    comment: "my schema with grants"

sync

The sync mapping allows you to configure which files are part of your bundle deployments.

include and exclude

The include and exclude mappings within the sync mapping specify lists of files or folders to include within, or exclude from, bundle deployments, according to the following rules:

  • The include mapping can contain a list of file globs, path globs, or both, relative to the bundle’s root, to explicitly include. It is applied on top of any file and path globs in a .gitignore file in the bundle’s root.
  • The exclude mapping can contain a list of file globs, path globs, or both, relative to the bundle’s root, to explicitly exclude. It is applied on top of any file and path globs in a .gitignore file in the bundle’s root, plus the file and path globs in the include mapping.

All paths to specified files and folders are relative to the location of the bundle configuration file in which they are specified.

The syntax for include and exclude file and path patterns follows standard .gitignore pattern syntax. See gitignore Pattern Format.
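
As a brief illustration of that pattern syntax, the following sketch excludes log files and Python bytecode caches from a deployment. The patterns are hypothetical and chosen only for illustration. They are quoted because an unquoted value that begins with * is read by YAML as an alias rather than as a string.

```yaml
# Hypothetical patterns, for illustration only.
sync:
  exclude:
    - "*.log"               # files ending in .log, at any depth
    - "**/__pycache__/**"   # everything inside any __pycache__ directory
```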

For example, if a .gitignore file contains the following entries:

.databricks
my_package/dist

And the bundle configuration file contains the following include mapping:

sync:
  include:
    - my_package/dist/*.whl

Then all files in the my_package/dist folder with the .whl file extension are included. Any other files in the my_package/dist folder are not included.

However, if the bundle configuration file also contains the following exclude mapping:

sync:
  include:
    - my_package/dist/*.whl
  exclude:
    - my_package/dist/delete-me.whl

Then all files in the my_package/dist folder with the .whl file extension, except for the file named delete-me.whl, are included. Any other files in the my_package/dist folder are also not included.

The sync mapping can also be declared in the targets mapping for a specific target. Any sync mapping declared in a target is merged with any top-level sync mapping declarations. For example, continuing with the preceding example, the following include mapping at the targets level merges with the include mapping in the top-level sync mapping:

targets:
  dev:
    sync:
      include:
        - my_package/dist/delete-me.whl

paths

The sync mapping can contain a paths mapping that specifies local paths to synchronize to the workspace. The paths mapping allows you to share common files across bundles, and can be used to sync files located outside of the bundle root. (The bundle root is the location of the databricks.yml file.) This is especially useful when you have a single repository that hosts multiple bundles and want to share libraries, code files, or configuration.

Each specified path must point to a file or directory and is resolved relative to the folder where the paths mapping is set. If one or more path values traverse up the directory tree to an ancestor of the bundle root, the root path is dynamically determined to ensure that the folder structure remains intact. For example, if the bundle root folder is named my_bundle, then this configuration in databricks.yml syncs the common folder located one level above the bundle root and the bundle root itself:

sync:
  paths:
    - ../common
    - .

A deploy of this bundle results in the following folder structure in the workspace:

common/
  common_file.txt
my_bundle/
  databricks.yml
  src/
    ...

targets

The targets mapping specifies one or more contexts in which to run Azure Databricks workflows. Each target is a unique collection of artifacts, Azure Databricks workspace settings, and Azure Databricks job or pipeline details.

The targets mapping consists of one or more target mappings, which must each have a unique programmatic (or logical) name.

The targets mapping is optional but highly recommended. If it is specified, it can appear only as a top-level mapping.

The settings in the top-level workspace, artifacts, and resources mappings are used if they are not specified in a targets mapping, but any conflicting settings are overridden by the settings within a target.
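
A minimal sketch of this override behavior, assuming placeholder workspace URLs: the dev target sets no workspace mapping and so uses the top-level host, while the prod target replaces it with its own.

```yaml
# Top-level default, used by any target that does not set its own host.
workspace:
  host: https://<development-workspace-url>   # placeholder URL

targets:
  dev:
    default: true   # no workspace mapping here, so the top-level host applies
  prod:
    workspace:
      host: https://<production-workspace-url>   # overrides the top-level host
```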

A target can also override the values of any top-level variables.
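
For instance, the following sketch declares a variable with a default value and overrides it for one target. The variable name my_cluster_id and both cluster IDs are hypothetical.

```yaml
variables:
  my_cluster_id:                        # hypothetical variable name
    description: The ID of an existing cluster.
    default: 1234-567890-abcde123       # hypothetical ID, used unless a target overrides it

targets:
  dev:
    default: true                       # uses the default value above
  prod:
    variables:
      my_cluster_id: 9876-543210-zyxwv987   # hypothetical ID, applies to prod only
```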

default

To specify the default target for bundle commands, set the default mapping to true. For example, this target named dev is the default target:

targets:
  dev:
    default: true

If a default target is not configured, or if you want to validate, deploy, and run jobs or pipelines within a specific target, use the -t option of the bundle commands.

The following commands validate, deploy, and run my_job within the dev and prod targets:

databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev my_job
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod my_job

The following example declares two targets. The first target has the name dev and is the default target used when no target is specified for bundle commands. The second target has the name prod and is used only when this target is specified for bundle commands.

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>

mode and presets

To facilitate easy development and CI/CD best practices, Databricks Asset Bundles provides deployment modes for targets that set default behaviors for pre-production and production workflows. Some behaviors are also configurable. For details, see Databricks Asset Bundle deployment modes.

Tip

To set run identities for bundles, you can specify run_as for each target, as described in Specify a run identity for a Databricks Asset Bundles workflow.

To specify that a target is treated as a development target, add the mode mapping set to development. To specify that a target is treated as a production target, add the mode mapping set to production. For example, this target named prod is treated as a production target:

targets:
  prod:
    mode: production
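
A development target is declared the same way. As a sketch, this target named dev is treated as a development target and, as an illustrative choice, is also made the default:

```yaml
targets:
  dev:
    mode: development
    default: true   # optional; shown here only for illustration
```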

You can customize some of the behaviors using the presets mapping. For a list of available presets, see Custom presets. The following example shows a customized production target that prefixes and tags all production resources:

targets:
  prod:
    mode: production
    presets:
      name_prefix: "production_"  # prefix all resource names with production_
      tags:
        prod: true

If both mode and presets are set, presets override the default mode behavior. Settings of individual resources override the presets. For example, if a schedule is set to UNPAUSED, but the trigger_pause_status preset is set to PAUSED, the schedule will be unpaused.
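
The precedence described above can be sketched as follows. The resource key nightly-job, the task, and the cron expression are hypothetical. The schedule's own pause_status takes priority over the trigger_pause_status preset, so the schedule stays unpaused in the prod target.

```yaml
resources:
  jobs:
    nightly-job:                        # hypothetical resource key
      name: nightly-job
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"   # hypothetical: daily at 02:00
        timezone_id: UTC
        pause_status: UNPAUSED          # resource-level setting
      tasks:
        - task_key: nightly-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  prod:
    mode: production
    presets:
      trigger_pause_status: PAUSED      # preset, overridden by the resource-level setting
```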