opomba,
Dostop do te strani zahteva pooblastilo. Poskusite se vpisati alispremeniti imenike.
Dostop do te strani zahteva pooblastilo. Poskusite lahko spremeniti imenike.
This article describes the syntax for Databricks Asset Bundle configuration files, which define Databricks Asset Bundles. See What are Databricks Asset Bundles?.
To create and work with bundles, see Develop Databricks Asset Bundles.
For bundle configuration reference, see Configuration reference.
databricks.yml
A bundle must contain one (and only one) configuration file named databricks.yml at the root of the bundle project folder. databricks.yml is the main configuration file that defines a bundle, but it can reference other configuration files, such as resource configuration files, in the include mapping. Bundle configuration is expressed in YAML. For more information about YAML, see the official YAML specification.
The simplest databricks.yml defines the bundle name, which is a required top-level mapping, and a target deployment.
bundle:
name: my_bundle
targets:
dev:
default: true
For details on all top-level mappings, see Configuration reference.
Tip
Python support for Databricks Asset Bundles enables you to define resources in Python. See Bundle configuration in Python.
Specification
The following YAML specification provides top-level configuration keys for Databricks Asset Bundles. For configuration reference, see Configuration reference.
# This is the default bundle configuration if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
name: string # Required.
databricks_cli_version: string
cluster_id: string
deployment: Map
git:
origin_url: string
branch: string
# This is the identity to use to run the bundle
run_as:
- user_name: <user-name>
- service_principal_name: <service-principal-name>
# These are any additional configuration files to include.
include:
- '<some-file-or-path-glob-to-include>'
- '<another-file-or-path-glob-to-include>'
# These are any scripts that can be run.
scripts:
<some-unique-script-name>:
content: string
# These are any additional files or paths to include or exclude.
sync:
include:
- '<some-file-or-path-glob-to-include>'
- '<another-file-or-path-glob-to-include>'
exclude:
- '<some-file-or-path-glob-to-exclude>'
- '<another-file-or-path-glob-to-exclude>'
paths:
- '<some-file-or-path-to-synchronize>'
# These are the default artifact settings if not otherwise overridden in
# the targets top-level mapping.
artifacts:
<some-unique-artifact-identifier>:
build: string
dynamic_version: boolean
executable: string
files:
- source: string
path: string
type: string
# These are for any custom variables for use throughout the bundle.
variables:
<some-unique-variable-name>:
description: string
default: string or complex
lookup: Map
type: string # The only valid value is "complex" if the variable is a complex variable, otherwise do not define this key.
# These are the default workspace settings if not otherwise overridden in
# the targets top-level mapping.
workspace:
artifact_path: string
auth_type: string
azure_client_id: string # For Azure Databricks only.
azure_environment: string # For Azure Databricks only.
azure_login_app_id: string # For Azure Databricks only. Reserved for future use.
azure_tenant_id: string # For Azure Databricks only.
azure_use_msi: true | false # For Azure Databricks only.
azure_workspace_resource_id: string # For Azure Databricks only.
client_id: string # For Databricks on AWS only.
file_path: string
google_service_account: string # For Databricks on Google Cloud only.
host: string
profile: string
resource_path: string
root_path: string
state_path: string
# These are the permissions to apply to resources defined
# in the resources mapping.
permissions:
- level: <permission-level>
group_name: <unique-group-name>
- level: <permission-level>
user_name: <unique-user-name>
- level: <permission-level>
service_principal_name: <unique-principal-name>
# These are the resource settings if not otherwise overridden in
# the targets top-level mapping.
resources:
apps:
<unique-app-name>:
# See the REST API create request payload reference for apps.
clusters:
<unique-cluster-name>:
# See the REST API create request payload reference for clusters.
dashboards:
<unique-dashboard-name>:
# See the REST API create request payload reference for dashboards.
experiments:
<unique-experiment-name>:
# See the REST API create request payload reference for experiments.
jobs:
<unique-job-name>:
# See REST API create request payload reference for jobs.
model_serving_endpoint:
<unique-model-serving-endpoint-name>:
# See the model serving endpoint request payload reference.
models:
<unique-model-name>:
# See the REST API create request payload reference for models (legacy).
pipelines:
<unique-pipeline-name>:
# See the REST API create request payload reference for :re[LDP] (pipelines).
quality_monitors:
<unique-quality-monitor-name>:
# See the quality monitor request payload reference.
registered_models:
<unique-registered-model-name>:
# See the registered model request payload reference.
schemas:
<unique-schema-name>:
# See the Unity Catalog schema request payload reference.
secret_scopes:
<unique-secret-scope-name>:
# See the secret scope request payload reference.
volumes:
<unique-volume-name>:
# See the Unity Catalog volume request payload reference.
# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
<some-unique-programmatic-identifier-for-this-target>:
artifacts:
# See the preceding "artifacts" syntax.
bundle:
# See the preceding "bundle" syntax.
default: boolean
git: Map
mode: string
permissions:
# See the preceding "permissions" syntax.
presets:
<preset>: <value>
resources:
# See the preceding "resources" syntax.
sync:
# See the preceding "sync" syntax.
variables:
<preceding-unique-variable-name>: <non-default-value>
workspace:
# See the preceding "workspace" syntax.
run_as:
# See the preceding "run_as" syntax.
Examples
This section contains some basic examples to help you understand how bundles work and how to structure the configuration.
Note
For configuration examples that demonstrate bundle features and common bundle use cases, see Bundle configuration examples and the bundle examples repository in GitHub.
The following example bundle configuration specifies a local file named hello.py that is in the same directory as bundle configuration file databricks.yml. It runs this notebook as a job using the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller's local configuration profile named DEFAULT.
bundle:
name: hello-bundle
resources:
jobs:
hello-job:
name: hello-job
tasks:
- task_key: hello-task
existing_cluster_id: 1234-567890-abcde123
notebook_task:
notebook_path: ./hello.py
targets:
dev:
default: true
The following example adds a target with the name prod that uses a different remote workspace URL and workspace authentication credentials, which are read from the caller's .databrickscfg file's matching host entry with the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID.
Note
Databricks recommends that you use the host mapping instead of the default mapping wherever possible, as this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile's fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist, then you must use the --profile option on bundle commands to specify a profile to use.
Notice that you do not need to declare the notebook_task mapping within the prod mapping, as it falls back to use the notebook_task mapping within the top-level resources mapping, if the notebook_task mapping is not explicitly overridden within the prod mapping.
bundle:
name: hello-bundle
resources:
jobs:
hello-job:
name: hello-job
tasks:
- task_key: hello-task
existing_cluster_id: 1234-567890-abcde123
notebook_task:
notebook_path: ./hello.py
targets:
dev:
default: true
prod:
workspace:
host: https://<production-workspace-url>
resources:
jobs:
hello-job:
name: hello-job
tasks:
- task_key: hello-task
existing_cluster_id: 2345-678901-fabcd456
Use the following bundle commands to validate, deploy, and run this job within the dev target. For details about the lifecycle of a bundle, see Develop Databricks Asset Bundles.
# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello_job
# But you can still explicitly specify it, if you want or need to:
databricks bundle validate
databricks bundle deploy -t dev
databricks bundle run -t dev hello_job
To validate, deploy, and run this job within the prod target instead:
# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate
databricks bundle deploy -t prod
databricks bundle run -t prod hello_job
For more modularization and better reuse of definitions and settings across bundles, split your bundle configuration into separate files:
# databricks.yml
bundle:
name: hello-bundle
include:
- '*.yml'
# hello-job.yml
resources:
jobs:
hello-job:
name: hello-job
tasks:
- task_key: hello-task
existing_cluster_id: 1234-567890-abcde123
notebook_task:
notebook_path: ./hello.py
# targets.yml
targets:
dev:
default: true
prod:
workspace:
host: https://<production-workspace-url>
resources:
jobs:
hello-job:
name: hello-job
tasks:
- task_key: hello-task
existing_cluster_id: 2345-678901-fabcd456