Add tasks to jobs in Databricks Asset Bundles
This article provides examples of various types of tasks that you can add to Azure Databricks jobs in Databricks Asset Bundles. See What are Databricks Asset Bundles?.
Most job task types have task-specific parameters among their supported settings, but you can also define job parameters that get passed to tasks. Dynamic value references are supported for job parameters, which enable passing values specific to the job run between tasks. See What is a dynamic value reference?.
Note
You can override job task settings. See Override job tasks settings in Databricks Asset Bundles.
Tip
To quickly generate resource configuration for an existing job using the Databricks CLI, you can use the bundle generate job command. See bundle commands.
Notebook task
You use this task to run a notebook.
The following example adds a notebook task to a job and sets a job parameter named my_job_run_id. The path for the notebook to deploy is relative to the configuration file in which this task is declared. The task gets the notebook from its deployed location in the Azure Databricks workspace. (Ellipses indicate omitted content, for brevity.)
# ...
resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      # ...
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
      parameters:
        - name: my_job_run_id
          default: "{{job.run_id}}"
        # ...
# ...
For additional mappings that you can set for this task, see tasks > notebook_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See Notebook task for jobs.
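In addition to job parameters, a notebook task can take task-specific parameters. The following sketch assumes that the notebook reads a widget named source_run_id (a hypothetical name) and uses the base_parameters setting of the notebook_task mapping to pass it a dynamic value reference:

```yaml
resources:
  jobs:
    my-notebook-job:
      name: my-notebook-job
      tasks:
        - task_key: my-notebook-task
          notebook_task:
            notebook_path: ./my-notebook.ipynb
            # Task-level parameters, passed to the notebook as key-value pairs.
            base_parameters:
              source_run_id: "{{job.run_id}}" # hypothetical parameter name
```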
Python script task
You use this task to run a Python file.
The following example adds a Python script task to a job. The path for the Python file to deploy is relative to the configuration file in which this task is declared. The task gets the Python file from its deployed location in the Azure Databricks workspace. (Ellipses indicate omitted content, for brevity.)
# ...
resources:
  jobs:
    my-python-script-job:
      name: my-python-script-job
      # ...
      tasks:
        - task_key: my-python-script-task
          spark_python_task:
            python_file: ./my-script.py
          # ...
# ...
For additional mappings that you can set for this task, see tasks > spark_python_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See also Python script task for jobs.
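As an example of a task-specific parameter for this task type, the following sketch passes command-line arguments to the script through the parameters setting of the spark_python_task mapping. The --run-id flag is hypothetical and assumes that the script parses its own arguments:

```yaml
resources:
  jobs:
    my-python-script-job:
      name: my-python-script-job
      tasks:
        - task_key: my-python-script-task
          spark_python_task:
            python_file: ./my-script.py
            # Command-line arguments, passed to the script as a list of strings.
            parameters:
              - "--run-id" # hypothetical flag
              - "{{job.run_id}}"
```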
Python wheel task
You use this task to run a Python wheel file.
The following example adds a Python wheel task to a job. The path for the Python wheel file to deploy is relative to the configuration file in which this task is declared. See Databricks Asset Bundles library dependencies. (Ellipses indicate omitted content, for brevity.)
# ...
resources:
  jobs:
    my-python-wheel-job:
      name: my-python-wheel-job
      # ...
      tasks:
        - task_key: my-python-wheel-task
          python_wheel_task:
            entry_point: run
            package_name: my_package
          libraries:
            - whl: ./my_package/dist/my_package-*.whl
          # ...
# ...
For additional mappings that you can set for this task, see tasks > python_wheel_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See also Develop a Python wheel file using Databricks Asset Bundles and Python Wheel task for jobs.
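The python_wheel_task mapping also accepts keyword-style arguments through its named_parameters setting. The following sketch assumes that the run entry point accepts a hypothetical run_id argument:

```yaml
resources:
  jobs:
    my-python-wheel-job:
      name: my-python-wheel-job
      tasks:
        - task_key: my-python-wheel-task
          python_wheel_task:
            entry_point: run
            package_name: my_package
            # Named arguments for the entry point.
            named_parameters:
              run_id: "{{job.run_id}}" # hypothetical argument name
          libraries:
            - whl: ./my_package/dist/my_package-*.whl
```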
JAR task
You use this task to run a JAR. You can reference local JAR libraries or those in a workspace, a Unity Catalog volume, or an external cloud storage location. See Databricks Asset Bundles library dependencies.
The following example adds a JAR task to a job. The JAR path points to a Unity Catalog volume location. (Ellipses indicate omitted content, for brevity.)
# ...
resources:
  jobs:
    my-jar-job:
      name: my-jar-job
      # ...
      tasks:
        - task_key: my-jar-task
          spark_jar_task:
            main_class_name: org.example.com.Main
          libraries:
            - jar: /Volumes/main/default/my-volume/my-project-0.1.0-SNAPSHOT.jar
          # ...
# ...
For additional mappings that you can set for this task, see tasks > spark_jar_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See JAR task for jobs.
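To reference a local JAR instead of one in a volume, the library path can be relative to the configuration file. The following sketch assumes that the JAR has already been built into a hypothetical ./target directory next to the configuration file. Whether a local JAR can be installed also depends on the compute configuration, which is omitted here:

```yaml
resources:
  jobs:
    my-jar-job:
      name: my-jar-job
      tasks:
        - task_key: my-jar-task
          spark_jar_task:
            main_class_name: org.example.com.Main
          libraries:
            # Hypothetical local build output, relative to this configuration file.
            - jar: ./target/my-project-0.1.0-SNAPSHOT.jar
```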
SQL file task
You use this task to run a SQL file located in a workspace or a remote Git repository.
The following example adds a SQL file task to a job. This SQL file task uses the specified SQL warehouse to run the specified SQL file. (Ellipses indicate omitted content, for brevity.)
# ...
resources:
  jobs:
    my-sql-file-job:
      name: my-sql-file-job
      # ...
      tasks:
        - task_key: my-sql-file-task
          sql_task:
            file:
              path: /Users/someone@example.com/hello-world.sql
              source: WORKSPACE
            warehouse_id: 1a111111a1111aa1
          # ...
# ...
To get a SQL warehouse’s ID, open the SQL warehouse’s settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.
For additional mappings that you can set for this task, see tasks > sql_task > file in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See SQL task for jobs.
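For a SQL file in a remote Git repository, the job needs a git_source mapping and the file's source is set to GIT. The following sketch uses a hypothetical repository URL, branch, and file path:

```yaml
resources:
  jobs:
    my-sql-file-job:
      name: my-sql-file-job
      # Remote repository that the job checks out at run time.
      git_source:
        git_url: https://github.com/example-org/example-repo # hypothetical repository
        git_provider: gitHub
        git_branch: main
      tasks:
        - task_key: my-sql-file-task
          sql_task:
            file:
              path: queries/hello-world.sql # hypothetical path, relative to the repository root
              source: GIT
            warehouse_id: 1a111111a1111aa1
```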
Delta Live Tables pipeline task
You use this task to run a Delta Live Tables pipeline. See What is Delta Live Tables?.
The following example adds a Delta Live Tables pipeline task to a job. This Delta Live Tables pipeline task runs the specified pipeline. (Ellipses indicate omitted content, for brevity.)
# ...
resources:
  jobs:
    my-pipeline-job:
      name: my-pipeline-job
      # ...
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: 11111111-1111-1111-1111-111111111111
          # ...
# ...
You can get a pipeline’s ID by opening the pipeline in the workspace and copying the Pipeline ID value on the Pipeline details tab of the pipeline’s settings page.
For additional mappings that you can set for this task, see tasks > pipeline_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See Delta Live Tables pipeline task for jobs.
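If the pipeline is defined in the same bundle, its ID does not need to be hard-coded. The following sketch assumes a hypothetical pipeline resource named my-pipeline with a hypothetical notebook path, references its ID with a substitution, and sets the full_refresh setting of the pipeline_task mapping explicitly:

```yaml
resources:
  pipelines:
    my-pipeline: # hypothetical pipeline defined in this bundle
      name: my-pipeline
      libraries:
        - notebook:
            path: ./my-pipeline-notebook.ipynb # hypothetical notebook
  jobs:
    my-pipeline-job:
      name: my-pipeline-job
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            # Resolved to the deployed pipeline's ID at deployment time.
            pipeline_id: ${resources.pipelines.my-pipeline.id}
            # Set to true to reset and recompute all tables in the pipeline.
            full_refresh: false
```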
dbt task
You use this task to run one or more dbt commands. See Connect to dbt Cloud.
The following example adds a dbt task to a job. This dbt task uses the specified SQL warehouse to run the specified dbt commands.
# ...
resources:
  jobs:
    my-dbt-job:
      name: my-dbt-job
      # ...
      tasks:
        - task_key: my-dbt-task
          dbt_task:
            commands:
              - "dbt deps"
              - "dbt seed"
              - "dbt run"
            project_directory: /Users/someone@example.com/Testing
            warehouse_id: 1a111111a1111aa1
          libraries:
            - pypi:
                package: "dbt-databricks>=1.0.0,<2.0.0"
          # ...
# ...
To get a SQL warehouse’s ID, open the SQL warehouse’s settings page, then copy the ID found in parentheses after the name of the warehouse in the Name field on the Overview tab.
For additional mappings that you can set for this task, see tasks > dbt_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format. See dbt task for jobs.
Databricks Asset Bundles also includes a dbt-sql project template that defines a job with a dbt task, as well as dbt profiles for deployed dbt jobs. For information about Databricks Asset Bundles templates, see Use a default bundle template.
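The dbt_task mapping in the Jobs API also has optional catalog and schema settings that control where dbt writes its output. The following sketch uses the hypothetical values main and default. Support for these settings may depend on the dbt-databricks version that is installed:

```yaml
resources:
  jobs:
    my-dbt-job:
      name: my-dbt-job
      tasks:
        - task_key: my-dbt-task
          dbt_task:
            commands:
              - "dbt deps"
              - "dbt run"
            project_directory: /Users/someone@example.com/Testing
            warehouse_id: 1a111111a1111aa1
            catalog: main # hypothetical catalog name
            schema: default # hypothetical schema name
          libraries:
            - pypi:
                package: "dbt-databricks>=1.0.0,<2.0.0"
```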
Run job task
You use this task to run another job.
The following example contains a run job task in the second job that runs the first job.
# ...
resources:
  jobs:
    my-first-job:
      name: my-first-job
      tasks:
        - task_key: my-first-job-task
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
          notebook_task:
            notebook_path: ./src/test.py
    my_second_job:
      name: my-second-job
      tasks:
        - task_key: my-second-job-task
          run_job_task:
            job_id: ${resources.jobs.my-first-job.id}
# ...
This example uses a substitution to retrieve the ID of the job to run. To get a job’s ID from the UI, open the job in the workspace and copy the ID from the Job ID value in the Job details tab of the job’s settings page.
For additional mappings that you can set for this task, see tasks > run_job_task in the create job operation’s request payload as defined in POST /api/2.1/jobs/create in the REST API reference, expressed in YAML format.
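A run job task can also pass values to the job that it triggers through the job_parameters setting of the run_job_task mapping. The following sketch assumes that the first job declares a job parameter with the hypothetical name upstream_run_id. Compute settings are omitted:

```yaml
resources:
  jobs:
    my-first-job:
      name: my-first-job
      # Job parameter that the calling job overrides.
      parameters:
        - name: upstream_run_id # hypothetical parameter name
          default: ""
      tasks:
        - task_key: my-first-job-task
          notebook_task:
            notebook_path: ./src/test.py
    my_second_job:
      name: my-second-job
      tasks:
        - task_key: my-second-job-task
          run_job_task:
            job_id: ${resources.jobs.my-first-job.id}
            # Values passed to the triggered job's parameters.
            job_parameters:
              upstream_run_id: "{{job.run_id}}"
```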