Configure and edit Databricks Jobs

This article focuses on instructions for creating, configuring, and editing jobs using the Workflows workspace UI. Azure Databricks has other entry points and tools for configuration, including the Databricks CLI, the Jobs REST API, and Databricks Asset Bundles.

Tip

To view a job as YAML, click the kebab menu to the left of Run now for the job and then click Switch to code version (YAML).

Create a new job

This section describes the minimum configuration needed to create a new job to schedule a notebook task with the workspace UI.

Jobs contain one or more tasks. You create a new job by configuring the first task for that job.

Note

Each task type has dynamic configuration options in the workspace UI. See Configure and edit Databricks tasks.

  1. Click Workflows in the sidebar and click Create Job.
  2. Enter a Task name.
  3. Select a notebook for the Path field.
  4. Click Create task.

If your workspace is not enabled for serverless compute for jobs, you must select a Compute option. Databricks recommends always using jobs compute when configuring tasks.

A new job appears in the workspace jobs list with the default name New Job <date> <time>.
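
If you prefer to create the same job programmatically, the following is a minimal sketch using the Databricks Python SDK (databricks-sdk). The job name and notebook path are placeholders, and omitting a compute specification assumes the workspace is enabled for serverless compute for jobs:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    # Reads the workspace URL and token from the environment or ~/.databrickscfg.
    w = WorkspaceClient()

    created = w.jobs.create(
        name="My notebook job",  # placeholder name
        tasks=[
            jobs.Task(
                task_key="run_notebook",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/Users/someone@example.com/my-notebook"
                ),
            )
        ],
    )
    print(created.job_id)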

Select a job to edit in the workspace

To edit an existing job with the workspace UI, do the following:

  1. Click Workflows in the sidebar.
  2. In the Name column, click the job name.

Use the jobs UI to do the following:

  • Edit job settings
  • Rename, clone, or delete a job
  • Add new tasks to an existing job
  • Edit task settings

Note

You can also view the JSON definitions for use with REST API get, create, and reset endpoints.
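
For example, a sketch of fetching a job's JSON definition with the get endpoint; the host, token, and job ID are placeholders:

    import requests

    HOST = "https://<your-workspace-url>"  # placeholder
    TOKEN = "<personal-access-token>"      # placeholder

    # The Jobs API 2.1 get endpoint returns the job's JSON definition,
    # which you can reuse with the create and reset endpoints.
    resp = requests.get(
        f"{HOST}/api/2.1/jobs/get",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"job_id": 123},  # placeholder job ID
    )
    print(resp.json()["settings"])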

Edit job settings

The side panel contains the Job details. You can change the job trigger, compute configuration, notifications, and the maximum number of concurrent runs, configure duration thresholds, and add or change tags. If job access control is enabled, you can also edit job permissions.
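
The same settings can be changed programmatically. A sketch using the Python SDK's update call, which applies only the fields you set (unlike reset, which replaces the whole job definition); the job ID and value are placeholders:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    # Patch a single setting; all other job settings are left untouched.
    w.jobs.update(
        job_id=123,  # placeholder
        new_settings=jobs.JobSettings(max_concurrent_runs=2),
    )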

Add parameters for all job tasks

Parameters configured at the job level are passed to the job’s tasks that accept key-value parameters, including Python wheel files configured to accept keyword arguments. See Parameterize jobs.
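
A sketch of defining job-level parameters with the Python SDK; the job ID, parameter name, and default value are placeholders:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    w.jobs.update(
        job_id=123,  # placeholder
        new_settings=jobs.JobSettings(
            # Job-level parameters are passed to every task that accepts
            # key-value parameters.
            parameters=[jobs.JobParameterDefinition(name="env", default="dev")]
        ),
    )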

Add tags to a job

To add labels or key-value attributes to your job, you can add tags when you edit the job. You can use tags to filter jobs in the Jobs list. For example, you can use a department tag to filter all jobs that belong to a specific department.

Note

Job tags are not designed to store sensitive information such as personally identifiable information or passwords. Databricks recommends using tags for non-sensitive values only.

Tags also propagate to job clusters created when a job is run, allowing you to use tags with your existing cluster monitoring.

Click + Tag in the Job details side panel to add or edit tags. You can add the tag as a label or key-value pair. To add a label, enter the label in the Key field and leave the Value field empty.
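
In the Jobs API, tags are a map of string keys to string values, and a label is a key with an empty value. A sketch with the Python SDK, using placeholder tags:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    w.jobs.update(
        job_id=123,  # placeholder
        new_settings=jobs.JobSettings(
            tags={
                "department": "finance",  # key-value pair
                "nightly": "",            # label: key with an empty value
            }
        ),
    )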

Add a budget policy to a job

Important

This feature is in Public Preview.

If your workspace uses budget policies to attribute serverless usage, you can select your job's budget policy using the Budget policy setting in the Job details side panel. See Attribute serverless usage with budget policies.
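
If you configure jobs programmatically, recent Jobs API versions accept a budget policy ID at creation time. A sketch assuming your databricks-sdk version exposes the budget_policy_id field; all values are placeholders:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    created = w.jobs.create(
        name="Serverless job",  # placeholder
        budget_policy_id="<budget-policy-id>",  # assumption: field available in your SDK version
        tasks=[
            jobs.Task(
                task_key="run_notebook",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/path/to/notebook"  # placeholder
                ),
            )
        ],
    )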

Rename, clone, or delete a job

To rename a job, go to the jobs UI and click the job name.

You can quickly create a new job by cloning an existing job. Cloning a job creates an identical copy of the job except for the job ID. To clone a job, do the following:

  1. Go to the jobs UI for the job.
  2. Click the kebab menu next to the Run now button.
  3. Select Clone job from the drop-down menu.
  4. Enter a name for the cloned job.
  5. Click Clone.
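
The same clone can be sketched against the REST API by fetching the source job's settings and re-creating them under a new name (the new job gets its own job ID); the host, token, and job ID are placeholders:

    import requests

    HOST = "https://<your-workspace-url>"  # placeholder
    TOKEN = "<personal-access-token>"      # placeholder
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Fetch the source job's settings.
    src = requests.get(
        f"{HOST}/api/2.1/jobs/get", headers=headers, params={"job_id": 123}
    ).json()

    # Re-create the settings under a new name.
    settings = src["settings"]
    settings["name"] = "Clone of " + settings["name"]
    resp = requests.post(f"{HOST}/api/2.1/jobs/create", headers=headers, json=settings)
    print(resp.json()["job_id"])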

Delete a job

To delete a job, go to the job page, click the kebab menu next to the job name, and select Delete job from the drop-down menu.
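
Programmatically, deleting a job is a single call; the job ID is a placeholder:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    w.jobs.delete(job_id=123)  # placeholder job ID; deletion is permanent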

Use Git with jobs

If your job contains any tasks that support using a remote Git provider, the jobs UI contains a Git field and the option to add or edit Git settings.

You can configure the following task types to use a remote Git repository:

  • Notebooks
  • Python scripts
  • SQL files
  • dbt

All tasks in a job must reference the same commit in the remote repository. You must specify only one of the following for a job that uses a remote repository:

  • branch: The name of the branch, for example, main.
  • tag: The tag’s name, for example, release-1.0.0.
  • commit: The hash of a specific commit, for example, e0056d01.

When a job run begins, Databricks takes a snapshot commit of the remote repository to ensure that the entire job runs against the same version of code.
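
A sketch of a Git-backed job definition using the Python SDK; the repository URL, branch, and notebook path are placeholders, and the enum names assume the current databricks-sdk naming:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    created = w.jobs.create(
        name="Job from Git",  # placeholder
        git_source=jobs.GitSource(
            git_url="https://github.com/my-org/my-repo",  # placeholder
            git_provider=jobs.GitProvider.GIT_HUB,
            git_branch="main",  # specify exactly one of git_branch, git_tag, or git_commit
        ),
        tasks=[
            jobs.Task(
                task_key="run_notebook",
                notebook_task=jobs.NotebookTask(
                    notebook_path="notebooks/etl",  # path relative to the repository root
                    source=jobs.Source.GIT,
                ),
            )
        ],
    )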

When you view the run history of a task that runs code stored in a remote Git repository, the Task run details panel includes Git details, including the commit SHA associated with the run. See View task run history.

Note

Tasks configured to use a remote Git repository cannot write to workspace files. They must write temporary data to ephemeral driver storage and persistent data to a volume or table.

Databricks recommends creating jobs referencing workspace paths in Git folders only for quick iteration and testing during development. Databricks recommends reconfiguring jobs to reference a remote Git repository as you move into staging and production. Learn more about version-controlled source code in a Databricks job.

Configure a Git provider

The jobs UI has a dialog to configure a remote Git repository. This dialog is accessible from the Job details panel under the Git heading or in any task configured to use a Git provider.

The options displayed to access the dialog vary based on the task type and whether a Git reference has already been configured for the job. Buttons that launch the dialog include Add Git settings, Edit, and Add a git reference.

In the Git Information dialog (labeled Git if accessed from the Job details panel), enter the following details:

  • Enter the Git repository URL.
  • Select your Git provider from the dropdown list.
  • In the Git reference field, enter the identifier for a branch, tag, or commit that corresponds to the version of the source code you want to run.
  • Select branch, tag, or commit from the dropdown.

Note

The dialog might prompt you with the following: Git credentials for this account are missing. Add credentials. You must configure Git credentials before using a remote Git repository as a reference. See Set up Databricks Git folders (Repos).

Configure an expected completion time or a timeout for a job

You can configure optional duration thresholds for a job, including an expected and maximum completion time. To configure duration thresholds, click Set duration thresholds under Duration thresholds in the Job details panel.

Enter a duration in the Warning field to configure the job’s expected completion time. If the job exceeds this threshold, an event is triggered. You can use this event to notify when a job is running slowly. See Configure notifications for slow running or late jobs.

To configure a maximum completion time for a job, enter the maximum duration in the Timeout field. If the job does not complete in this time, Azure Databricks sets its status to “Timed Out”.
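
Both thresholds also appear in the job definition: the warning maps to a health rule on run duration, and the timeout maps to timeout_seconds. A sketch with the Python SDK, assuming the current databricks-sdk names for health rules and using placeholder values:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    w.jobs.update(
        job_id=123,  # placeholder
        new_settings=jobs.JobSettings(
            timeout_seconds=3600,  # maximum completion time: run is stopped after 1 hour
            health=jobs.JobsHealthRules(
                rules=[
                    jobs.JobsHealthRule(
                        metric=jobs.JobsHealthMetric.RUN_DURATION_SECONDS,
                        op=jobs.JobsHealthOperator.GREATER_THAN,
                        value=600,  # expected completion time: warn after 10 minutes
                    )
                ]
            ),
        ),
    )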

You can optionally specify duration thresholds for tasks. See Configure an expected completion time or a timeout for a task.