Create and run Azure Databricks Jobs
This article details how to create and run Azure Databricks Jobs using the Jobs UI. To learn about using the Databricks CLI to create and run jobs, see Jobs CLI. To learn about using the Jobs API, see the Jobs API.
To learn about configuration options for jobs and how to edit your existing jobs, see Configure settings for Azure Databricks jobs.
To learn how to manage and monitor job runs, see View and manage job runs.
To create your first workflow with an Azure Databricks job, see the quickstart.
- You can create jobs only in a Data Science & Engineering workspace or a Machine Learning workspace.
- A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately.
- The number of jobs a workspace can create in an hour is limited to 10000 (includes “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
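A client that requests runs programmatically may want to retry after a 429 response rather than fail outright. The following is a minimal sketch of exponential backoff, not an official client: `TooManyRequests`, `run_with_backoff`, and the `request_run` callable are hypothetical names standing in for whatever your HTTP layer raises and calls.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def run_with_backoff(request_run, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_run(), retrying with exponential backoff on TooManyRequests."""
    for attempt in range(max_attempts):
        try:
            return request_run()
        except TooManyRequests:
            # Give up once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` argument is injectable so that the retry schedule can be checked without actually waiting.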
Create a job
Do one of the following:
- Click Workflows in the sidebar and click the Create Job button.
- In the sidebar, click New and select Job.
The Tasks tab appears with the create task dialog.
Replace Add a name for your job… with your job name.
Enter a name for the task in the Task name field.
In the Type dropdown menu, select the type of task to run. See Task type options.
Configure the cluster where the task runs. In the Cluster dropdown menu, select either New job cluster or Existing All-Purpose Clusters.
- New Job Cluster: Click Edit in the Cluster dropdown menu and complete the cluster configuration.
- Existing All-Purpose Cluster: Select an existing cluster in the Cluster dropdown menu. To open the cluster in a new page, click the icon to the right of the cluster name and description.
To learn more about selecting and configuring clusters to run tasks, see Use Azure Databricks compute with your jobs.
To add dependent libraries, click + Add next to Dependent libraries. See Configure dependent libraries.
You can pass parameters for your task. Each task type has different requirements for formatting and passing the parameters.
- Notebook: Click Add and specify the key and value of each parameter to pass to the task. You can override or add additional parameters when you manually run a task using the Run a job with different parameters option. Parameters set the value of the notebook widget specified by the key of the parameter. Use task parameter variables to pass a limited set of dynamic values as part of a parameter value.
- JAR: Use a JSON-formatted array of strings to specify parameters. These strings are passed as arguments to the main method of the main class. See Configuring JAR job parameters.
- Spark Submit task: Parameters are specified as a JSON-formatted array of strings. Conforming to the Apache Spark spark-submit convention, parameters after the JAR path are passed to the main method of the main class.
- Python script: Use a JSON-formatted array of strings to specify parameters. These strings are passed as arguments which can be parsed using the argparse module in Python.
- Python Wheel: In the Parameters dropdown menu, select Positional arguments to enter parameters as a JSON-formatted array of strings, or select Keyword arguments > Add to enter the key and value of each parameter. Both positional and keyword arguments are passed to the Python wheel task as command-line arguments.
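As an illustration of the Python script case, the sketch below decodes a JSON-formatted array of strings and parses it with argparse. The parameter names (`--input-path`, `--limit`) and values are hypothetical, and the sketch assumes the strings reach the script as ordinary command-line arguments (that is, in `sys.argv[1:]`).

```python
import argparse
import json

# Hypothetical value as it might be typed into the Parameters field.
parameters_field = '["--input-path", "/tmp/in.csv", "--limit", "10"]'


def parse_args(argv):
    """Parse the command-line arguments passed to the script."""
    parser = argparse.ArgumentParser(description="Example job script")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--limit", type=int, default=100)
    return parser.parse_args(argv)


# The JSON array decodes to a plain list of strings. Inside a real task the
# script would call parse_args(sys.argv[1:]) instead of decoding JSON itself.
argv = json.loads(parameters_field)
args = parse_args(argv)
print(args.input_path, args.limit)
```

Every element of the array must be a string, so numeric values such as the limit are written as "10" and converted by the parser.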
To optionally receive notifications for task start, success, or failure, click + Add next to Emails. Failure notifications are sent on initial task failure and any subsequent retries.
To optionally configure a retry policy for the task, click + Add next to Retries. See Configure a retry policy.
To optionally configure a timeout for the task, click + Add next to Timeout in seconds. See Configure a timeout for a task.
After creating the first task, you can configure job-level settings such as notifications, job triggers, and permissions. See Edit a job.
To add another task, click the add task button in the DAG view. A shared cluster option is provided if you have configured a New Job Cluster for a previous task. You can also configure a cluster for each task when you create or edit a task. To learn more about selecting and configuring clusters to run tasks, see Use Azure Databricks compute with your jobs.
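For readers who later script the same setup, the sketch below assembles a request body of the kind the Jobs API accepts when creating a job, mirroring the UI steps above: a job name, two tasks, a shared job cluster, notebook parameters, a failure notification, retries, and a timeout. The field names reflect the Jobs API 2.1 schema as best understood here and should be checked against the Jobs API reference. The function name, task keys, cluster key, runtime version, and node type are all placeholders.

```python
import json


def build_job_settings(job_name, notebook_path, recipient):
    """Return a dict shaped like a create-job request body (assumed 2.1 schema)."""
    return {
        "name": job_name,
        # One cluster definition that several tasks can share.
        "job_clusters": [
            {
                "job_cluster_key": "shared_cluster",
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder value
                    "node_type_id": "Standard_DS3_v2",  # placeholder value
                    "num_workers": 2,
                },
            }
        ],
        "tasks": [
            {
                "task_key": "ingest",
                "job_cluster_key": "shared_cluster",
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "base_parameters": {"run_date": "2024-01-01"},
                },
                "email_notifications": {"on_failure": [recipient]},
                "max_retries": 2,
                "timeout_seconds": 3600,
            },
            {
                # The second task reuses the shared cluster and runs after "ingest".
                "task_key": "report",
                "depends_on": [{"task_key": "ingest"}],
                "job_cluster_key": "shared_cluster",
                "notebook_task": {"notebook_path": notebook_path},
            },
        ],
    }


settings = build_job_settings("nightly-demo", "/Workspace/demo/notebook", "owner@example.com")
print(json.dumps(settings, indent=2))
```

The sketch only builds and prints the JSON document; sending it to a workspace requires an authenticated request, which is out of scope here.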
Task type options
The following are the task types you can add to your Azure Databricks job and available options for the different task types:
Notebook: In the Source dropdown menu, select a location for the notebook: either Workspace for a notebook located in an Azure Databricks workspace folder, or Git provider for a notebook located in a remote Git repository.
Workspace: Use the file browser to find the notebook, click the notebook name, and click Confirm.
Git provider: Click Edit and enter the Git repository information. See Use a notebook from a remote Git repository.
Total notebook cell output (the combined output of all notebook cells) is subject to a 20MB size limit. Additionally, individual cell output is subject to an 8MB size limit. If total cell output exceeds 20MB in size, or if the output of an individual cell is larger than 8MB, the run is canceled and marked as failed.
If you need help finding cells near or beyond the limit, run the notebook against an all-purpose cluster and use this notebook autosave technique.
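The two limits combine as follows: a run fails if any single cell exceeds 8MB or if all cells together exceed 20MB. The sketch below expresses that rule for a list of per-cell output sizes. It is illustrative only: the function name is hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption, since the exact unit is not stated above.

```python
MB = 1024 * 1024  # assumption: limits are counted in units of 1024 * 1024 bytes
TOTAL_LIMIT = 20 * MB
CELL_LIMIT = 8 * MB


def find_limit_violations(cell_output_sizes):
    """Return descriptions of every limit that the given sizes (in bytes) exceed."""
    problems = []
    # Check each cell against the per-cell limit.
    for index, size in enumerate(cell_output_sizes):
        if size > CELL_LIMIT:
            problems.append(f"cell {index} output is {size} bytes (limit {CELL_LIMIT})")
    # Check the combined output against the total limit.
    total = sum(cell_output_sizes)
    if total > TOTAL_LIMIT:
        problems.append(f"total output is {total} bytes (limit {TOTAL_LIMIT})")
    return problems
```

Note that three cells of 7MB each pass the per-cell check but still exceed the 20MB total.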
JAR: Specify the Main class. Use the fully qualified name of the class containing the main method, for example, org.apache.spark.examples.SparkPi. Then click Add under Dependent Libraries to add libraries required to run the task. One of these libraries must contain the main class.
To learn more about JAR tasks, see Use a JAR in an Azure Databricks job.
Spark Submit: In the Parameters text box, specify the main class, the path to the library JAR, and all arguments, formatted as a JSON array of strings. The following example configures a spark-submit task to run the DFSReadWriteTest from the Apache Spark examples:
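A parameters array of that shape can be generated as in the sketch below. The JAR path, input file, and output directory are hypothetical placeholders; only the ordering (main class, then JAR path, then program arguments) follows the description above.

```python
import json

# Hypothetical locations; substitute your own JAR and data paths.
jar_path = "dbfs:/path/to/spark-examples.jar"
input_file = "/dbfs/path/to/input.txt"
output_dir = "/path/to/output/"

spark_submit_parameters = [
    "--class", "org.apache.spark.examples.DFSReadWriteTest",
    jar_path,
    # Everything after the JAR path is passed to the main method of the main class.
    input_file,
    output_dir,
]

# Print the JSON array of strings to paste into the Parameters text box.
print(json.dumps(spark_submit_parameters))
```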
There are several limitations for spark-submit tasks:
- You can run spark-submit tasks only on new clusters.
- Spark-submit does not support cluster autoscaling. To learn more about autoscaling, see Cluster autoscaling.
- Spark-submit does not support Databricks Utilities. To use Databricks Utilities, use JAR tasks instead.
- If you are using a Unity Catalog-enabled cluster, spark-submit is supported only if the cluster uses Single User access mode. Shared access mode is not supported.
- Spark Streaming jobs should never have maximum concurrent runs set to greater than 1. Streaming jobs should be set to run using the cron expression "* * * * * ?" (every minute). Since a streaming task runs continuously, it should always be the final task in a job.
Python script: In the Source dropdown menu, select a location for the Python script: either Workspace for a script in the local workspace, or DBFS / S3 for a script located on DBFS or cloud storage. In the Path text box, enter the path to the Python script:
Workspace: In the Select Python File dialog, browse to the Python script and click Confirm. Your script must be in a Databricks repo.
DBFS: Enter the URI of a Python script on DBFS or cloud storage; for example,
Git provider: Click Edit and enter the Git repository information. See Use Python code from a remote Git repository.
Delta Live Tables Pipeline: In the Pipeline dropdown menu, select an existing Delta Live Tables pipeline.
You can use only triggered pipelines with the Pipeline task. Continuous pipelines are not supported as a job task. To learn more about triggered and continuous pipelines, see Continuous vs. triggered pipeline execution.
Python Wheel: In the Package name text box, enter the package to import, for example, myWheel-1.0-py2.py3-none-any.whl. In the Entry Point text box, enter the function to call when starting the wheel. Click Add under Dependent Libraries to add libraries required to run the task.
SQL: In the SQL task dropdown menu, select Query, Dashboard, or Alert.
- The SQL task requires Databricks SQL and a serverless or pro SQL warehouse.
Query: In the SQL query dropdown menu, select the query to execute when the task runs. In the SQL warehouse dropdown menu, select a serverless or pro SQL warehouse to run the task.
Dashboard: In the SQL dashboard dropdown menu, select a dashboard to be updated when the task runs. In the SQL warehouse dropdown menu, select a serverless or pro SQL warehouse to run the task.
Alert: In the SQL alert dropdown menu, select an alert to trigger for evaluation. In the SQL warehouse dropdown menu, select a serverless or pro SQL warehouse to run the task.
dbt: See Use dbt transformations in an Azure Databricks job for a detailed example of how to configure a dbt task.
Copy a task path
Certain task types, for example, notebook tasks, allow you to copy the path to the task source code:
- Click the Tasks tab.
- Select the task containing the path to copy.
- Click the copy icon next to the task path to copy the path to the clipboard.
Create a job from an existing job
You can quickly create a new job by cloning an existing job. Cloning a job creates an identical copy of the job, except for the job ID. On the job’s page, click More … next to the job’s name and select Clone from the dropdown menu.
Create a task from an existing task
You can quickly create a new task by cloning an existing task:
- On the job’s page, click the Tasks tab.
- Select the task to clone.
- Click the vertical ellipsis menu and select Clone task.
Delete a job
To delete a job, on the job’s page, click More … next to the job’s name and select Delete from the dropdown menu.
Delete a task
To delete a task:
- Click the Tasks tab.
- Select the task to be deleted.
- Click the vertical ellipsis menu and select Remove task.
Run a job
- Click Workflows in the sidebar.
- Select a job and click the Runs tab. You can run a job immediately or schedule the job to run later.
If one or more tasks in a job with multiple tasks are not successful, you can re-run the subset of unsuccessful tasks. See Re-run failed and skipped tasks.
Run a job immediately
To run the job immediately, click Run Now.
You can perform a test run of a job with a notebook task by clicking Run Now. If you need to make changes to the notebook, clicking Run Now again after editing the notebook will automatically run the new version of the notebook.
Run a job with different parameters
You can use Run Now with Different Parameters to re-run a job with different parameters or different values for existing parameters.
- Click the down caret next to Run Now and select Run Now with Different Parameters or, in the Active Runs table, click Run Now with Different Parameters. Enter the new parameters depending on the type of task.
- Notebook: You can enter parameters as key-value pairs or a JSON object. The provided parameters are merged with the default parameters for the triggered run. You can use this dialog to set the values of widgets.
- JAR and spark-submit: You can enter a list of parameters or a JSON document. If you delete keys, the default parameters are used. You can also add task parameter variables for the run.
- Click Run.
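The merge behavior for notebook parameters can be pictured as a dictionary update. The sketch below is an illustration, not the service's implementation: the function name, keys, and values are hypothetical, and it assumes that a provided value takes precedence over a default with the same key.

```python
def merge_notebook_parameters(defaults, overrides):
    """Return the parameters for a run: the defaults, updated with the overrides."""
    merged = dict(defaults)  # copy so the job's defaults are left untouched
    merged.update(overrides)  # provided values replace or extend the defaults
    return merged


defaults = {"environment": "dev", "run_date": "2024-01-01"}
overrides = {"run_date": "2024-02-15", "dry_run": "true"}
run_parameters = merge_notebook_parameters(defaults, overrides)
print(run_parameters)
```

Here `environment` keeps its default, `run_date` takes the provided value, and `dry_run` is added for this run only.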
Run a job on a schedule
You can use a schedule to automatically run your Azure Databricks job at specified times and periods. See Add a job schedule.
Run a continuous job
You can ensure there’s always an active run of your job. See Run a continuous job.
Run a job when new files arrive
To trigger a job run when new files arrive in an external location, use a file arrival trigger.