
Azure Databricks (Preview)

Azure Databricks offers a unified platform for scalable data management, governance, and analytics, combining streamlined workflows with the ability to handle diverse data types efficiently.

This connector is available in the following products and regions:

Service          Class     Regions
Copilot Studio   Premium   All Power Automate regions except the following:
                           - US Government (GCC)
                           - US Government (GCC High)
                           - China Cloud operated by 21Vianet
                           - US Department of Defense (DoD)
Power Apps       Premium   All Power Apps regions except the following:
                           - US Government (GCC)
                           - US Government (GCC High)
                           - China Cloud operated by 21Vianet
                           - US Department of Defense (DoD)
Power Automate   Premium   All Power Automate regions except the following:
                           - US Government (GCC)
                           - US Government (GCC High)
                           - China Cloud operated by 21Vianet
                           - US Department of Defense (DoD)
Contact
Name: Databricks Support
URL: https://help.databricks.com
Email: eng-partner-eco-help@databricks.com

Connector Metadata
Publisher: Databricks Inc.
Website: https://www.databricks.com/
Privacy policy: https://www.databricks.com/legal/privacynotice
Categories: Data

Before you begin

Before you connect to Azure Databricks from Power Platform, you must meet the following requirements:

  • You need a Microsoft Entra ID (formerly Azure Active Directory) account to sign in.
  • You need a premium Power Apps license.
  • You need an Azure Databricks account.
  • You need access to a SQL warehouse in Azure Databricks.

Optional: Virtual Network support

If your Azure Databricks workspace uses a virtual network, there are two ways to connect. If this does not apply to your environment, skip this section.

  1. Integrate Power Platform with resources inside your virtual network without exposing them over the public internet. To connect to the private endpoint of your Azure Databricks workspace, do the following after you configure private connectivity to Azure Databricks:

    1. Set up Virtual Network support for Power Platform.

      Note: If your Power Platform virtual network (whether primary or secondary) is different from your Azure Databricks virtual network, you need to use virtual network (VNet) peering to connect it to the Azure Databricks virtual network.

    For more information about virtual networks, see Virtual Network support overview.

  2. Enable access with hybrid deployment, where a front-end private link with a public endpoint is protected by a Workspace IP Access List. To enable access, do the following:

    1. Enable public access at the workspace level. For more details, see Configure IP access lists for workspaces.
    2. Add the AzureConnectors IP range, or the specific Power Platform IP range for your environment’s region, to your workspace IP access list.

Optional: Create a Microsoft Entra Service Principal

Before connecting with a service principal, create, set up, and assign a Microsoft Entra service principal to your Azure Databricks account or workspace.

Step 1: Create a connection to Azure Databricks

Note: If you're only using Copilot Studio, the Azure Databricks connection must be created in Power Apps or Power Automate. Then it can be used in Copilot Studio.

To add an Azure Databricks connection, do the following:

  1. In Power Apps, from the sidebar, click Connections.
  2. Click + New connection in the upper-right corner.
  3. Search for "Azure Databricks" and select the Azure Databricks tile.
  4. Select your Authentication type from the drop-down menu.
  5. Select your authentication method and enter your authentication information.
    • If your Power Platform deployment and Azure Databricks account are in the same Microsoft Entra tenant, you can use an OAuth connection. Enter the following information:
      • For Server Hostname, enter the Azure Databricks SQL warehouse hostname. Example: adb-3980263885549757139.2.azuredatabricks.net
      • For HTTP Path, enter the SQL warehouse HTTP path. Example: /sql/1.0/warehouses/a9c4e781bd29f315
      • Click Create.
      • Sign in with your Microsoft Entra ID.
    • A service principal connection can be used in any scenario. Before connecting, create a Microsoft Entra service principal. Enter the following information:
      • For Client ID, enter the service principal ID.
      • For Client Secret, enter the service principal secret.
      • For Tenant, enter the service principal tenant.
      • For Server Hostname, enter the Azure Databricks SQL warehouse hostname.
      • For HTTP Path, enter the SQL warehouse HTTP path.
      • Optional: After you create the connection, you can rename it or share it with your team members.
    • To find your Azure Databricks SQL warehouse connection details, see Get connection details for an Azure Databricks compute resource.

Step 2: Use the Azure Databricks connection

Use Azure Databricks to build Power Apps

To add data in Azure Databricks to your canvas app, do the following:

  1. From the leftmost navigation bar, click Create.
  2. Click Start with a blank canvas and select your desired canvas size to create a new canvas app.
  3. From your application, click Add data > Connectors > Azure Databricks. Select the Azure Databricks connection you created.
  4. Select a catalog from the Choose a dataset sidebar.
  5. From the Choose a dataset sidebar, select all the tables you want to connect your app to.
  6. Click Connect.

Use Azure Databricks to build Power Automate flows

The Azure Databricks Statement Execution API is exposed within Power Automate, allowing you to write and run SQL statements from your flows. To create a Power Automate flow using Azure Databricks as an action, do the following:

  1. From the leftmost navigation bar, click Create.
  2. Create a flow and add any trigger type.
  3. From your new flow, click + and search for Azure Databricks to see the available actions.
  4. Select your desired action from the four available options:
    • Execute a SQL Statement: Write and run a SQL statement (see the example after this list). For more information, see here.

      Note: The warehouse_id is the segment that appears after the final slash in your warehouse's HTTP path. For example, in the path /sql/1.0/warehouses/[warehouse_id], the value [warehouse_id] represents your warehouse ID.

    • Check status and get results: Check the status of a SQL statement and gather results. For more information, see here.

    • Cancel the execution of a statement: Terminate execution of a SQL statement. For more information, see here.

    • Get result by chunk index: Get results by chunk index, which is suitable for large result sets. For more information, see here.
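
For example, the statement text passed to the Execute a SQL Statement action can use named parameter markers that map to the action's name/value parameter fields. A minimal sketch, where the catalog, schema, table, column, and parameter names are all hypothetical:

-- Illustrative parameterized statement for the Execute a SQL Statement action.
-- :order_date and :region map to the action's parameter name/value pairs;
-- all names here are hypothetical.
SELECT order_id, customer_name, total_amount
FROM main.sales.orders
WHERE order_date >= :order_date
  AND region = :region
LIMIT 100;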

Use Azure Databricks as a knowledge source in Copilot Studio

To add data in Azure Databricks as a knowledge source to a Copilot Studio agent, do the following:

  1. From the sidebar, click Agent.
  2. Select an existing agent or create a new agent by clicking + New agent.
    1. When creating a new agent, you can optionally describe the agent by inputting a message. Click Create.
    2. Or, click Skip to manually specify the agent’s information.
  3. In the Knowledge tab, click + Knowledge.
  4. Click Advanced.
  5. Select Azure Databricks as the knowledge source.
  6. Enter the name of the catalog that contains your data.
  7. Click Connect.
  8. Select the tables you want your agent to use as a knowledge source and click Add.

Create/Update/Delete in Power Apps

The connector supports create, update, and delete operations, but only for tables that have a primary key defined. When performing create operations, you must always specify the primary key.

Note: Azure Databricks also supports generated identity columns. If an identity column is used, the primary key value is generated automatically on the server, and you cannot specify a value for it when creating a new record.
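
Both patterns can be sketched in SQL. A minimal sketch, assuming hypothetical catalog, schema, table, and column names:

-- Explicit primary key: Power Apps create operations must supply customer_id.
-- All names are hypothetical.
CREATE TABLE main.sales.customers (
  customer_id BIGINT NOT NULL,
  customer_name STRING,
  CONSTRAINT customers_pk PRIMARY KEY (customer_id)
);

-- Generated identity column: the server assigns event_id automatically,
-- so create operations must not supply a value for it.
CREATE TABLE main.sales.events (
  event_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  event_name STRING,
  CONSTRAINT events_pk PRIMARY KEY (event_id)
);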

Conduct batch updates

Azure Databricks does not support conducting concurrent updates on the same row.

When you need to perform bulk create, update, or delete operations in response to Power App inputs, the best practice is to implement a Power Automate flow. To accomplish this, do the following:

  1. Create your canvas Power App using your Azure Databricks connection.

  2. Create a Power Automate flow using the Azure Databricks connection and use Power Apps as the trigger.

    • In the Power Automate trigger, add the input fields that you want to pass from Power Apps to Power Automate.
  3. Create a collection object within Power Apps to collect all of your changes.

  4. Add the Power Automate flow to your Power App.

  5. Call the Power Automate flow from your Power App and iterate over the collection using a ForAll command.

    // Run the flow once per record in the collection, passing that record's fields as the flow's input fields.
    ForAll(collectionName, FlowName.Run(ThisRecord.Field1, ThisRecord.Field2, ThisRecord.Field3))
    

Concurrent write conflicts

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving conflicts that occur when concurrent writes update or delete different rows in the same data file.

Row-level concurrency is generally available on Databricks Runtime 14.2 and above. Row-level concurrency is supported by default for the following conditions:

  • Tables with deletion vectors enabled and without partitioning.
  • Tables with liquid clustering, unless deletion vectors have been disabled.

To enable deletion vectors, execute the SQL below:

-- Enable a feature using a table property and update the table protocol.
ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
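
Liquid clustering is the other qualifying path. A minimal sketch of creating such a table (the table and column names are hypothetical):

-- Liquid clustering at creation time; row-level concurrency applies as long
-- as deletion vectors stay enabled. All names are hypothetical.
CREATE TABLE main.sales.orders_clustered (
  order_id BIGINT,
  region STRING
) CLUSTER BY (region);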

For more information about concurrent write conflicts in Azure Databricks, see here.

Limitations

  • The Power Platform connector does not support government clouds.

Power App Limitations

The following Power Fx formulas only perform calculations on the local result set:

Category        Formulas
Table function  GroupBy, Distinct
Aggregation     CountRows, StdevP, StdevS
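
If an aggregation must reflect the full dataset rather than the local result set, one workaround is to pre-aggregate in Azure Databricks and connect the app to the result, for example through a view. A sketch with hypothetical names:

-- Pre-aggregate on the server so the app does not depend on
-- local-only GroupBy/CountRows. All names are hypothetical.
CREATE OR REPLACE VIEW main.sales.orders_by_region AS
SELECT region, COUNT(*) AS order_count
FROM main.sales.orders
GROUP BY region;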

Power Automate Limitations

  • The data returned from a Power Automate query must not be null; if any value is null, Power Automate raises an error.
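
One way to guard against this is to replace potential nulls in the SQL statement itself. A sketch with hypothetical table and column names:

-- Substitute defaults for NULLs before results reach Power Automate.
-- All names are hypothetical.
SELECT
  order_id,
  COALESCE(customer_name, 'unknown') AS customer_name,
  COALESCE(total_amount, 0) AS total_amount
FROM main.sales.orders;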

Creating a connection

The connector supports the following authentication types:

Name                          Regions      Shareable
OAuth Connection              All regions  Not shareable
Service Principal Connection  All regions  Shareable
Default [DEPRECATED]          All regions  Not shareable

Note: The Default [DEPRECATED] option is only for older connections without an explicit authentication type, and is provided only for backward compatibility.

OAuth Connection

Auth ID: oauth2-auth

Applicable: All regions

OAuth Connection

This connection is not shareable. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Name             Type    Required  Description
Server Hostname  string  True      Server name of the Databricks workspace. Example: adb-3980263885549757139.2.azuredatabricks.net
HTTP Path        string  True      HTTP path of the Databricks SQL warehouse. Example: /sql/1.0/warehouses/a9c4e781bd29f315

Service Principal Connection

Auth ID: oAuthClientCredentials

Applicable: All regions

Service Principal Connection

This connection is shareable. If the Power App is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.

Name             Type          Required  Description
Client ID        string        True
Client Secret    securestring  True
Tenant           string        True
Server Hostname  string        True      Server name of the Databricks workspace. Example: adb-3980263885549757139.2.azuredatabricks.net
HTTP Path        string        True      HTTP path of the Databricks SQL warehouse. Example: /sql/1.0/warehouses/a9c4e781bd29f315

Default [DEPRECATED]

Applicable: All regions

This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.

This connection is not shareable. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Throttling Limits

Name                      Calls  Renewal period
API calls per connection  100    60 seconds

Actions

Cancel statement execution

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.

Check status and get results

Get the status, manifest and results of the statement

Execute a SQL statement

Execute a SQL statement and optionally await its results for a specified time.

Get result by chunk index

After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.

Cancel statement execution

Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.

Parameters

Name          Key           Required  Type    Description
Statement ID  statement_id  True      string  Statement ID

Check status and get results

Get the status, manifest and results of the statement

Parameters

Name          Key           Required  Type    Description
Statement ID  statement_id  True      string  Statement ID

Returns

Statement execution response

Execute a SQL statement

Execute a SQL statement and optionally await its results for a specified time.

Parameters

Name             Key              Required  Type     Description
warehouse_id     warehouse_id     True      string   Target warehouse ID
statement        statement        True      string   The SQL statement to execute; can optionally be parameterized (see parameters)
name             name             True      string   Parameter marker name
type             type                       string   Parameter data type
value            value                      string   Parameter value
catalog          catalog                    string   Default catalog for execution
schema           schema                     string   Default schema for execution
disposition      disposition                string   Result fetching mode
format           format                     string   Result set format
on_wait_timeout  on_wait_timeout            string   Action on timeout
wait_timeout     wait_timeout               string   Result wait timeout
byte_limit       byte_limit                 integer  Result byte limit
row_limit        row_limit                  integer  Result row limit

Returns

Statement execution response

Get result by chunk index

After the statement execution has SUCCEEDED, this request can be used to fetch any chunk by index.

Parameters

Name          Key           Required  Type    Description
Statement ID  statement_id  True      string  Statement ID
Chunk index   chunk_index   True      string  Chunk index

Returns

Definitions

SqlBaseChunkInfo

Metadata for a result set chunk

Name         Path         Type     Description
byte_count   byte_count   integer  Number of bytes in the result chunk
chunk_index  chunk_index  integer  Position in the sequence of result set chunks
row_count    row_count    integer  Number of rows in the result chunk
row_offset   row_offset   integer  Starting row offset in the result set

SqlColumnInfo

Name                Path                Type                   Description
name                name                string                 Column name
position            position            integer                Column position (0-based)
type_interval_type  type_interval_type  string                 Interval type format
type_name           type_name           SqlColumnInfoTypeName  The name of the base data type; doesn't include details for complex types such as STRUCT, MAP, or ARRAY
type_precision      type_precision      integer                Number of digits for DECIMAL type
type_scale          type_scale          integer                Number of decimal places for DECIMAL type
type_text           type_text           string                 Full SQL type specification

SqlColumnInfoTypeName

The name of the base data type. This doesn't include details for complex types such as STRUCT, MAP or ARRAY.

SqlStatementResponse

Statement execution response

Name          Path          Type                Description
manifest      manifest      SqlResultManifest   Result set schema and metadata
result        result        SqlResultData
statement_id  statement_id  string              Statement ID
status        status        SqlStatementStatus  Statement execution status

SqlResultManifest

Result set schema and metadata

Name               Path               Type                       Description
chunks             chunks             array of SqlBaseChunkInfo  Result chunk metadata
format             format             string
schema             schema             SqlResultSchema            Result set column definitions
total_byte_count   total_byte_count   integer                    Total bytes in result set
total_chunk_count  total_chunk_count  integer                    Total number of chunks
total_row_count    total_row_count    integer                    Total number of rows
truncated          truncated          boolean                    Result truncation status

SqlStatementStatus

Statement execution status

Name   Path   Type               Description
error  error  SqlServiceError
state  state  SqlStatementState  Statement execution state

SqlStatementState

Statement execution state

SqlServiceError

Name        Path        Type    Description
error_code  error_code  string
message     message     string  Error message

SqlResultSchema

Result set column definitions

Name          Path          Type                    Description
column_count  column_count  integer
columns       columns       array of SqlColumnInfo

SqlResultData

Name                      Path                      Type                      Description
byte_count                byte_count                integer                   Bytes in result chunk
chunk_index               chunk_index               integer                   Chunk position
data_array                data_array                SqlJsonArray              Array of arrays with string values
external_links            external_links            array of SqlExternalLink
next_chunk_index          next_chunk_index          integer                   Next chunk index
next_chunk_internal_link  next_chunk_internal_link  string                    Next chunk link
row_count                 row_count                 integer                   Rows in chunk
row_offset                row_offset                integer                   Starting row offset

SqlJsonArray

Array of arrays with string values

Name   Path  Type      Description
Items        array of

SqlExternalLink

Name                      Path                      Type       Description
byte_count                byte_count                integer    Bytes in chunk
chunk_index               chunk_index               integer    Chunk position
expiration                expiration                date-time  Link expiration time
external_link             external_link             string
http_headers              http_headers              object     Required HTTP headers
next_chunk_index          next_chunk_index          integer    Next chunk index
next_chunk_internal_link  next_chunk_internal_link  string     Next chunk link
row_count                 row_count                 integer    Rows in chunk
row_offset                row_offset                integer    Starting row offset