DatabricksCluster Class

Defines Databricks cluster information for use in a DatabricksSection.

Inheritance
azureml._base_sdk_common.abstract_run_config_element._AbstractRunConfigElement
DatabricksCluster

Constructor

DatabricksCluster(existing_cluster_id=None, spark_version=None, node_type=None, instance_pool_id=None, num_workers=None, min_workers=None, max_workers=None, spark_env_variables=None, spark_conf=None, init_scripts=None, cluster_log_dbfs_path=None, permit_cluster_restart=None)

Parameters

existing_cluster_id
str
default value: None

A cluster ID of an existing interactive cluster on the Databricks workspace. If this parameter is specified, none of the other parameters should be specified.

spark_version
str
default value: None

The version of Spark for the Databricks run cluster. Example: "10.4.x-scala2.12".

node_type
str
default value: None

The Azure VM node type for the Databricks run cluster. Example: "Standard_D3_v2".

instance_pool_id
str
default value: None

The ID of the instance pool to which the cluster should be attached.

num_workers
int
default value: None

The number of workers for a Databricks run cluster. If this parameter is specified, the min_workers and max_workers parameters should not be specified.

min_workers
int
default value: None

The minimum number of workers for an autoscaled Databricks cluster.

max_workers
int
default value: None

The maximum number of workers for an autoscaled Databricks run cluster.

spark_env_variables
dict({str:str})
default value: None

The Spark environment variables for the Databricks run cluster.

spark_conf
dict({str:str})
default value: None

The Spark configuration for the Databricks run cluster.

init_scripts
list[str]
default value: None

Deprecated. Databricks announced that init scripts stored in DBFS will stop working after Dec 1, 2023. To mitigate the issue: 1) use global init scripts in Databricks, following https://learn.microsoft.com/azure/databricks/init-scripts/global, and 2) comment out the init_scripts line in your AzureML Databricks step.

cluster_log_dbfs_path
str
default value: None

The DBFS path to which cluster logs should be delivered.

permit_cluster_restart
bool
default value: None

If existing_cluster_id is specified, this parameter indicates whether the cluster can be restarted on behalf of the user.
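The parameter descriptions above imply a few combination rules: existing_cluster_id excludes the cluster-creation parameters, and num_workers excludes min_workers and max_workers. The sketch below restates those rules in plain Python. The helper find_conflicts is hypothetical and is not part of the azureml SDK, nor is it the SDK's validation logic. It also assumes that permit_cluster_restart may accompany existing_cluster_id, since its description refers to that case.

```python
# Hypothetical helper, NOT part of the azureml SDK. It only restates the
# parameter rules documented above so the combinations are easy to see.

def find_conflicts(**params):
    """Return a list of conflicts among DatabricksCluster-style arguments."""
    # Only arguments that were actually set (not None) take part in the check.
    given = {name for name, value in params.items() if value is not None}
    conflicts = []

    if "existing_cluster_id" in given:
        # Assumption: permit_cluster_restart is allowed alongside
        # existing_cluster_id, because its description refers to that case.
        extra = sorted(given - {"existing_cluster_id", "permit_cluster_restart"})
        if extra:
            conflicts.append("existing_cluster_id excludes: " + ", ".join(extra))

    if "num_workers" in given and given & {"min_workers", "max_workers"}:
        conflicts.append("num_workers excludes min_workers/max_workers")

    return conflicts


# Three argument sets that follow the documented rules.
existing = dict(existing_cluster_id="<existing-cluster-id>", permit_cluster_restart=True)
fixed_size = dict(spark_version="10.4.x-scala2.12", node_type="Standard_D3_v2", num_workers=2)
autoscaled = dict(spark_version="10.4.x-scala2.12", node_type="Standard_D3_v2",
                  min_workers=1, max_workers=4)

# One argument set that mixes fixed-size and autoscale settings.
mixed = dict(fixed_size, min_workers=1)

for label, kwargs in [("existing", existing), ("fixed_size", fixed_size),
                      ("autoscaled", autoscaled), ("mixed", mixed)]:
    print(label, find_conflicts(**kwargs))
```

Each of the first three dictionaries could be passed as keyword arguments to the constructor; the fourth illustrates a combination the documentation advises against.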

Methods

validate

Validate the specified Databricks cluster details.

This method checks the types of the provided parameters and whether they form a valid combination. For example, you must specify either existing_cluster_id or the remaining cluster parameters. For more information, see the constructor parameter definitions.

validate()

Exceptions

azureml.exceptions.UserErrorException
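As a rough usage sketch, validate can be called before the cluster definition is used, with UserErrorException caught to report a bad parameter combination. The import path azureml.core.databricks is an assumption not stated on this page, and the import is guarded so the snippet still runs where the SDK is not installed. The status variable is illustrative only.

```python
# Keyword arguments for an autoscaled cluster, using the example values
# given in the parameter descriptions.
cluster_kwargs = {
    "spark_version": "10.4.x-scala2.12",
    "node_type": "Standard_D3_v2",
    "min_workers": 1,
    "max_workers": 4,
}

try:
    # Assumed import path; adjust if your SDK version exposes the class elsewhere.
    from azureml.core.databricks import DatabricksCluster
    from azureml.exceptions import UserErrorException
except ImportError:
    DatabricksCluster = None  # SDK not available; the validation step is skipped.

status = "skipped"
if DatabricksCluster is not None:
    cluster = DatabricksCluster(**cluster_kwargs)
    try:
        # Checks parameter types and whether the combination is allowed.
        cluster.validate()
        status = "valid"
    except UserErrorException as err:
        status = "invalid: {}".format(err)

print(status)
```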