Spark Class

Base class for the Spark node, used for Spark component version consumption.

You should not instantiate this class directly. Instead, create it from the builder function spark.
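
For example, a minimal sketch of building a Spark node with the spark builder. The ./src folder, main.py script, datastore path, and resource values below are placeholders, not recommendations:

from azure.ai.ml import spark, Input

# Minimal sketch: build a Spark node via the spark() builder.
# All names and paths below are illustrative.
spark_node = spark(
    code="./src",                # folder containing the entry script (placeholder)
    entry={"file": "main.py"},   # file entry point (placeholder)
    driver_cores=1,
    driver_memory="2g",
    executor_cores=2,
    executor_memory="2g",
    executor_instances=2,
    inputs={
        "input_data": Input(
            type="uri_file",
            path="azureml://datastores/workspaceblobstore/paths/data/input.csv",
            mode="direct",
        )
    },
    args="--input_data ${{inputs.input_data}}",
    resources={"instance_type": "standard_e4s_v3", "runtime_version": "3.4"},
)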

Inheritance
azure.ai.ml.entities._builders.base_node.BaseNode
Spark
azure.ai.ml.entities._job.spark_job_entry_mixin.SparkJobEntryMixin
Spark

Constructor

Spark(*, component: str | SparkComponent, identity: Dict | ManagedIdentityConfiguration | AmlTokenConfiguration | UserIdentityConfiguration | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, inputs: Dict[str, NodeOutput | Input | str | bool | int | float | Enum] | None = None, outputs: Dict[str, str | Output] | None = None, compute: str | None = None, resources: Dict | SparkResourceConfiguration | None = None, entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, args: str | None = None, **kwargs: Any)

Parameters

Name Description
component
Required
Union[str, SparkComponent]

The ID or instance of the Spark component or job to be run during the step.

identity
Required
Union[Dict, ManagedIdentityConfiguration, AmlTokenConfiguration, UserIdentityConfiguration]

The identity that the Spark job will use while running on compute.

driver_cores
Required
int

The number of cores to use for the driver process, only in cluster mode.

driver_memory
Required
str

The amount of memory to use for the driver process, formatted as a string with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

executor_cores
Required
int

The number of cores to use on each executor.

executor_memory
Required
str

The amount of memory to use per executor process, formatted as a string with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

executor_instances
Required
int

The initial number of executors.

dynamic_allocation_enabled
Required
Union[bool, str]

Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload.

dynamic_allocation_min_executors
Required
int

The lower bound for the number of executors if dynamic allocation is enabled.

dynamic_allocation_max_executors
Required
int

The upper bound for the number of executors if dynamic allocation is enabled.

conf
Required
Dict[str, str]

A dictionary of pre-defined Spark configuration keys and values (see the sketch after this parameter list).

inputs
Required
Dict[str, Union[str, bool, int, float, Enum, NodeOutput, Input]]

A mapping of input names to input data sources used in the job.

outputs
Required
Dict[str, Union[str, Output]]

A mapping of output names to output data sources used in the job.

args
Required
str

The arguments for the job.

compute
Required
str

The compute resource the job runs on.

resources
Required
Union[Dict, SparkResourceConfiguration]

The compute resource configuration for the job.

entry
Required
Union[Dict[str, str], SparkJobEntry]

The file or class entry point.

py_files
Required
List[str]

The list of .zip, .egg or .py files to place on the PYTHONPATH for Python apps.

jars
Required
List[str]

The list of .JAR files to include on the driver and executor classpaths.

files
Required
List[str]

The list of files to be placed in the working directory of each executor.

archives
Required
List[str]

The list of archives to be extracted into the working directory of each executor.
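
As referenced in the conf parameter above, here is a sketch showing dynamic allocation, an identity, and extra Spark settings passed through conf. All values are illustrative, not recommendations:

from azure.ai.ml import spark
from azure.ai.ml.entities import ManagedIdentityConfiguration

# Sketch: enable dynamic executor allocation and pass extra Spark settings
# through conf. Paths, counts, and conf values are illustrative only.
spark_node = spark(
    code="./src",
    entry={"file": "main.py"},
    driver_cores=1,
    driver_memory="2g",
    executor_cores=2,
    executor_memory="2g",
    executor_instances=2,                    # initial executor count
    dynamic_allocation_enabled=True,
    dynamic_allocation_min_executors=1,      # lower bound when scaling down
    dynamic_allocation_max_executors=4,      # upper bound when scaling up
    conf={"spark.driver.maxResultSize": "2g"},  # any standard Spark key/value
    identity=ManagedIdentityConfiguration(),
    resources={"instance_type": "standard_e4s_v3", "runtime_version": "3.4"},
)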

Keyword-Only Parameters

All of the parameters listed above are keyword-only, as indicated by the leading * in the constructor signature.

Methods

clear
copy
dump

Dumps the job content into a file in YAML format.

fromkeys

Create a new dictionary with keys from iterable and values set to value.

get

Return the value for key if key is in the dictionary, else default.

items
keys
pop

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values

clear

clear() -> None.  Remove all items from D.

copy

copy() -> a shallow copy of D

dump

Dumps the job content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name Description
dest
Required
Union[PathLike, str, IO[AnyStr]]

The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.

Keyword-Only Parameters

Name Description
kwargs

Additional arguments to pass to the YAML serializer.

Exceptions

Type Description
FileExistsError

Raised if dest is a file path and the file already exists.

IOError

Raised if dest is an open file and the file is not writable.
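
A small sketch of both call styles; the file names are illustrative:

# dest as a file path; raises FileExistsError if the file already exists.
spark_node.dump("spark_job.yaml")

# dest as an already-open, writable stream.
with open("spark_job_stream.yaml", "w") as f:
    spark_node.dump(f)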

fromkeys

Create a new dictionary with keys from iterable and values set to value.

fromkeys(iterable, value=None, /)

Positional-Only Parameters

Name Description
iterable
Required
value
default value: None

get

Return the value for key if key is in the dictionary, else default.

get(key, default=None, /)

Positional-Only Parameters

Name Description
key
Required
default
default value: None

items

items() -> a set-like object providing a view on D's items

keys

keys() -> a set-like object providing a view on D's keys

pop

If the key is not found, return the default if given; otherwise, raise a KeyError.

pop(k, [d]) -> v, remove specified key and return the corresponding value.

popitem

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

popitem()

setdefault

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

setdefault(key, default=None, /)

Positional-Only Parameters

Name Description
key
Required
default
default value: None

update

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

update([E], **F) -> None.  Update D from dict/iterable E and F.

values

values() -> an object providing a view on D's values

Attributes

base_path

The base path of the resource.

Returns

Type Description
str

The base path of the resource.

code

The local or remote path pointing at source code.

Returns

Type Description
Union[str, PathLike]

component

The ID or instance of the Spark component or job to be run during the step.

Returns

Type Description
Union[str, SparkComponent]

The ID or instance of the Spark component or job to be run during the step.

creation_context

The creation context of the resource.

Returns

Type Description
Optional[SystemData]

The creation metadata for the resource.

entry

id

The resource ID.

Returns

Type Description
Optional[str]

The global ID of the resource, an Azure Resource Manager (ARM) ID.

identity

The identity that the Spark job will use while running on compute.

Returns

Type Description
Optional[Union[Dict, ManagedIdentityConfiguration, AmlTokenConfiguration, UserIdentityConfiguration]]

The identity that the Spark job will use while running on compute.

inputs

Get the inputs for the object.

Returns

Type Description
Dict[str, Union[Input, str, bool, int, float]]

A dictionary containing the inputs for the object.

log_files

Job output files.

Returns

Type Description
Optional[Dict[str, str]]

The dictionary of log names and URLs.

name

Get the name of the node.

Returns

Type Description
str

The name of the node.

outputs

Get the outputs of the object.

Returns

Type Description
Dict[str, Union[str, Output]]

A dictionary containing the outputs for the object.

resources

The compute resource configuration for the job.

Returns

Type Description
Optional[Union[Dict, SparkResourceConfiguration]]

The compute resource configuration for the job.

status

The status of the job.

Common values returned include "Running", "Completed", and "Failed". All possible values are:

  • NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.

  • Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.

  • Provisioning - On-demand compute is being created for a given job submission.

  • Preparing - The run environment is being prepared and is in one of two stages:

    • Docker image build

    • conda environment setup

  • Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.

  • Running - The job has started to run on the compute target.

  • Finalizing - User code execution has completed, and the run is in post-processing stages.

  • CancelRequested - Cancellation has been requested for the job.

  • Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.

  • Failed - The run failed. Usually the Error property on a run will provide details as to why.

  • Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.

  • NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.

Returns

Type Description
Optional[str]

Status of the job.

studio_url

Azure ML studio endpoint.

Returns

Type Description
Optional[str]

The URL to the job details page.
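
A sketch of reading status and studio_url after submitting the node as a job; the subscription, resource group, and workspace names are placeholders:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to a workspace (placeholder identifiers) and submit the node.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)
returned_job = ml_client.jobs.create_or_update(spark_node)
print(returned_job.status)      # e.g. "Starting" shortly after submission
print(returned_job.studio_url)  # link to the job details page in Azure ML studio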

type

The type of the job.

Returns

Type Description
Optional[str]

The type of the job.

CODE_ID_RE_PATTERN

CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)