Parallel Class
Base class for parallel node, used for parallel component version consumption.
You should not instantiate this class directly. Instead, you should create it from the builder function: parallel.
Inheritance
azure.ai.ml.entities._builders.base_node.BaseNode
Parallel
azure.ai.ml.entities._job.pipeline._io.mixin.NodeWithGroupInputMixin
Parallel
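In practice the parallel builders live in azure.ai.ml.parallel; the sketch below uses parallel_run_function with a RunFunction task to produce a Parallel step. It is a minimal, hedged example: the script folder ./src, the entry script score.py, the environment string, and the input/output names are illustrative assumptions, not values defined by this class.

```python
from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.parallel import parallel_run_function, RunFunction

# Sketch only: "./src", "score.py", the environment string, and the
# input/output names are placeholders for your own assets.
batch_score = parallel_run_function(
    name="batch_score",
    display_name="Batch scoring",
    description="Parallel node that scores data in mini-batches",
    inputs=dict(
        job_data_path=Input(type=AssetTypes.MLTABLE, description="Data to score"),
    ),
    outputs=dict(scored_data=Output(type=AssetTypes.URI_FOLDER)),
    input_data="${{inputs.job_data_path}}",
    instance_count=2,
    max_concurrency_per_instance=2,
    mini_batch_size="1MB",
    mini_batch_error_threshold=5,
    error_threshold=10,
    retry_settings=dict(max_retries=2, timeout=60),
    logging_level="DEBUG",
    task=RunFunction(
        code="./src",
        entry_script="score.py",
        environment="azureml:my-scoring-env:1",  # placeholder environment name
        append_row_to="${{outputs.scored_data}}",
    ),
)
```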
Constructor
Parallel(*, component: ParallelComponent | str, compute: str | None = None, inputs: Dict[str, NodeOutput | Input | str | bool | int | float | Enum] | None = None, outputs: Dict[str, str | Output] | None = None, retry_settings: RetrySettings | Dict[str, str] | None = None, logging_level: str | None = None, max_concurrency_per_instance: int | None = None, error_threshold: int | None = None, mini_batch_error_threshold: int | None = None, input_data: str | None = None, task: ParallelTask | RunFunction | Dict | None = None, partition_keys: List | None = None, mini_batch_size: int | str | None = None, resources: JobResourceConfiguration | None = None, environment_variables: Dict | None = None, identity: Dict | ManagedIdentityConfiguration | AmlTokenConfiguration | UserIdentityConfiguration | None = None, **kwargs: Any)
Parameters
Name | Description |
---|---|
component (Required) | Id or instance of the parallel component/job to be run for the step (azure.ai.ml.entities._component.parallel_component.ParallelComponent or str). |
name (Required) | Name of the parallel node. |
description (Required) | Description of the command. |
tags (Required) | Tag dictionary. Tags can be added, removed, and updated. |
properties (Required) | The job property dictionary. |
display_name (Required) | Display name of the job. |
retry_settings (Required) | Retry settings for a failed parallel job run. |
logging_level (Required) | A string of the logging level name. |
max_concurrency_per_instance (Required) | The maximum parallelism that each compute instance has. |
error_threshold (Required) | The number of item processing failures that should be ignored. |
mini_batch_error_threshold (Required) | The number of mini-batch processing failures that should be ignored. |
task (Required) | The parallel task. |
mini_batch_size (Required) | For FileDataset input, this field is the number of files a user script can process in one run() call. For TabularDataset input, this field is the approximate size of data the user script can process in one run() call. Example values are 1024, 1024KB, 10MB, and 1GB. (Optional; the default is 10 files for FileDataset and 1MB for TabularDataset.) This value can be set through a PipelineParameter. |
partition_keys (Required) | The keys used to partition the dataset into mini-batches. If specified, data with the same key will be partitioned into the same mini-batch. If both partition_keys and mini_batch_size are specified, the partition keys take effect. The input(s) must be partitioned dataset(s), and the partition_keys must be a subset of the keys of every input dataset for this to work. |
input_data (Required) | The input data. |
inputs (Required) | Inputs of the component/job. |
outputs (Required) | Outputs of the component/job. |
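The inputs and outputs above are normally wired up inside a pipeline rather than passed to the constructor directly. A minimal sketch, assuming the batch_score builder from the earlier example and a compute cluster named cpu-cluster that already exists in the workspace:

```python
from azure.ai.ml import Input, dsl
from azure.ai.ml.constants import AssetTypes

@dsl.pipeline(description="Score data with a parallel node")
def scoring_pipeline(pipeline_input_data):
    # Calling the builder result inside a pipeline yields a Parallel node.
    parallel_node = batch_score(job_data_path=pipeline_input_data)
    parallel_node.compute = "cpu-cluster"  # assumes this cluster exists
    return {"scored_data": parallel_node.outputs.scored_data}

pipeline_job = scoring_pipeline(
    pipeline_input_data=Input(type=AssetTypes.MLTABLE, path="./data/"),
)
```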
Keyword-Only Parameters
Name | Description |
---|---|
identity | Optional[Union[dict[str, str], ManagedIdentityConfiguration, AmlTokenConfiguration, UserIdentityConfiguration]]. The identity that the command job will use while running on compute. |
component (Required) | |
compute (Required) | |
inputs (Required) | |
outputs (Required) | |
retry_settings (Required) | |
logging_level (Required) | |
max_concurrency_per_instance (Required) | |
error_threshold (Required) | |
mini_batch_error_threshold (Required) | |
input_data (Required) | |
task (Required) | |
partition_keys (Required) | |
mini_batch_size (Required) | |
resources (Required) | |
environment_variables (Required) | |
Methods
Name | Description |
---|---|
clear | |
copy | |
dump | Dumps the job content into a file in YAML format. |
fromkeys | Create a new dictionary with keys from iterable and values set to value. |
get | Return the value for key if key is in the dictionary, else default. |
items | |
keys | |
pop | If the key is not found, return the default if given; otherwise, raise a KeyError. |
popitem | Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. |
set_resources | Set the resources for the parallel job. |
setdefault | Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default. |
update | If E is present and has a .keys() method: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k]. |
values | |
clear
clear() -> None. Remove all items from D.
copy
copy() -> a shallow copy of D
dump
Dumps the job content into a file in YAML format.
dump(dest: str | PathLike | IO, **kwargs: Any) -> None
Parameters
Name | Description |
---|---|
dest
Required
|
The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly. |
Exceptions
Type | Description |
---|---|
| Raised if dest is a file path and the file already exists. |
| Raised if dest is an open file and the file is not writable. |
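For example, a node built earlier (here the hypothetical parallel_node from the pipeline sketch above) can be serialized to YAML for inspection or versioning:

```python
# Writes the node's job content to a new YAML file; raises if the file already exists.
parallel_node.dump("parallel_step.yml")

# Or write to an already-open, writable stream.
with open("parallel_step_copy.yml", "w") as stream:
    parallel_node.dump(stream)
```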
fromkeys
Create a new dictionary with keys from iterable and values set to value.
fromkeys(iterable, value=None, /)
Positional-Only Parameters
Name | Description |
---|---|
iterable (Required) | |
value | Default value: None |
get
Return the value for key if key is in the dictionary, else default.
get(key, default=None, /)
Positional-Only Parameters
Name | Description |
---|---|
key (Required) | |
default | Default value: None |
items
items() -> a set-like object providing a view on D's items
keys
keys() -> a set-like object providing a view on D's keys
pop
If the key is not found, return the default if given; otherwise, raise a KeyError.
pop(k, [d]) -> v, remove specified key and return the corresponding value.
popitem
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
popitem()
set_resources
Set the resources for the parallel job.
set_resources(*, instance_type: str | List[str] | None = None, instance_count: int | None = None, properties: Dict | None = None, docker_args: str | None = None, shm_size: str | None = None, **kwargs: Any) -> None
Keyword-Only Parameters
Name | Description |
---|---|
instance_type | The instance type, or a list of instance types, supported by the compute target. |
instance_count | The number of instances or nodes used by the compute target. |
properties | The property dictionary for the resources. |
docker_args | Extra arguments to pass to the Docker run command. |
shm_size | Size of the Docker container's shared memory block. |
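A hedged usage sketch on an existing Parallel node (the VM size, counts, and Docker arguments below are placeholders):

```python
# Override the resource configuration on an existing Parallel node.
parallel_node.set_resources(
    instance_type="STANDARD_D3_V2",  # placeholder VM size
    instance_count=4,
    shm_size="2g",
    docker_args="--cpus=2",
)
```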
setdefault
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
setdefault(key, default=None, /)
Positional-Only Parameters
Name | Description |
---|---|
key (Required) | |
default | Default value: None |
update
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
update([E], **F) -> None. Update D from dict/iterable E and F.
values
values() -> an object providing a view on D's values
Attributes
base_path
component
Get the component of the parallel job.
Returns
Type | Description |
---|---|
str | The component of the parallel job. |
creation_context
The creation context of the resource.
Returns
Type | Description |
---|---|
The creation metadata for the resource. |
id
The resource ID.
Returns
Type | Description |
---|---|
The global ID of the resource, an Azure Resource Manager (ARM) ID. |
identity
The identity that the job will use while running on compute.
Returns
Type | Description |
---|---|
The identity that the job will use while running on compute. |
inputs
Get the inputs for the object.
Returns
Type | Description |
---|---|
A dictionary containing the inputs for the object. |
log_files
Job output files.
Returns
Type | Description |
---|---|
The dictionary of log names and URLs. |
name
outputs
Get the outputs of the object.
Returns
Type | Description |
---|---|
A dictionary containing the outputs for the object. |
resources
Get the resource configuration for the parallel job.
Returns
Type | Description |
---|---|
The resource configuration for the parallel job. |
retry_settings
Get the retry settings for the parallel job.
Returns
Type | Description |
---|---|
The retry settings for the parallel job. |
status
The status of the job.
Common values returned include "Running", "Completed", and "Failed". All possible values are:
NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
Provisioning - On-demand compute is being created for a given job submission.
Preparing - The run environment is being prepared and is in one of two stages:
Docker image build
conda environment setup
Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
Running - The job has started to run on the compute target.
Finalizing - User code execution has completed, and the run is in post-processing stages.
CancelRequested - Cancellation has been requested for the job.
Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
Failed - The run failed. Usually the Error property on a run will provide details as to why.
Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
Returns
Type | Description |
---|---|
Status of the job. |
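Status is typically read from jobs returned by the service, for example after submitting a pipeline that contains this node. The sketch below assumes placeholder workspace coordinates and the pipeline_job built earlier:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder workspace coordinates.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

submitted = ml_client.jobs.create_or_update(pipeline_job, experiment_name="parallel-demo")
print(submitted.status)                            # e.g. "NotStarted" right after submission
print(ml_client.jobs.get(submitted.name).status)   # refreshed status on a later lookup
```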
studio_url
task
type