AzureBatchStep Class
Creates an Azure ML Pipeline step for submitting jobs to Azure Batch.
Note: This step does not support upload/download of directories and their contents.
For an example of using AzureBatchStep, see the notebook https://aka.ms/pl-azbatch.
Create an Azure ML Pipeline step for submitting jobs to Azure Batch.
- Inheritance
-
azureml.pipeline.core._azurebatch_step_base._AzureBatchStepBase
AzureBatchStep
Constructor
AzureBatchStep(name, create_pool=False, pool_id=None, delete_batch_job_after_finish=True, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', source_directory=None, executable=None, arguments=None, inputs=None, outputs=None, allow_reuse=True, compute_target=None, version=None)
Parameters
- name
- str
[Required] The name of the step.
- create_pool
- bool
Indicates whether to create the pool before running the jobs. A sketch combining the pool-related parameters follows this parameter list.
- pool_id
- str
[Required] The ID of the pool where the job runs. The ID can be an existing pool, or one that will be created when the job is submitted.
- delete_batch_job_after_finish
- bool
Indicates whether to delete the job from the Batch account after it finishes.
- delete_batch_pool_after_finish
- bool
Indicates whether to delete the pool after the job finishes.
- is_positive_exit_code_failure
- bool
Indicates whether the job fails if the task exits with a positive exit code.
- vm_image_urn
- str
If create_pool is True and the VM uses VirtualMachineConfiguration, this specifies the VM image to use. Value format: urn:publisher:offer:sku. Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.
- run_task_as_admin
- bool
Indicates whether the task should run with admin privileges.
- target_compute_nodes
- int
If create_pool is True, indicates how many compute nodes will be added to the pool.
- vm_size
- str
If create_pool is True, indicates the virtual machine size of the compute nodes.
- source_directory
- str
A local folder that contains the module binaries, executable, assemblies, etc.
- executable
- str
[Required] The name of the command/executable that will be executed as part of the job.
- inputs
- list[Union[InputPortBinding, DataReference, PortDataReference, PipelineData]]
A list of input port bindings. Before the job runs, a folder is created for each input. The files for each input will be copied from the storage to the respective folder on the compute node. For example, if the input name is input1, and the relative path on storage is some/relative/path/that/can/be/really/long/inputfile.txt, then the file path on the compute will be: ./input1/inputfile.txt. When the input name is longer than 32 characters, it will be truncated and appended with a unique suffix so the folder name can be created successfully on the compute target.
- outputs
- list[Union[PipelineData, OutputPortBinding]]
A list of output port bindings. Similar to inputs, before the job runs, a folder is created for each output. The folder name will be the same as the output name. The assumption is that the job will put the output into that folder. An illustrative sketch of declaring inputs and outputs follows this parameter list.
- allow_reuse
- bool
Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed.
- compute_target
- BatchCompute, str
[Required] A BatchCompute compute where the job runs.
- version
- str
An optional version tag to denote a change in functionality for the module.
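As an illustrative sketch of the inputs and outputs parameters described above, the following shows one way the testdata and outputdata objects used in the Remarks example below could be declared. The datastore, the path on the datastore, and the file name are assumptions for illustration only.

from azureml.core import Workspace
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import PipelineData

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Input: before the job runs, this file is copied to ./testdata/inputfile.txt
# on the Batch compute node (folder name = input name).
testdata = DataReference(
    datastore=datastore,
    data_reference_name="testdata",
    path_on_datastore="some/relative/path/inputfile.txt",
)

# Output: a folder named "outputdata" is created on the node; the executable
# is expected to write its results into that folder.
outputdata = PipelineData(name="outputdata", datastore=datastore)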
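The pool-related parameters (create_pool, pool_id, vm_image_urn, vm_size, target_compute_nodes, delete_batch_pool_after_finish) can be combined so that the step provisions its own pool. The following is a minimal sketch; batch_compute and binaries_folder are assumed to be defined as in the Remarks example below, and the pool ID and node count are illustrative only.

step_with_new_pool = AzureBatchStep(
    name="Azure Batch Job (new pool)",
    create_pool=True,                      # provision the pool when the job is submitted
    pool_id="TransientPool",               # illustrative pool ID
    vm_image_urn="urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter",
    vm_size="standard_d1_v2",
    target_compute_nodes=2,                # illustrative node count
    delete_batch_pool_after_finish=True,   # tear the pool down once the job completes
    executable="azurebatch.cmd",
    compute_target=batch_compute,
    source_directory=binaries_folder,
)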
Remarks
The following example shows how to use AzureBatchStep in an Azure Machine Learning Pipeline.
step = AzureBatchStep(
    name="Azure Batch Job",
    pool_id="MyPoolName",  # Replace this with the pool name of your choice
    inputs=[testdata],
    outputs=[outputdata],
    executable="azurebatch.cmd",
    arguments=[testdata, outputdata],
    compute_target=batch_compute,
    source_directory=binaries_folder,
)
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb
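Once the step is defined, it can be added to a pipeline and submitted like any other pipeline step. The following is a minimal sketch, assuming an existing workspace configuration; the experiment name is illustrative.

from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline

ws = Workspace.from_config()
pipeline = Pipeline(workspace=ws, steps=[step])

# Submit the pipeline as an experiment run and wait for it to finish.
run = Experiment(ws, "azurebatch-sample").submit(pipeline)
run.wait_for_completion(show_output=True)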
Methods
create_node
Create a node from the AzureBatch step and add it to the specified graph. This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow.
create_node
Create a node from the AzureBatch step and add it to the specified graph.
This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow.
create_node(graph, default_datastore, context)
Parameters
- graph
- Graph
The graph object to add the node to.
- default_datastore
- Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore]
The default datastore.
- context
- azureml.pipeline.core._GraphContext
The graph context.
Returns
The created node.
Return type
Node