CommandStep Class
Create an Azure ML Pipeline step that runs a command.
- Inheritance
- azureml.pipeline.core._python_script_step_base._PythonScriptStepBase
- CommandStep
Constructor
CommandStep(command=None, name=None, compute_target=None, runconfig=None, runconfig_pipeline_params=None, inputs=None, outputs=None, params=None, source_directory=None, allow_reuse=True, version=None)
Parameters
- command
- str or list
The command to run, or the path of the executable/script relative to source_directory. It is required unless it is provided with runconfig. It can be specified as a single string containing string arguments, or as a list containing inputs/outputs/PipelineParameter elements.
- name
- str
The name of the step. If unspecified, the first word in the command is used.
- compute_target
- DsvmCompute or AmlCompute or RemoteCompute or HDInsightCompute or str or tuple
The compute target to use. If unspecified, the target from the runconfig is used. This parameter may be specified as a compute target object or as the string name of a compute target on the workspace. Optionally, if the compute target is not available at pipeline creation time, you may specify a tuple of ('compute target name', 'compute target type') to avoid fetching the compute target object (the AmlCompute type is 'AmlCompute' and the RemoteCompute type is 'VirtualMachine').
- runconfig
- ScriptRunConfig or RunConfiguration
The optional configuration object which encapsulates the information necessary to submit a training run in an experiment.
- runconfig_pipeline_params
- dict[str, PipelineParameter]
Overrides of runconfig properties at runtime, given as key-value pairs that map the name of a runconfig property to the PipelineParameter for that property.
Supported values: 'NodeCount', 'MpiProcessCountPerNode', 'TensorflowWorkerCount', 'TensorflowParameterServerCount'
- inputs
- list[InputPortBinding or DataReference or PortDataReference or PipelineData or PipelineOutputDataset or DatasetConsumptionConfig]
A list of input port bindings.
- outputs
- list[PipelineData or OutputDatasetConfig or PipelineOutputAbstractDataset or OutputPortBinding]
A list of output port bindings.
- params
- dict
A dictionary of name-value pairs registered as environment variables with the prefix "AML_PARAMETER_".
- source_directory
- str
A folder that contains scripts, conda env, and other resources used in the step.
- allow_reuse
- bool
Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed.
- version
- str
An optional version tag to denote a change in functionality for the step.
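Each entry in the params dictionary above is surfaced to the step's process as an environment variable carrying the "AML_PARAMETER_" prefix. The following sketch is illustrative only (plain Python, not SDK code) and shows the naming convention:

```python
# Illustrative only: mimics how `params` entries are exposed to the step's
# process as environment variables with the "AML_PARAMETER_" prefix.
params = {'learning_rate': '0.01', 'num_epochs': '10'}

env_vars = {f'AML_PARAMETER_{name}': value for name, value in params.items()}

# Inside the step's script, a value would then be read back with, e.g.:
#   os.environ['AML_PARAMETER_learning_rate']
```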
Remarks
A CommandStep is a basic, built-in step that runs a command on the given compute target. It takes the command as a parameter, or from other parameters such as runconfig. It also takes other optional parameters such as the compute target, inputs, and outputs. You should use a ScriptRunConfig or RunConfiguration to specify requirements for the CommandStep, such as a custom Docker image.
The best practice for working with CommandStep is to use a separate folder for the executable or script to run and any dependent files associated with the step, and to specify that folder with the source_directory parameter. Following this best practice has two benefits. First, it helps reduce the size of the snapshot created for the step, because only what is needed for the step is snapshotted. Second, the step's output from a previous run can be reused if there are no changes to the source_directory that would trigger a re-upload of the snapshot.
For system-known commands, source_directory is not required, but you can still provide it along with any dependent files associated with the step.
The following code examples show how to use a CommandStep in a machine learning training scenario. To list files on Linux:
from azureml.pipeline.steps import CommandStep
trainStep = CommandStep(name='list step',
command='ls -lrt',
compute_target=compute_target)
To run a Python script:
from azureml.pipeline.steps import CommandStep
trainStep = CommandStep(name='train step',
command='python train.py arg1 arg2',
source_directory=project_folder,
compute_target=compute_target)
To run a Python script via ScriptRunConfig:
from azureml.core import ScriptRunConfig
from azureml.pipeline.steps import CommandStep
train_src = ScriptRunConfig(source_directory=script_folder,
command='python train.py arg1 arg2',
environment=my_env)
trainStep = CommandStep(name='train step',
runconfig=train_src)
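The runconfig_pipeline_params parameter described above can be pictured as a key-by-key override of runconfig properties at submission time. The following sketch is purely conceptual (plain dictionaries rather than SDK objects); in real code the override values are PipelineParameter objects:

```python
# Illustrative only: conceptual view of runconfig_pipeline_params.
# In real SDK code the overrides map property names to PipelineParameter
# objects, e.g. {'NodeCount': PipelineParameter(name='node_count', default_value=4)}.
runconfig_defaults = {'NodeCount': 1, 'MpiProcessCountPerNode': 1}

# At submission time, the values supplied for the pipeline parameters
# replace the corresponding runconfig properties.
overrides = {'NodeCount': 4}

effective = {**runconfig_defaults, **overrides}
```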
See https://aka.ms/pl-first-pipeline for more details on creating pipelines in general.
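The allow_reuse behavior described in the parameters above can be thought of as fingerprinting the step: when the step contents, inputs, and parameters all resolve to the same value as a previous run, the previous output is served instead of resubmitting the job. This is a conceptual sketch, not the SDK's actual reuse logic:

```python
import hashlib
import json

def step_fingerprint(contents, inputs, params):
    """Conceptual sketch only: reuse is keyed on step contents, inputs,
    and parameters (for datasets, the dataset definition, not the data)."""
    blob = json.dumps(
        {'contents': contents, 'inputs': inputs, 'params': params},
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode('utf-8')).hexdigest()

# Unchanged settings -> same fingerprint -> previous output can be reused.
a = step_fingerprint('train.py v1', ['dataset-definition-1'], {'epochs': '10'})
b = step_fingerprint('train.py v1', ['dataset-definition-1'], {'epochs': '10'})
# Changing any parameter invalidates reuse.
c = step_fingerprint('train.py v1', ['dataset-definition-1'], {'epochs': '20'})
```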
Methods
create_node
Create a node for CommandStep and add it to the specified graph. This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow.
create_node
Create a node for CommandStep and add it to the specified graph.
This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that the step can be added to a pipeline graph that represents the workflow.
create_node(graph, default_datastore, context)
Parameters
- graph
- Graph
The graph object to add the node to.
- default_datastore
- AbstractAzureStorageDatastore or AzureDataLakeDatastore
The default datastore.
- context
- _GraphContext
The graph context.
Returns
The created node.
Return type
Node