RScriptStep Class

Note

This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Creates an Azure ML Pipeline step that runs an R script.

DEPRECATED. Use the CommandStep instead. For an example see How to run R scripts in pipelines with CommandStep.
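
Because this class is deprecated in favor of CommandStep, a minimal migration sketch may be useful. It assumes the R script takes only plain string or numeric arguments and that the `Rscript` executable is on the image's PATH. `build_r_command` and `make_r_command_step` are hypothetical helper names, not part of the SDK, and the azureml import is deferred so the command-building part can be tried without the SDK installed.

```python
import shlex


def build_r_command(script_name, arguments=()):
    """Return a shell-safe command line that runs an R script with Rscript."""
    return shlex.join(["Rscript", script_name, *map(str, arguments)])


def make_r_command_step(script_name, arguments, compute_target, runconfig,
                        source_directory, name=None):
    """Hypothetical helper: wrap the R command line in a CommandStep."""
    # Imported lazily so this module loads even where azureml is not installed.
    from azureml.pipeline.steps import CommandStep

    return CommandStep(
        name=name or script_name,
        command=build_r_command(script_name, arguments),
        compute_target=compute_target,
        runconfig=runconfig,
        source_directory=source_directory,
    )


# Arguments containing spaces are quoted so the shell sees them as one token.
print(build_r_command("train.R", ["--epochs", 10, "--label", "my model"]))
```

Pipeline data objects such as PipelineData cannot be rendered this way; the sketch only covers literal arguments.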

Inheritance: azureml.pipeline.core._python_script_step_base._PythonScriptStepBase → RScriptStep

Constructor

RScriptStep(script_name, name=None, arguments=None, compute_target=None, runconfig=None, runconfig_pipeline_params=None, inputs=None, outputs=None, params=None, source_directory=None, use_gpu=False, custom_docker_image=None, cran_packages=None, github_packages=None, custom_url_packages=None, allow_reuse=True, version=None)

Parameters

Name	Description
script_name Required	str [Required] The name of an R script relative to `source_directory`.
name Optional	str The name of the step. If unspecified, `script_name` is used.
arguments Optional	list Command-line arguments for the R script file. The arguments are passed to the compute target via the `arguments` parameter in RunConfiguration. For details on how to handle arguments such as special symbols, see RunConfiguration.
compute_target Required	Union[DsvmCompute, AmlCompute, RemoteCompute, HDInsightCompute, str, tuple] [Required] The compute target to use. If unspecified, the target from the `runconfig` is used. This parameter may be specified as a compute target object or the string name of a compute target on the workspace. Optionally, if the compute target is not available at pipeline creation time, you may specify a tuple of ('compute target name', 'compute target type') to avoid fetching the compute target object (AmlCompute type is 'AmlCompute' and RemoteCompute type is 'VirtualMachine').
runconfig Required	RunConfiguration [Required] The run configuration, which encapsulates the information necessary to submit a training run in an experiment. It is needed to define the R run settings, which are specified in an RSection. The RSection is required for this step.
runconfig_pipeline_params Optional	dict[str, PipelineParameter] Overrides of runconfig properties at runtime, given as key-value pairs, each with the name of a runconfig property and the PipelineParameter for that property. Supported values: 'NodeCount', 'MpiProcessCountPerNode', 'TensorflowWorkerCount', 'TensorflowParameterServerCount'
inputs Optional	list[Union[InputPortBinding, DataReference, PortDataReference, PipelineData, PipelineOutputFileDataset, PipelineOutputTabularDataset, DatasetConsumptionConfig]] A list of input port bindings.
outputs Optional	list[Union[PipelineData, OutputDatasetConfig, PipelineOutputAbstractDataset, OutputPortBinding]] A list of output port bindings.
params Optional	dict A dictionary of name-value pairs registered as environment variables with the "AML_PARAMETER_" prefix.
source_directory Optional	str A folder that contains the R script, conda env, and other resources used in the step.
use_gpu Optional	bool Indicates whether the environment to run the experiment should support GPUs. If True, a GPU-based default Docker image is used in the environment. If False, a CPU-based image is used. Default Docker images (CPU or GPU) are used only if the user sets neither the `base_image` nor the `base_dockerfile` parameter. This setting is used only in Docker-enabled compute targets. See https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment.dockersection for more details on `base_image`.
custom_docker_image Optional	str The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image is used as the base image. Deprecated; will be removed in a future release. Use `base_image` in the DockerSection instead.
cran_packages Optional	list CRAN packages to be installed. Deprecated; will be removed in a future release. Use RSection.cran_packages instead.
github_packages Optional	list GitHub packages to be installed. Deprecated; will be removed in a future release. Use RSection.github_packages instead.
custom_url_packages Optional	list Packages to be installed from a local path, directory, or custom URL. Deprecated; will be removed in a future release. Use RSection.custom_url_packages instead.
allow_reuse Optional	bool Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When the step is reused, the job is not submitted to compute; instead, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed.
version Optional	str An optional version tag to denote a change in functionality for the step.
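
The `params` entry above says name-value pairs are registered as environment variables with "AML_PARAMETER_". The sketch below illustrates that naming convention, assuming the string is used as a prefix on each parameter name. `params_to_env` and `read_param` are hypothetical helpers written for this illustration; they are not SDK functions and do not reproduce how the service sets the variables.

```python
import os

# Assumption: the documented string is prepended to each parameter name.
AML_PARAMETER_PREFIX = "AML_PARAMETER_"


def params_to_env(params):
    """Map step params to the environment variable names a script would see."""
    return {AML_PARAMETER_PREFIX + name: str(value) for name, value in params.items()}


def read_param(name, default=None, environ=None):
    """Look up one step parameter in an environment mapping (os.environ by default)."""
    environ = os.environ if environ is None else environ
    return environ.get(AML_PARAMETER_PREFIX + name, default)


# Simulate the environment a step would receive for two illustrative params.
env = params_to_env({"learning_rate": 0.01, "model_name": "baseline"})
print(sorted(env))
print(read_param("learning_rate", environ=env))
```

Under the same assumption, an R script would read the value with `Sys.getenv("AML_PARAMETER_learning_rate")`.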

Remarks

An RScriptStep is a basic, built-in step to run an R script on a compute target. It takes a script name and other optional parameters such as arguments for the script, compute target, inputs, and outputs. Use a RunConfiguration to specify requirements for the RScriptStep, such as a custom Docker image and the required CRAN or GitHub packages.

The best practice for working with RScriptStep is to use a separate folder for scripts and any dependent files associated with the step, and specify that folder with the source_directory parameter. Following this best practice has two benefits. First, it helps reduce the size of the snapshot created for the step because only what is needed for the step is snapshotted. Second, the step's output from a previous run can be reused if there are no changes to the source_directory that would trigger a re-upload of the snapshot.
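
To make the reasoning above concrete, the following is a conceptual sketch of why a dedicated folder helps: if change detection is modeled as a hash over the folder's contents, files outside the folder cannot invalidate it. This is only an illustrative model and not Azure ML's actual snapshot or reuse logic; `directory_fingerprint` and the folder and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_fingerprint(source_directory):
    """Hash the relative paths and bytes of every file under a folder."""
    root = Path(source_directory)
    digest = hashlib.sha256()
    # Sort paths so the result does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workspace:
    step_dir = Path(workspace) / "train_step"
    step_dir.mkdir()
    (step_dir / "train.R").write_text("print('training')\n")

    before = directory_fingerprint(step_dir)
    # A file outside the step folder does not affect the step's fingerprint.
    (Path(workspace) / "notes.txt").write_text("unrelated\n")
    unchanged = directory_fingerprint(step_dir) == before
    # Editing the script inside the step folder does change it.
    (step_dir / "train.R").write_text("print('training v2')\n")
    changed = directory_fingerprint(step_dir) != before

print(unchanged, changed)
```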

The following code example shows how to use an RScriptStep in a machine learning training scenario.


   from azureml.core.runconfig import RunConfiguration
   from azureml.core.environment import RSection, RCranPackage
   from azureml.pipeline.steps import RScriptStep

   # blob_input_data, output_data1, compute_target, and project_folder are
   # assumed to be defined earlier in the pipeline script.

   rc = RunConfiguration()
   rc.framework = 'R'
   rc.environment.r = RSection()                            # R details with required packages
   rc.environment.docker.enabled = True                     # to enable docker image
   rc.environment.docker.base_image = '<custom user image>' # to use custom image

   # Describe one CRAN package and attach it to the R section.
   cran_package1 = RCranPackage()
   cran_package1.name = "ggplot2"
   cran_package1.repository = "www.customurl.com"
   cran_package1.version = "2.1"
   rc.environment.r.cran_packages = [cran_package1]

   trainStep = RScriptStep(script_name="train.R",
                           arguments=["--input", blob_input_data, "--output", output_data1],
                           inputs=[blob_input_data],
                           outputs=[output_data1],
                           compute_target=compute_target,
                           use_gpu=False,
                           runconfig=rc,
                           source_directory=project_folder)

See https://aka.ms/pl-first-pipeline for more details on creating pipelines in general. See https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment.rsection for more details on RSection.

Methods

create_node

Create a node for RScriptStep and add it to the specified graph.

DEPRECATED. Use the CommandStep instead. For an example see How to run R scripts in pipelines with CommandStep.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the parameters required through this method so that step can be added to a pipeline graph that represents the workflow.

create_node(graph, default_datastore, context)

Parameters

Name	Description
graph Required	Graph The graph object to add the node to.
default_datastore Required	Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore] The default datastore.
context Required	azureml.pipeline.core._GraphContext The graph context.

Returns

Type	Description
Node	The created node.
