steps Package

Contains pre-built steps that can be executed in an Azure Machine Learning Pipeline.

Azure ML Pipeline steps can be configured together to construct a Pipeline, which represents a shareable and reusable Azure Machine Learning workflow. Each step of a pipeline can be configured to allow reuse of its previous run results if the step contents (scripts and dependencies) as well as inputs and parameters remain unchanged.
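The reuse rule described above can be illustrated with a small, self-contained sketch. This is not how Azure Machine Learning implements reuse; it only shows the idea that a step's previous result can be returned when its contents, inputs, and parameters are unchanged. The names step_fingerprint and run_step, the in-memory cache, and the sample file contents are all hypothetical.

```python
import hashlib
import json


def step_fingerprint(script_files, inputs, parameters):
    """Hash everything that should invalidate a previous result (illustrative only)."""
    hasher = hashlib.sha256()
    # Hash file names and contents in a stable order.
    for name in sorted(script_files):
        hasher.update(name.encode("utf-8") + b"\0")
        hasher.update(script_files[name] + b"\0")
    # Serialize inputs and parameters deterministically before hashing.
    hasher.update(json.dumps(inputs, sort_keys=True).encode("utf-8") + b"\0")
    hasher.update(json.dumps(parameters, sort_keys=True).encode("utf-8"))
    return hasher.hexdigest()


_results = {}  # hypothetical in-memory store of previous step results


def run_step(script_files, inputs, parameters, allow_reuse=True):
    """Return (result, reused). Reuse a stored result when nothing has changed."""
    key = step_fingerprint(script_files, inputs, parameters)
    if allow_reuse and key in _results:
        return _results[key], True
    result = "output-" + key[:12]  # stand-in for actually running the step
    _results[key] = result
    return result, False


files = {"train.py": b"print('train')"}
first = run_step(files, {"data": "v1"}, {"lr": 0.1})
second = run_step(files, {"data": "v1"}, {"lr": 0.1})  # unchanged, so reused
third = run_step(files, {"data": "v1"}, {"lr": 0.2})   # parameter changed, so rerun
print(first[1], second[1], third[1])  # prints: False True False
```

In this sketch, changing any script byte, input, or parameter produces a different fingerprint, so the step is run again rather than reused.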

The classes in this package are typically used together with the classes in the core package. The core package contains classes for configuring data (PipelineData), scheduling (Schedule), and managing the output of steps (StepRun).

The pre-built steps in this package cover many common scenarios encountered in machine learning workflows. To get started with pre-built pipeline steps, see:

Modules

adla_step

Contains functionality to create an Azure ML Pipeline step to run a U-SQL script with Azure Data Lake Analytics.

automl_step

Contains functionality for adding and managing an automated ML pipeline step in Azure Machine Learning.

azurebatch_step

Contains functionality to create an Azure ML Pipeline step that runs a Windows executable in Azure Batch.

command_step

Contains functionality to create an Azure ML Pipeline step that runs commands.

data_transfer_step

Contains functionality to create an Azure ML Pipeline step that transfers data between storage options.

databricks_step

Contains functionality to create an Azure ML pipeline step to run a Databricks notebook or Python script on DBFS.

estimator_step

Contains functionality to create a pipeline step that runs an Estimator for Machine Learning model training.

hyper_drive_step

Contains functionality for creating and managing Azure ML Pipeline steps that run hyperparameter tuning.

kusto_step

Contains functionality to create an Azure ML pipeline step to run a Kusto notebook.

module_step

Contains functionality to add an Azure Machine Learning Pipeline step using an existing version of a Module.

mpi_step

Contains functionality to add an Azure ML Pipeline step to run an MPI job for Machine Learning model training.

parallel_run_config

Contains functionality for configuring a ParallelRunStep.

parallel_run_step

Contains functionality to add a step to run a user script in parallel mode on multiple AmlCompute targets.

python_script_step

Contains functionality to create an Azure ML Pipeline step that runs a Python script.

r_script_step

Contains functionality to create an Azure ML Pipeline step that runs an R script.

synapse_spark_step

Contains functionality to create an Azure ML Synapse step that runs a Python script.

Classes

AdlaStep

Creates an Azure ML Pipeline step to run a U-SQL script with Azure Data Lake Analytics.

For an example of using AdlaStep, see the notebook https://aka.ms/pl-adla.

Create an Azure ML Pipeline step to run a U-SQL script with Azure Data Lake Analytics.

AutoMLStep

Creates an Azure ML Pipeline step that encapsulates an automated ML run.

For an example of using AutoMLStep, see the notebook https://aka.ms/pl-automl.

Initialize an AutoMLStep.

AutoMLStepRun

Provides information about an automated ML experiment run and methods for retrieving default outputs.

The AutoMLStepRun class is used to manage, check status, and retrieve run details once an automated ML run is submitted in a pipeline. In addition, this class can be used to get the default outputs of the AutoMLStep via the StepRun class.

Initialize an AutoMLStepRun.

AzureBatchStep

Creates an Azure ML Pipeline step for submitting jobs to Azure Batch.

Note: This step does not support upload/download of directories and their contents.

For an example of using AzureBatchStep, see the notebook https://aka.ms/pl-azbatch.

Create an Azure ML Pipeline step for submitting jobs to Azure Batch.

CommandStep

Create an Azure ML Pipeline step that runs a command.

DataTransferStep

Creates an Azure ML Pipeline step that transfers data between storage options.

DataTransferStep supports common storage types such as Azure Blob Storage and Azure Data Lake as sources and sinks. For more information, see the Remarks section.

For an example of using DataTransferStep, see the notebook https://aka.ms/pl-data-trans.

Create an Azure ML Pipeline step that transfers data between storage options.

DatabricksStep

Creates an Azure ML Pipeline step to add a Databricks notebook, Python script, or JAR as a node.

For an example of using DatabricksStep, see the notebook https://aka.ms/pl-databricks.

Create an Azure ML Pipeline step to add a Databricks notebook, Python script, or JAR as a node.

python_script_name: [Required] The name of a Python script relative to source_directory. If the script takes inputs and outputs, they are passed to the script as parameters. If python_script_name is specified, source_directory must also be specified.

Specify exactly one of notebook_path, python_script_path, python_script_name, or main_class_name.

If you specify a DataReference object as input with data_reference_name=input1 and a PipelineData object as output with name=output1, then the inputs and outputs are passed to the script as parameters. They look like the following, and you need to parse the arguments in your script to access the path of each input and output: "-input1", "wasbs://test@storagename.blob.core.windows.net/test", "-output1", "wasbs://test@storagename.blob.core.windows.net/b3e26de1-87a4-494d-a20f-1988d22b81a2/output1"
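One way to parse the argument list above inside the script is sketched below with the standard argparse module. The helper name parse_step_paths is hypothetical, and the sketch assumes the arguments arrive exactly in the "-input1 <path> -output1 <path>" form shown above. It uses parse_known_args so that any additional parameters passed to the script are ignored rather than rejected.

```python
import argparse


def parse_step_paths(argv):
    """Extract the input1 and output1 paths from the script's argument list."""
    parser = argparse.ArgumentParser()
    # Single-dash names match the form shown in the example arguments.
    parser.add_argument("-input1", dest="input1", required=True)
    parser.add_argument("-output1", dest="output1", required=True)
    # Ignore any other parameters that are passed to the script.
    args, _unknown = parser.parse_known_args(argv)
    return args.input1, args.output1


# Argument values copied from the example above; a real script would pass sys.argv[1:].
example_argv = [
    "-input1", "wasbs://test@storagename.blob.core.windows.net/test",
    "-output1", "wasbs://test@storagename.blob.core.windows.net/b3e26de1-87a4-494d-a20f-1988d22b81a2/output1",
]
input_path, output_path = parse_step_paths(example_argv)
print(input_path)
print(output_path)
```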

In addition, the following parameters will be available within the script:

  • AZUREML_RUN_TOKEN: The AML token for authenticating with Azure Machine Learning.
  • AZUREML_RUN_TOKEN_EXPIRY: The AML token expiry time.
  • AZUREML_RUN_ID: Azure Machine Learning Run ID for this run.
  • AZUREML_ARM_SUBSCRIPTION: Azure subscription for your AML workspace.
  • AZUREML_ARM_RESOURCEGROUP: Azure resource group for your Azure Machine Learning workspace.
  • AZUREML_ARM_WORKSPACE_NAME: Name of your Azure Machine Learning workspace.
  • AZUREML_ARM_PROJECT_NAME: Name of your Azure Machine Learning experiment.
  • AZUREML_SERVICE_ENDPOINT: The endpoint URL for AML services.
  • AZUREML_WORKSPACE_ID: ID of your Azure Machine Learning workspace.
  • AZUREML_EXPERIMENT_ID: ID of your Azure Machine Learning experiment.
  • AZUREML_SCRIPT_DIRECTORY_NAME: Directory path in DBFS where source_directory has been copied.
  (This parameter is only populated when `python_script_name` is used. See more details below.)

When you run a Python script from your local machine on Databricks by using the DatabricksStep parameters source_directory and python_script_name, your source_directory is copied to DBFS, and the directory path on DBFS is passed as a parameter to your script when it begins execution. This parameter is labeled --AZUREML_SCRIPT_DIRECTORY_NAME. You need to prefix its value with the string "dbfs:/" or "/dbfs/" to access the directory in DBFS.
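The prefixing step described above might look like the following sketch. It assumes the directory arrives as the value of a --AZUREML_SCRIPT_DIRECTORY_NAME command-line option (two hyphens); the helper name dbfs_script_directory and the sample directory value are hypothetical. Stripping surrounding slashes before adding the prefix is also an assumption, made to avoid doubled separators.

```python
import argparse


def dbfs_script_directory(argv):
    """Return both prefixed forms of the copied source_directory, or None if absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--AZUREML_SCRIPT_DIRECTORY_NAME", dest="script_dir", default=None)
    # Other parameters passed to the script are ignored here.
    args, _unknown = parser.parse_known_args(argv)
    if args.script_dir is None:
        # The parameter is only populated when python_script_name is used.
        return None
    relative = args.script_dir.strip("/")  # assumption: normalize surrounding slashes
    return {
        "dbfs_uri": "dbfs:/" + relative,     # form with the "dbfs:/" prefix
        "local_path": "/dbfs/" + relative,   # form with the "/dbfs/" prefix
    }


# Hypothetical directory value, for illustration only.
paths = dbfs_script_directory(["--AZUREML_SCRIPT_DIRECTORY_NAME", "example/scripts"])
print(paths["dbfs_uri"])
print(paths["local_path"])
```

Which of the two prefixes is appropriate depends on how the script reads the files, so the sketch returns both.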

EstimatorStep

DEPRECATED. Creates a pipeline step to run an Estimator for Azure ML model training.

Create an Azure ML Pipeline step to run an Estimator for Machine Learning model training.

DEPRECATED. Use the CommandStep instead. For an example, see How to run ML training in pipelines with CommandStep.

HyperDriveStep

Creates an Azure ML Pipeline step to run hyperparameter tuning for Machine Learning model training.

For an example of using HyperDriveStep, see the notebook https://aka.ms/pl-hyperdrive.

Create an Azure ML Pipeline step to run hyperparameter tuning for Machine Learning model training.

HyperDriveStepRun

Manage, check status, and retrieve run details for a HyperDriveStep pipeline step.

HyperDriveStepRun provides the functionality of HyperDriveRun with the additional support of StepRun. The HyperDriveRun class enables you to manage, check the status of, and retrieve run details for the HyperDrive run and each of its generated child runs. The StepRun class enables you to do this once the parent pipeline run is submitted and the pipeline has submitted the step run.

Initialize a HyperDriveStepRun.

KustoStep

Enables running Kusto queries on a target Kusto cluster in Azure ML Pipelines.

Initialize KustoStep.

ModuleStep

Creates an Azure Machine Learning pipeline step to run a specific version of a Module.

Module objects define reusable computations, such as scripts or executables, that can be used in different machine learning scenarios and by different users. To use a specific version of a Module in a pipeline, create a ModuleStep. A ModuleStep is a step in a pipeline that uses an existing ModuleVersion.

For an example of using ModuleStep, see the notebook https://aka.ms/pl-modulestep.

Create an Azure ML pipeline step to run a specific version of a Module.

MpiStep

Creates an Azure ML pipeline step to run an MPI job.

For an example of using MpiStep, see the notebook https://aka.ms/pl-style-trans.

Create an Azure ML pipeline step to run an MPI job.

DEPRECATED. Use the CommandStep instead. For an example, see How to run distributed training in pipelines with CommandStep.

ParallelRunConfig

Defines configuration for a ParallelRunStep object.

For an example of using ParallelRunStep, see the notebook https://aka.ms/batch-inference-notebooks.

For a troubleshooting guide, see https://aka.ms/prstsg. You can find more references there.

Initialize the config object.

ParallelRunStep

Creates an Azure Machine Learning Pipeline step to process large amounts of data asynchronously and in parallel.

For an example of using ParallelRunStep, see the notebook https://aka.ms/batch-inference-notebooks.

For a troubleshooting guide, see https://aka.ms/prstsg. You can find more references there.

Create an Azure ML Pipeline step to process large amounts of data asynchronously and in parallel.

PythonScriptStep

Creates an Azure ML Pipeline step that runs a Python script.

For an example of using PythonScriptStep, see the notebook https://aka.ms/pl-get-started.

Create an Azure ML Pipeline step that runs a Python script.

RScriptStep

Note

This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Creates an Azure ML Pipeline step that runs an R script.

Create an Azure ML Pipeline step that runs an R script.

DEPRECATED. Use the CommandStep instead. For an example, see How to run R scripts in pipelines with CommandStep.

SynapseSparkStep

Note

This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Creates an Azure ML Synapse step that submits and executes a Python script.

Create an Azure ML Pipeline step that runs a Spark job on a Synapse Spark pool.