core Package
Contains core functionality for Azure Machine Learning pipelines, which are configurable machine learning workflows.
Azure Machine Learning pipelines allow you to create reusable machine learning workflows that can be used as a template for your machine learning scenarios. This package contains the core functionality for working with Azure ML pipelines and is typically used along with the classes in the steps package.
A machine learning pipeline is represented by a collection of PipelineStep objects that can be sequenced and parallelized, or created with explicit dependencies between steps. Pipeline steps are used to define a Pipeline object, which represents the workflow to execute. You can create and work with pipelines in a Jupyter Notebook or any other IDE with the Azure ML SDK installed.
Azure ML pipelines enable you to focus on machine learning rather than infrastructure. To get started building a pipeline, see https://aka.ms/pl-first-pipeline.
For more information about the benefits of the Machine Learning Pipeline and how it is related to other pipelines offered by Azure, see What are ML pipelines in Azure Machine Learning service?
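The idea that steps can be sequenced, parallelized, or linked by explicit dependencies can be illustrated without the SDK. The following is a minimal conceptual sketch that uses only the Python standard library (graphlib, Python 3.9 or later); it is not the Azure ML API, and the step names are hypothetical. It groups steps into batches in which every step could run in parallel because all of its dependencies have already finished.

```python
from graphlib import TopologicalSorter

# Hypothetical step names (not from the SDK). Each key maps to the set of
# steps that must finish before that step can start.
dependencies = {
    "prepare_data": set(),
    "train_model_a": {"prepare_data"},
    "train_model_b": {"prepare_data"},
    "compare_models": {"train_model_a", "train_model_b"},
}


def execution_batches(deps):
    """Group steps into batches; steps in one batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependents become ready
    return batches


batches = execution_batches(dependencies)
for index, batch in enumerate(batches, start=1):
    print(f"batch {index}: {batch}")
```

In this sketch the two training steps land in the same batch, which mirrors how independent pipeline steps may run in parallel while a dependent step waits for both.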
Modules
builder |
Defines classes for building an Azure Machine Learning pipeline. A pipeline graph is composed of pipeline steps (PipelineStep), optional pipeline data (PipelineData) produced or consumed in each step, and an optional step execution sequence (StepSequence). |
graph |
Defines classes for constructing Azure Machine Learning pipeline graphs. Azure ML pipeline graphs are created for Pipeline objects when you use PipelineStep (and derived classes), PipelineData, and PipelineDataset objects. In typical use cases, you will not need to use the classes in this module directly. A pipeline run graph consists of module nodes, which represent basic units such as a datasource or step. Nodes can have input ports, output ports, and associated parameters. Edges define relationships between two node ports in a graph. |
module |
Contains classes for creating and managing reusable computational units of an Azure Machine Learning pipeline. Modules allow you to create computational units in a Pipeline, which can have inputs and outputs and rely on parameters and an environment configuration to operate. A module can be versioned and used in different Azure Machine Learning pipelines, unlike PipelineStep (and derived classes), which are used in one Pipeline. Modules are designed to be reused in several pipelines and can evolve to adapt specific computation logic to different use cases. A step in a pipeline can be used in fast iterations to improve an algorithm, and once the goal is achieved, the algorithm is usually published as a module to enable reuse. |
module_step_base |
Contains functionality to add a step to a pipeline using a version of a Module. |
pipeline |
Defines the class for creating reusable Azure Machine Learning workflows. |
pipeline_draft |
Defines classes for managing mutable pipelines. |
pipeline_endpoint |
Defines classes for managing pipelines including versioning and endpoints. |
pipeline_output_dataset |
Contains functionality for promoting an intermediate output to an Azure Machine Learning Dataset. By default, intermediate data (output) in a pipeline does not become an Azure Machine Learning Dataset. To promote intermediate data to an Azure Machine Learning Dataset, call the as_dataset method on the PipelineData class to return a PipelineOutputFileDataset object. From a PipelineOutputFileDataset object, you can then create a PipelineOutputTabularDataset object. |
run |
Defines classes for submitted pipelines, including classes for checking status and retrieving run details. |
schedule |
Defines classes for scheduling submissions of Azure Machine Learning Pipelines. |
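The graph module description above mentions nodes with input and output ports, and edges that connect two node ports. The following is a small conceptual sketch of that data structure in plain Python. It is not the SDK's graph classes: the Node, Edge, and validate names, and all node and port names, are hypothetical and exist only to illustrate the idea that an edge joins an output port of one node to an input port of another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Illustrative stand-in for a graph node such as a datasource or a step."""
    name: str
    inputs: tuple = ()   # names of input ports
    outputs: tuple = ()  # names of output ports


@dataclass(frozen=True)
class Edge:
    """Illustrative edge from an output port of one node to an input port of another."""
    source_node: str
    source_port: str
    dest_node: str
    dest_port: str


def validate(nodes, edges):
    """Return a list of problems; an empty list means every edge joins known ports."""
    by_name = {node.name: node for node in nodes}
    problems = []
    for edge in edges:
        source = by_name.get(edge.source_node)
        dest = by_name.get(edge.dest_node)
        if source is None or edge.source_port not in source.outputs:
            problems.append(f"unknown output {edge.source_node}.{edge.source_port}")
        if dest is None or edge.dest_port not in dest.inputs:
            problems.append(f"unknown input {edge.dest_node}.{edge.dest_port}")
    return problems


nodes = [
    Node("raw_data", outputs=("data",)),
    Node("train", inputs=("training_data",), outputs=("model",)),
]
good_edges = [Edge("raw_data", "data", "train", "training_data")]
bad_edges = [Edge("train", "metrics", "raw_data", "data")]

print(validate(nodes, good_edges))
print(validate(nodes, bad_edges))
```

As the description notes, typical pipelines never touch the graph classes directly; the sketch only shows why port names matter when outputs of one step feed inputs of another.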
Classes
InputPortBinding |
Defines a binding from a source to an input of a pipeline step. An InputPortBinding can be used as an input to a step. The source can be a PipelineData, PortDataReference, DataReference, PipelineDataset, or OutputPortBinding. InputPortBinding is useful for specifying the name of the step input when it should differ from the name of the bound object (for example, to avoid duplicate input/output names or because the step script needs an input to have a certain name). It can also be used to specify the bind_mode for PythonScriptStep inputs. Initialize InputPortBinding. |
Module |
Represents a computation unit used in an Azure Machine Learning pipeline. A module is a collection of files that will run on a compute target, together with a description of an interface. The collection of files can be scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions. It doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module. Initialize Module. |
ModuleVersion |
Represents the actual computation unit within a Module. You should not use this class directly. Instead, use one of the publish methods of the Module class. Initialize ModuleVersion. |
ModuleVersionDescriptor |
Defines the version and ID of a ModuleVersion. Initialize ModuleVersionDescriptor. |
OutputPortBinding |
Defines a named output of a pipeline step. OutputPortBinding can be used to specify the type of data which will be produced by a step and how the data will be produced. It can be used with InputPortBinding to specify that the step output is a required input of another step. Initialize OutputPortBinding. |
Pipeline |
Represents a collection of steps which can be executed as a reusable Azure Machine Learning workflow. Use a Pipeline to create and manage workflows that stitch together various machine learning phases. Each machine learning phase, such as data preparation and model training, can consist of one or more steps in a Pipeline. For an overview of why and when to use Pipelines, see https://aka.ms/pl-concept. For an overview on constructing a Pipeline, see https://aka.ms/pl-first-pipeline. Initialize Pipeline. |
PipelineData |
Represents intermediate data in an Azure Machine Learning pipeline. Data used in a pipeline can be produced by one step and consumed in another step by providing a PipelineData object as an output of one step and an input of one or more subsequent steps. Note: if you use PipelineData, make sure the directory you write to exists. For example, if a pipeline step has an output port named output_folder and you want to write data to a relative path in that folder, create the directory before writing (for example, with os.makedirs).
PipelineData uses DataReference internally, which is no longer the recommended approach for data access and delivery. Use OutputFileDatasetConfig instead; for a sample, see Pipeline using OutputFileDatasetConfig. Initialize PipelineData. |
PipelineDataset |
Acts as an adapter for Dataset and Pipeline. Note: This class is deprecated. To learn how to use datasets with pipelines, see https://aka.ms/pipeline-with-dataset. This is an internal class. You should not create this class directly but rather call the as_* instance methods on the Dataset or the OutputDatasetConfig classes. |
PipelineDraft |
Represents a mutable pipeline which can be used to submit runs and create Published Pipelines. Use PipelineDrafts to iterate on Pipelines. PipelineDrafts can be created from scratch, another PipelineDraft, or existing pipelines: Pipeline, PublishedPipeline, or PipelineRun. Initialize PipelineDraft. |
PipelineEndpoint |
Represents a Pipeline workflow that can be triggered from a unique endpoint URL. PipelineEndpoints can be used to create new versions of a PublishedPipeline while maintaining the same endpoint. PipelineEndpoints are uniquely named within a workspace. Using the endpoint attribute of a PipelineEndpoint object, you can trigger new pipeline runs from external applications with REST calls. For information about how to authenticate when calling REST endpoints, see https://aka.ms/pl-restep-auth. For more information about creating and running machine learning pipelines, see https://aka.ms/pl-first-pipeline. Initialize PipelineEndpoint. |
PipelineParameter |
Defines a parameter in a pipeline execution. Use PipelineParameters to construct versatile Pipelines which can be resubmitted later with varying parameter values. Initialize pipeline parameters. |
PipelineRun |
Represents a run of a Pipeline. This class can be used to manage a pipeline run, check its status, and retrieve run details once the run is submitted. Use get_steps to retrieve the StepRun objects which are created by the pipeline run. Other uses include retrieving the Graph object associated with the pipeline run, fetching the status of the pipeline run, and waiting for run completion. Initialize a Pipeline run. |
PipelineStep |
Represents an execution step in an Azure Machine Learning pipeline. Pipelines are constructed from multiple pipeline steps, which are distinct computational units in the pipeline. Each step can run independently and use isolated compute resources. Each step typically has its own named inputs, outputs, and parameters. The PipelineStep class is the base class from which other built-in step classes designed for common scenarios inherit, such as PythonScriptStep, DataTransferStep, and HyperDriveStep. For an overview of how Pipelines and PipelineSteps relate, see What are ML Pipelines. Initialize PipelineStep. |
PortDataReference |
Models data associated with an output of a completed StepRun. A PortDataReference object can be used to download the output data which was produced by a StepRun. It can also be used as a step input in a future Pipeline. Initialize PortDataReference. |
PublishedPipeline |
Represents a Pipeline to be submitted without the Python code which constructed it. In addition, a PublishedPipeline can be used to resubmit a Pipeline with different PipelineParameter values and inputs. Initialize PublishedPipeline. Parameters: endpoint (str), the REST endpoint URL to submit pipeline runs for this pipeline; total_run_steps (int), the number of steps in this pipeline; workspace (azureml.core.Workspace), the workspace of the published pipeline; continue_on_step_failure, whether to continue execution of other steps in the PipelineRun if a step fails (default is false). |
Schedule |
Defines a schedule on which to submit a pipeline. Once a Pipeline is published, a Schedule can be used to submit the Pipeline at a specified interval or when changes to a Blob storage location are detected. Initialize Schedule. |
ScheduleRecurrence |
Defines the frequency, interval, and start time of a pipeline Schedule. ScheduleRecurrence also allows you to specify the time zone and the hours, minutes, or week days for the recurrence. Initialize a schedule recurrence. |
StepRun |
Represents a run of a step in a Pipeline. This class can be used to manage a step run, check its status, and retrieve run details once the parent pipeline run is submitted and the pipeline has submitted the step run. Initialize a StepRun. |
StepRunOutput |
Represents an output created by a StepRun in a Pipeline. StepRunOutput can be used to access the PortDataReference created by the step. Initialize StepRunOutput. |
StepSequence |
Represents a list of steps in a Pipeline and the order in which to execute them. Use a StepSequence when initializing a pipeline to create a workflow that contains steps to run in a specific order. Initialize StepSequence. |
TrainingOutput |
Defines a specialized output of certain PipelineSteps for use in a pipeline. TrainingOutput enables an automated machine learning metric or model to be made available as a step output to be consumed by another step in an Azure Machine Learning Pipeline. Can be used with AutoMLStep or HyperDriveStep. Initialize TrainingOutput. Parameter model_file: the specific model file to be included in the output (HyperDriveStep only). |
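The PipelineData entry above notes that a step script must make sure its output directory exists before writing to it. The following is a minimal sketch of that pattern inside a step script, using only the standard library. The output port name output_folder comes from the description; the write_step_output helper, the relative path, and the file contents are hypothetical, and a temporary directory stands in for the location the pipeline would pass to the script at run time.

```python
import argparse
import os
import tempfile


def write_step_output(output_folder, relative_path, text):
    """Create any missing directories under output_folder, then write the file."""
    target = os.path.join(output_folder, relative_path)
    # exist_ok=True makes this safe whether or not the directory already exists.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(text)
    return target


parser = argparse.ArgumentParser()
parser.add_argument("--output_folder", required=True)

# Stand-in for the arguments a pipeline would pass to the script; the folder
# below deliberately does not exist yet.
scratch = tempfile.mkdtemp()
args = parser.parse_args(["--output_folder", os.path.join(scratch, "output_folder")])

written = write_step_output(args.output_folder, os.path.join("results", "scores.txt"), "0.93\n")
print(os.path.isfile(written))
```

In a real step script you would call parser.parse_args() with no arguments so that the value supplied by the pipeline is used.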
Enums
TimeZone |
Enumerates the valid time zones for a recurrence Schedule. |