Module Class

Represents a computation unit used in an Azure Machine Learning pipeline.

A module is a collection of files which will run on a compute target and a description of an interface. The collection of files can be script, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions. It doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.

Initialize Module.

Inheritance
builtins.object
Module

Constructor

Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)

Parameters

Name Description
workspace
Required

The workspace object this Module belongs to.

module_id
Required
str

The ID of the Module.

name
Required
str

The name of the Module.

description
Required
str

The description of the Module.

status
Required
str

The status of the Module: 'Active', 'Deprecated', or 'Disabled'.

default_version
Required
str

The default version of the Module.

module_version_list
Required

A list of ModuleVersionDescriptor objects.

_module_provider
azureml.pipeline.core._aeva_provider._AzureMLModuleProvider

(Internal use only.) The Module provider.

Default value: None
_module_version_provider
azureml.pipeline.core._aeva_provider._AevaMlModuleVersionProvider

(Internal use only.) The ModuleVersion provider.

Default value: None

Remarks

A Module acts as a container of its versions. In the following example, a ModuleVersion with no inputs and two outputs is created with the publish_python_script method. To make a published version the Module's default version, pass is_default=True.


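   from azureml.pipeline.core.graph import OutputPortDef

   # Assumes `datastore` (for example, ws.get_default_datastore()) and `module` (from Module.create) already exist.
   # Each output is written to the datastore in mount mode.
   out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name, default_datastore_mode="mount",
                           label="Sum of two numbers")
   out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name, default_datastore_mode="mount",
                            label="Product of two numbers")
   # Publish version "1" (description "initial") from ./calc/calculate.py, with no inputs and two outputs.
   entry_version = module.publish_python_script("calculate.py", "initial",
                                                inputs=[], outputs=[out_sum, out_prod], params={"initialNum": 12},
                                                version="1", source_directory="./calc")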

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

When defining a pipeline, this module can be used in one or more steps through a ModuleStep.

The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:


   from azureml.pipeline.core import PipelineData

   # Keys are the ModuleVersion's port names; first_sum and first_prod are outputs of an earlier step.
   middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
   middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount", is_directory=False)
   middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount", is_directory=False)
   middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}

The mapping can then be used when creating the ModuleStep:


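   from azureml.core.runconfig import RunConfiguration
   from azureml.pipeline.steps import ModuleStep

   # aml_compute is an existing compute target; the wiring dictionaries come from the previous snippet.
   middle_step = ModuleStep(module=module,
                            inputs_map=middle_step_input_wiring,
                            outputs_map=middle_step_output_wiring,
                            runconfig=RunConfiguration(), compute_target=aml_compute,
                            arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                       "--output_sum", middle_sum, "--output_product", middle_prod])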

The version of the module to use is resolved when the pipeline is submitted, as follows:

  • Remove all disabled versions.
  • If a specific version was requested, use it; otherwise,
  • if a default version is defined for the Module, use it; otherwise,
  • if all versions follow semantic versioning with numbers only (no letters), take the highest; otherwise,
  • take the version of the Module that was updated last.
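The resolution order above can be modeled in plain Python. This is a minimal sketch, not the Azure ML implementation: VersionInfo and resolve_version are hypothetical names, and the handling of cases the list does not cover (a requested version that is disabled or missing returns None; a disabled default falls through to the next rule) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VersionInfo:
    """Hypothetical stand-in for a module version; not an azureml class."""
    version: str
    status: str  # 'Active', 'Deprecated', or 'Disabled'
    updated: datetime


def _numeric_semver(version: str) -> Optional[tuple]:
    """Return the version as a tuple of ints if it uses only digits and dots, else None."""
    parts = version.split(".")
    if all(part.isdigit() for part in parts):
        return tuple(int(part) for part in parts)
    return None


def resolve_version(versions: List[VersionInfo],
                    requested: Optional[str] = None,
                    default: Optional[str] = None) -> Optional[VersionInfo]:
    # 1. Remove all disabled versions.
    candidates = [v for v in versions if v.status != "Disabled"]
    if not candidates:
        return None
    by_name = {v.version: v for v in candidates}
    # 2. A specifically requested version wins (assumption: None if it was removed or never existed).
    if requested is not None:
        return by_name.get(requested)
    # 3. Otherwise use the Module's default version, if it is still a candidate.
    if default is not None and default in by_name:
        return by_name[default]
    # 4. If every version is numeric semantic versioning, take the highest.
    keys = [_numeric_semver(v.version) for v in candidates]
    if all(key is not None for key in keys):
        return max(zip(keys, candidates), key=lambda pair: pair[0])[1]
    # 5. Otherwise take the most recently updated version.
    return max(candidates, key=lambda v: v.updated)
```

For example, with versions "1", "2", and a disabled "10", and no requested or default version, rule 4 selects "2" because "10" was removed in rule 1.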

Because the mapping of a step's inputs and outputs to a module's inputs and outputs is fixed when the pipeline is created, submission fails if the version resolved at submission time has a different interface from the version resolved at pipeline creation.
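That failure mode can be illustrated with a hypothetical check_wiring helper, which tests whether the port names used in inputs_map and outputs_map when the pipeline was created still exist on the version resolved at submission. This is a sketch under that assumption; the service's actual validation may compare other parts of the interface as well.

```python
from typing import Dict, Iterable


def check_wiring(input_ports: Iterable[str], output_ports: Iterable[str],
                 inputs_map: Dict[str, object], outputs_map: Dict[str, object]) -> None:
    """Raise ValueError if wiring keys name ports that the resolved version lacks.

    Hypothetical helper: the port name lists stand in for a ModuleVersion's interface.
    """
    for kind, ports, mapping in (("input", input_ports, inputs_map),
                                 ("output", output_ports, outputs_map)):
        missing = set(mapping) - set(ports)
        if missing:
            raise ValueError(f"{kind} ports not in the resolved version: {sorted(missing)}")
```

Wiring built for a version with inputs "in1" and "in2" would fail against version "1" from the first example, which publishes no inputs.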

The underlying module can be updated with new versions while keeping the default version the same.

Modules are uniquely named within a workspace.

Methods

create

Create the Module.

deprecate

Set the Module to 'Deprecated'.

disable

Set the Module to 'Disabled'.

enable

Set the Module to 'Active'.

get

Get the Module by name or by ID; raises an exception if neither is provided.

get_default

Get the default module version.

get_default_version

Get the default version of the Module.

get_versions

Get all the versions of the Module.

module_def_builder

Create the module definition object that describes the step.

module_version_list

Get the Module version list.

process_source_directory

Process source directory for the step and check that the script exists.

publish

Create a ModuleVersion and add it to the current Module.

publish_adla_script

Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

publish_azure_batch

Create a ModuleVersion that uses Azure batch and add it to the current Module.

publish_python_script

Create a ModuleVersion that's based on a Python script and add it to the current Module.

resolve

Resolve and return the right ModuleVersion.

set_default_version

Set the default ModuleVersion of the Module.

set_description

Set the description of the Module.

set_name

Set the name of the Module.

create

Create the Module.

static create(workspace, name, description, _workflow_provider=None)

Parameters

Name Description
workspace
Required

The workspace in which to create the Module.

name
Required
str

The name of the Module.

description
Required
str

The description of the Module.

_workflow_provider
azureml.pipeline.core._aeva_provider._AevaWorkflowProvider

(Internal use only.) The workflow provider.

Default value: None

Returns

Type Description

Module object

deprecate

Set the Module to 'Deprecated'.

deprecate()

disable

Set the Module to 'Disabled'.

disable()

enable

Set the Module to 'Active'.

enable()

get

Get the Module by name or by ID; raises an exception if neither is provided.

static get(workspace, module_id=None, name=None, _workflow_provider=None)

Parameters

Name Description
workspace
Required

The workspace in which the Module was created.

module_id
str

The ID of the Module.

Default value: None
name
str

The name of the Module.

Default value: None
_workflow_provider
azureml.pipeline.core._aeva_provider._AevaWorkflowProvider

(Internal use only.) The workflow provider.

Default value: None

Returns

Type Description

Module object
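Because Modules are uniquely named within a workspace, a name identifies a Module as reliably as its ID. The sketch below models get and create against a hypothetical in-memory registry (ModuleRegistry is not part of azureml); the choice to prefer module_id when both arguments are given is an assumption.

```python
from typing import Dict, Optional


class ModuleRegistry:
    """Hypothetical in-memory model of one workspace's Modules (not the azureml API)."""

    def __init__(self) -> None:
        self._by_id: Dict[str, str] = {}    # module ID -> name
        self._by_name: Dict[str, str] = {}  # name -> module ID

    def create(self, module_id: str, name: str) -> None:
        # Modules are uniquely named within a workspace.
        if name in self._by_name:
            raise ValueError(f"A Module named {name!r} already exists")
        self._by_id[module_id] = name
        self._by_name[name] = module_id

    def get(self, module_id: Optional[str] = None, name: Optional[str] = None) -> str:
        """Return the module ID, looked up by ID (preferred, by assumption) or by name."""
        if module_id is None and name is None:
            raise ValueError("Either module_id or name must be provided")
        if module_id is not None:
            if module_id not in self._by_id:
                raise KeyError(module_id)
            return module_id
        return self._by_name[name]
```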

get_default

Get the default module version.

get_default()

Returns

Type Description

The default module version.

get_default_version

Get the default version of the Module.

get_default_version()

Returns

Type Description
str

The default version of the Module.

get_versions

Get all the versions of the Module.

static get_versions(workspace, name, _workflow_provider=None)

Parameters

Name Description
workspace
Required

The workspace in which the Module was created.

name
Required
str

The name of the Module.

_workflow_provider
azureml.pipeline.core._aeva_provider._AevaWorkflowProvider

(Internal use only.) The workflow provider.

Default value: None

Returns

Type Description

A list of ModuleVersionDescriptor objects.

module_def_builder

Create the module definition object that describes the step.

static module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None, step_type=None, arguments=None, runconfig=None, cloud_settings=None)

Parameters

Name Description
name
Required
str

The name of the Module.

description
Required
str

The description of the Module.

execution_type
Required
str

The execution type of the Module.

input_bindings
Required

The Module input bindings.

output_bindings
Required

The Module output bindings.

param_defs

The Module parameter definitions.

Default value: None
create_sequencing_ports

Indicates whether sequencing ports will be created for the Module.

Default value: True
allow_reuse

Indicates whether the Module will be available for reuse.

Default value: True
version
str

The version of the Module.

Default value: None
module_type
str

The Module type.

Default value: None
step_type
str

Type of step associated with this module, e.g. "PythonScriptStep", "HyperDriveStep", etc.

Default value: None
arguments

The annotated argument list to use when calling this module.

Default value: None
runconfig
str

The run configuration to use for a PythonScriptStep.

Default value: None
cloud_settings
str

Settings that will be used for clouds

Default value: None

Returns

Type Description

The module definition object.

Exceptions

Type Description

module_version_list

Get the Module version list.

module_version_list()

Returns

Type Description

A list of ModuleVersionDescriptor objects.

process_source_directory

Process source directory for the step and check that the script exists.

static process_source_directory(name, source_directory, script_name)

Parameters

Name Description
name
Required
str

The name of the step.

source_directory
Required
str

The source directory for the step.

script_name
Required
str

The script name for the step.

Returns

Type Description

The source directory and hash paths.

Exceptions

Type Description
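A minimal sketch of the check this method describes, assuming it joins source_directory and script_name and fails when the file is missing. The function name, the ValueError, and returning [script_path] as the hash paths are all assumptions, not the SDK's documented behavior.

```python
import os
from typing import List, Tuple


def process_source_directory_sketch(name: str, source_directory: str,
                                    script_name: str) -> Tuple[str, List[str]]:
    """Hypothetical re-creation: verify the script exists, return (source_directory, hash_paths)."""
    script_path = os.path.join(source_directory, script_name)
    if not os.path.isfile(script_path):
        raise ValueError(f"Step {name}: script {script_name!r} not found in {source_directory!r}")
    # Assumption: only the script itself is listed for hashing.
    return source_directory, [script_path]
```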

publish

Create a ModuleVersion and add it to the current Module.

publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None, arguments=None, runconfig=None)

Parameters

Name Description
description
Required
str

The description of the Module.

execution_type
Required
str

The execution type of the Module. Acceptable values are esCloud, adlcloud, and AzureBatchCloud.

inputs
Required

The Module inputs.

outputs
Required

The Module outputs.

param_defs

The Module parameter definitions.

Default value: None
create_sequencing_ports

Indicates whether sequencing ports will be created for the Module.

Default value: True
version
str

The version of the Module.

Default value: None
is_default

Indicates whether the published version is to be the default one.

Default value: False
content_path
str

The directory that contains the module's files.

Default value: None
hash_paths

A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.

Default value: None
category
str

The module version's category

Default value: None
arguments

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

Default value: None
runconfig

An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.

Default value: None

Returns

Type Description

The ModuleVersion that was created and added to the Module.

Exceptions

Type Description
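The hash_paths description says step reuse depends on hashing the contents of source_directory while skipping ignored files. Although hash_paths itself is deprecated, that idea can be illustrated with a hypothetical hash_directory helper that digests relative paths and file contents and applies fnmatch-style ignore patterns. This is a rough sketch only; the service's actual hashing and its .amlignore/.gitignore parsing are not documented here.

```python
import fnmatch
import hashlib
import os
from typing import Iterable


def hash_directory(source_directory: str, ignore_patterns: Iterable[str] = ()) -> str:
    """Return one SHA-256 digest over relative paths and contents, skipping ignored files."""
    patterns = list(ignore_patterns)
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(source_directory):
        dirs.sort()  # walk subdirectories in a stable order
        for filename in sorted(files):
            path = os.path.join(root, filename)
            rel = os.path.relpath(path, source_directory).replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(filename, p) for p in patterns):
                continue
            with open(path, "rb") as f:
                content = f.read()
            # Separator and length prefix keep (path, content) pairs unambiguous.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(len(content).to_bytes(8, "big"))
            digest.update(content)
    return digest.hexdigest()
```

An unchanged digest corresponds to the "no changes detected" case, in which the pipeline can reuse a previous run's results.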

publish_adla_script

Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None)

Parameters

Name Description
script_name
Required
str

The name of an ADLA script, relative to source_directory.

description
Required
str

The description of the Module version.

inputs
Required

The Module input bindings.

outputs
Required

The Module output bindings.

params

The ModuleVersion params, as name-default_value pairs.

Default value: None
create_sequencing_ports

Indicates whether sequencing ports will be created for the Module.

Default value: True
degree_of_parallelism
int

The degree of parallelism to use for this job.

Default value: None
priority
int

The priority value to use for the current job.

Default value: None
runtime_version
str

The runtime version of the Azure Data Lake Analytics (ADLA) engine.

Default value: None
compute_target

The ADLA compute to use for this job.

Default value: None
version
str

The version of the module.

Default value: None
is_default

Indicates whether the published version is to be the default one.

Default value: False
source_directory
str

The directory that contains the script and any other files the module version needs.

Default value: None
hash_paths

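A list of paths to hash when checking for changes to the step contents. If no changes are detected, the pipeline reuses the step contents from a previous run.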

Default value: None
category
str

The module version's category

Default value: None
arguments

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

Default value: None

Returns

Type Description

The ModuleVersion that was created and added to the Module.

publish_azure_batch

Create a ModuleVersion that uses Azure batch and add it to the current Module.

publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None, arguments=None)

Parameters

Name Description
description
Required
str

The description of the Module version.

compute_target
Required

The BatchCompute compute target.

inputs
Required

The Module input bindings.

outputs
Required

The Module output bindings.

params

The ModuleVersion params, as name-default_value pairs.

Default value: None
create_sequencing_ports

Indicates whether sequencing ports will be created for the Module.

Default value: True
version
str

The version of the Module.

Default value: None
is_default

Indicates whether the published version is to be the default one.

Default value: False
create_pool

Indicates whether to create the pool before running the jobs.

Default value: False
pool_id
str

(Mandatory) The ID of the Pool where the job will run.

Default value: None
delete_batch_job_after_finish

Indicates whether to delete the job from the Batch account after it finishes.

Default value: False
delete_batch_pool_after_finish

Indicates whether to delete the pool after the job finishes.

Default value: False
is_positive_exit_code_failure

Indicates whether the job fails if the task exits with a positive exit code.

Default value: True
vm_image_urn
str

If create_pool is True and VM uses VirtualMachineConfiguration, then this parameter indicates the VM image to use. Value format: urn:publisher:offer:sku. Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.

Default value: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter
run_task_as_admin

Indicates whether the task should run with Admin privileges.

Default value: False
target_compute_nodes
int

If create_pool is True, indicates how many compute nodes will be added to the pool.

Default value: 1
vm_size
str

If create_pool is True, indicates the virtual machine size of the compute nodes.

Default value: standard_d1_v2
executable
str

The name of the command/executable that will be executed as part of the job.

Default value: None
source_directory
str

The source directory.

Default value: None
category
str

The module version's category

Default value: None
arguments

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

Default value: None

Returns

Type Description

The ModuleVersion that was created and added to the Module.

Exceptions

Type Description
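vm_image_urn must follow the format urn:publisher:offer:sku. A small, hypothetical validation helper can catch malformed values before publishing; it assumes exactly four colon-separated fields, as that format describes, and is not part of azureml.

```python
from typing import NamedTuple


class VmImage(NamedTuple):
    publisher: str
    offer: str
    sku: str


def parse_vm_image_urn(urn: str) -> VmImage:
    """Split a value in the documented format urn:publisher:offer:sku."""
    parts = urn.split(":")
    if len(parts) != 4 or parts[0] != "urn" or not all(parts[1:]):
        raise ValueError(f"Expected urn:publisher:offer:sku, got {urn!r}")
    return VmImage(*parts[1:])
```

Applied to the default value, it yields publisher MicrosoftWindowsServer, offer WindowsServer, and SKU 2012-R2-Datacenter.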

publish_python_script

Create a ModuleVersion that's based on a Python script and add it to the current Module.

publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None, runconfig=None)

Parameters

Name Description
script_name
Required
str

The name of a Python script, relative to source_directory.

description
Required
str

The description of the Module version.

inputs
Required

The Module input bindings.

outputs
Required

The Module output bindings.

params

The ModuleVersion params, as name-default_value pairs.

Default value: None
create_sequencing_ports

Indicates whether sequencing ports will be created for the Module.

Default value: True
version
str

The version of the Module.

Default value: None
is_default

Indicates whether the published version is to be the default one.

Default value: False
source_directory
str

The directory that contains the script and any other files the module version needs.

Default value: None
hash_paths

A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.

Default value: None
category
str

The module version's category

Default value: None
arguments

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

Default value: None
runconfig

An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.

Default value: None

Returns

Type Description

The ModuleVersion that was created and added to the Module.
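The arguments parameter mixes plain strings with references that are resolved only at run time. The sketch below models that substitution with hypothetical PortRef and ParamRef classes standing in for port references and PipelineParameter; the mount paths and override values are illustrative, and in Azure ML the actual resolution is done by the service.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class PortRef:
    """Hypothetical stand-in for an input/output reference (InputPortDef, OutputPortDef, PipelineData)."""
    name: str


@dataclass(frozen=True)
class ParamRef:
    """Hypothetical stand-in for a PipelineParameter with a default value."""
    name: str
    default: object


Argument = Union[str, PortRef, ParamRef]


def render_arguments(arguments: List[Argument], port_paths: Dict[str, str],
                     param_values: Dict[str, object]) -> List[str]:
    """Replace references with concrete values, leaving plain strings unchanged."""
    rendered = []
    for arg in arguments:
        if isinstance(arg, PortRef):
            rendered.append(port_paths[arg.name])  # path where the data is available
        elif isinstance(arg, ParamRef):
            rendered.append(str(param_values.get(arg.name, arg.default)))
        else:
            rendered.append(arg)
    return rendered
```

In the ModuleStep example, first_sum and middle_sum play the role of PortRef: the script receives paths, not the Python objects.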

resolve

Resolve and return the right ModuleVersion.

resolve(version=None)

Parameters

Name Description
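version

The version to use. If None, the version is chosen by the resolution process described in the Remarks.

Default value: None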

Returns

Type Description

The Module version to use.

set_default_version

Set the default ModuleVersion of the Module.

set_default_version(version_id)

Parameters

Name Description
version_id
Required

The version to set as the default version of the Module.

Returns

Type Description
str

The default version.

Exceptions

Type Description

set_description

Set the description of the Module.

set_description(description)

Parameters

Name Description
description
Required
str

The description to set.

Exceptions

Type Description

set_name

Set the name of the Module.

set_name(name)

Parameters

Name Description
name
Required
str

The name to set.

Exceptions

Type Description

Attributes

default_version

Get the default version of the Module.

Returns

Type Description
str

The default version string.

description

Get the description of the Module.

Returns

Type Description
str

The description string.

id

Get the ID of the Module.

Returns

Type Description
str

The ID.

name

Get the name of the Module.

Returns

Type Description
str

The name.

status

Get the status of the Module.

Returns

Type Description
str

The status.