Module Class
Represents a computation unit used in an Azure Machine Learning pipeline.
A module is a collection of files which will run on a compute target and a description of an interface. The collection of files can be script, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions. It doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.
Initialize Module.
Inheritance
builtins.object → Module
Constructor
Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)
Parameters
Name | Description |
---|---|
workspace (Required) | The workspace object this Module belongs to. |
module_id (Required) | The ID of the Module. |
name (Required) | The name of the Module. |
description (Required) | The description of the Module. |
status (Required) | The new status of the Module: 'Active', 'Deprecated', or 'Disabled'. |
default_version (Required) | The default version of the Module. |
module_version_list (Required) | A list of ModuleVersionDescriptor objects. |
_module_provider | <xref:azureml.pipeline.core._aeva_provider._AzureMLModuleProvider> (Internal use only.) The Module provider. Default value: None |
_module_version_provider | <xref:azureml.pipeline.core._aeva_provider._AevaMlModuleVersionProvider> (Internal use only.) The ModuleVersion provider. Default value: None |
Remarks
A Module acts as a container of its versions. In the following example, a ModuleVersion is created from the publish_python_script method and has two inputs and two outputs. The created ModuleVersion is the default version (is_default is set to True).
from azureml.pipeline.core.graph import OutputPortDef

out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name,
                        default_datastore_mode="mount", label="Sum of two numbers")
out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name,
                         default_datastore_mode="mount", label="Product of two numbers")
entry_version = module.publish_python_script("calculate.py", "initial",
                                             inputs=[], outputs=[out_sum, out_prod],
                                             params={"initialNum": 12},
                                             version="1", source_directory="./calc")
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb
This module can be used when defining a pipeline, in different steps, by using a ModuleStep.
The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:
from azureml.pipeline.core import PipelineData

middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount", is_directory=False)
middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount", is_directory=False)
middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}
The mapping can then be used when creating the ModuleStep:
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.steps import ModuleStep

middle_step = ModuleStep(module=module,
                         inputs_map=middle_step_input_wiring,
                         outputs_map=middle_step_output_wiring,
                         runconfig=RunConfiguration(), compute_target=aml_compute,
                         arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                    "--output_sum", middle_sum, "--output_product", middle_prod])
The version of the module to use is resolved upon submission, using the following process:
- Remove all disabled versions
- If a specific version was stated, use that, else
- If a default version was defined for the Module, use that, else
- If all versions follow semantic versioning without letters, take the highest value, else
- Take the version of the Module that was updated last
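The resolution steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the SDK's implementation: `VersionInfo` and `resolve_version` are hypothetical names, and the handling of a requested or default version that is disabled (raise an error, or fall through to the next rule) is an assumption the list does not spell out.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VersionInfo:
    """Hypothetical stand-in for a module version record (not an SDK class)."""
    version: str
    status: str        # 'Active', 'Deprecated', or 'Disabled'
    updated: datetime  # time of the last update


# Matches versions made only of dot-separated numbers, e.g. "1" or "1.2.3".
_NUMERIC_VERSION = re.compile(r"^\d+(\.\d+)*$")


def resolve_version(versions: List[VersionInfo],
                    requested: Optional[str] = None,
                    default: Optional[str] = None) -> VersionInfo:
    # Step 1: remove all disabled versions.
    candidates = [v for v in versions if v.status != "Disabled"]
    if not candidates:
        raise ValueError("no enabled versions to resolve")
    by_version = {v.version: v for v in candidates}

    # Step 2: a specifically stated version wins.
    # Assumption: asking for a missing or disabled version is an error.
    if requested is not None:
        if requested not in by_version:
            raise ValueError(f"version {requested!r} is not available")
        return by_version[requested]

    # Step 3: otherwise use the default version, if one is defined.
    # Assumption: a disabled default falls through to the next rule.
    if default is not None and default in by_version:
        return by_version[default]

    # Step 4: if every version is numeric, take the highest value.
    if all(_NUMERIC_VERSION.match(v.version) for v in candidates):
        return max(candidates,
                   key=lambda v: tuple(int(part) for part in v.version.split(".")))

    # Step 5: otherwise take the version that was updated last.
    return max(candidates, key=lambda v: v.updated)


versions = [
    VersionInfo("1", "Active", datetime(2020, 1, 1)),
    VersionInfo("2", "Disabled", datetime(2020, 3, 1)),
    VersionInfo("1.5", "Active", datetime(2020, 2, 1)),
]
print(resolve_version(versions).version)               # 1.5
print(resolve_version(versions, default="1").version)  # 1
```

In the usage lines, version "2" is skipped because it is disabled, so the highest remaining numeric version is chosen unless a default is supplied.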
Note that the mapping of a node's inputs and outputs to a module's inputs and outputs is defined when the pipeline is created. If the version resolved upon submission has a different interface from the version resolved upon pipeline creation, the pipeline submission will fail.
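To make the interface constraint concrete, the following sketch compares a wiring against the port names of two versions. It is a simplified model rather than SDK behavior: `make_interface` and `wiring_problems` are hypothetical helpers, the string values stand in for PipelineData objects, and treating every unwired port as a mismatch is an assumption.

```python
from typing import Dict, Iterable, List, Set

Interface = Dict[str, Set[str]]


def make_interface(inputs: Iterable[str], outputs: Iterable[str]) -> Interface:
    """Describe a version's interface by its input and output port names."""
    return {"inputs": set(inputs), "outputs": set(outputs)}


def wiring_problems(interface: Interface,
                    inputs_map: Dict[str, object],
                    outputs_map: Dict[str, object]) -> List[str]:
    """Return a list of mismatches between a wiring and an interface."""
    problems = []
    for kind, wiring in (("inputs", inputs_map), ("outputs", outputs_map)):
        wired = set(wiring)
        for name in sorted(wired - interface[kind]):
            problems.append(f"{kind}: '{name}' is wired but not in the interface")
        for name in sorted(interface[kind] - wired):
            problems.append(f"{kind}: '{name}' is in the interface but not wired")
    return problems


# Wiring fixed at pipeline creation; port names follow the wiring sample above.
inputs_map = {"in1": "first_sum", "in2": "first_prod"}
outputs_map = {"out_sum": "middle_sum", "out_prod": "middle_prod"}

v1 = make_interface(["in1", "in2"], ["out_sum", "out_prod"])
v2 = make_interface(["in1"], ["out_sum", "out_prod"])  # hypothetical later version

print(wiring_problems(v1, inputs_map, outputs_map))  # []
print(wiring_problems(v2, inputs_map, outputs_map))  # reports that 'in2' is gone
```

If the version resolved at submission looked like `v2`, the wiring built against `v1` would no longer fit, which is the situation in which the submission fails.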
The underlying module can be updated with new versions while keeping the default version the same.
Modules are uniquely named within a workspace.
Methods
Method | Description |
---|---|
create | Create the Module. |
deprecate | Set the Module to 'Deprecated'. |
disable | Set the Module to 'Disabled'. |
enable | Set the Module to 'Active'. |
get | Get the Module by name or by ID; throws an exception if neither is provided. |
get_default | Get the default module version. |
get_default_version | Get the default version of the Module. |
get_versions | Get all the versions of the Module. |
module_def_builder | Create the module definition object that describes the step. |
module_version_list | Get the Module version list. |
process_source_directory | Process source directory for the step and check that the script exists. |
publish | Create a ModuleVersion and add it to the current Module. |
publish_adla_script | Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module. |
publish_azure_batch | Create a ModuleVersion that uses Azure Batch and add it to the current Module. |
publish_python_script | Create a ModuleVersion that's based on a Python script and add it to the current Module. |
resolve | Resolve and return the right ModuleVersion. |
set_default_version | Set the default ModuleVersion of the Module. |
set_description | Set the description of the Module. |
set_name | Set the name of the Module. |
create
Create the Module.
static create(workspace, name, description, _workflow_provider=None)
Parameters
Name | Description |
---|---|
workspace (Required) | The workspace in which to create the Module. |
name (Required) | The name of the Module. |
description (Required) | The description of the Module. |
_workflow_provider | <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider> (Internal use only.) The workflow provider. Default value: None |
Returns
Type | Description |
---|---|
Module object |
deprecate
Set the Module to 'Deprecated'.
deprecate()
disable
Set the Module to 'Disabled'.
disable()
enable
Set the Module to 'Active'.
enable()
get
Get the Module by name or by ID; throws an exception if neither is provided.
static get(workspace, module_id=None, name=None, _workflow_provider=None)
Parameters
Name | Description |
---|---|
workspace (Required) | The workspace in which the Module was created. |
module_id | The ID of the Module. Default value: None |
name | The name of the Module. Default value: None |
_workflow_provider | <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider> (Internal use only.) The workflow provider. Default value: None |
Returns
Type | Description |
---|---|
Module object |
get_default
Get the default module version.
get_default()
Returns
Type | Description |
---|---|
The default module version. |
get_default_version
Get the default version of the Module.
get_default_version()
Returns
Type | Description |
---|---|
The default version of the Module. |
get_versions
Get all the versions of the Module.
static get_versions(workspace, name, _workflow_provider=None)
Parameters
Name | Description |
---|---|
workspace (Required) | The workspace the Module was created on. |
name (Required) | The name of the Module. |
_workflow_provider | <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider> (Internal use only.) The workflow provider. Default value: None |
Returns
Type | Description |
---|---|
The list of ModuleVersionDescriptor |
module_def_builder
Create the module definition object that describes the step.
static module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None, step_type=None, arguments=None, runconfig=None, cloud_settings=None)
Parameters
Name | Description |
---|---|
name (Required) | The name of the Module. |
description (Required) | The description of the Module. |
execution_type (Required) | The execution type of the Module. |
input_bindings (Required) | The Module input bindings. |
output_bindings (Required) | The Module output bindings. |
param_defs | The Module param definitions. Default value: None |
create_sequencing_ports | Indicates whether sequencing ports will be created for the Module. Default value: True |
allow_reuse | Indicates whether the Module will be available to be reused. Default value: True |
version | The version of the Module. Default value: None |
module_type | The Module type. Default value: None |
step_type | Type of step associated with this module, e.g. "PythonScriptStep", "HyperDriveStep", etc. Default value: None |
arguments | Annotated arguments list to use when calling this module. Default value: None |
runconfig | Runconfig that will be used for python_script_step. Default value: None |
cloud_settings | Settings that will be used for clouds. Default value: None |
Returns
Type | Description |
---|---|
The Module def object. |
Exceptions
Type | Description |
---|---|
module_version_list
Get the Module version list.
module_version_list()
Returns
Type | Description |
---|---|
The list of ModuleVersionDescriptor |
process_source_directory
Process source directory for the step and check that the script exists.
static process_source_directory(name, source_directory, script_name)
Parameters
Name | Description |
---|---|
name (Required) | The name of the step. |
source_directory (Required) | The source directory for the step. |
script_name (Required) | The script name for the step. |
Returns
Type | Description |
---|---|
The source directory and hash paths. |
Exceptions
Type | Description |
---|---|
publish
Create a ModuleVersion and add it to the current Module.
publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None, arguments=None, runconfig=None)
Parameters
Name | Description |
---|---|
description (Required) | The description of the Module. |
execution_type (Required) | The execution type of the Module. Acceptable values are |
inputs (Required) | The Module inputs. |
outputs (Required) | The Module outputs. |
param_defs | The Module parameter definitions. Default value: None |
create_sequencing_ports | Indicates whether sequencing ports will be created for the Module. Default value: True |
version | The version of the Module. Default value: None |
is_default | Indicates whether the published version is to be the default one. Default value: False |
content_path | directory Default value: None |
hash_paths | A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the Default value: None |
category | The module version's category. Default value: None |
arguments | Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter). Default value: None |
runconfig | An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image. Default value: None |
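The hash_paths description mentions hashing paths to detect changes to the step contents. As a rough illustration of that idea only, the sketch below computes a content fingerprint over a list of paths with the standard library. The `fingerprint` helper and the hashing scheme (SHA-256 over file paths and bytes) are assumptions made for this example; the SDK's actual hashing is not described in this reference.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def _add_file(digest, file_path: str) -> None:
    """Feed one file's path and bytes into the running digest."""
    digest.update(file_path.encode("utf-8"))
    with open(file_path, "rb") as handle:
        digest.update(handle.read())


def fingerprint(paths: Iterable[str]) -> str:
    """Return a hex digest covering every file under the given paths."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # keep the traversal order deterministic
                for name in sorted(files):
                    _add_file(digest, os.path.join(root, name))
        else:
            _add_file(digest, path)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as source_directory:
    script = os.path.join(source_directory, "calculate.py")
    with open(script, "w") as handle:
        handle.write("print(1 + 1)\n")
    before = fingerprint([source_directory])
    unchanged = fingerprint([source_directory])

    # Editing the script changes the fingerprint.
    with open(script, "w") as handle:
        handle.write("print(2 * 2)\n")
    after = fingerprint([source_directory])

print(before == unchanged)  # True
print(before == after)      # False
```

An equal fingerprint corresponds to the "no changes detected" case, where earlier step contents could be reused; a different fingerprint signals that the contents changed.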
Returns
Type | Description |
---|---|
Exceptions
Type | Description |
---|---|
publish_adla_script
Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.
publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None)
Parameters
Name | Description |
---|---|
script_name (Required) | The name of an ADLA script, relative to |
description (Required) | The description of the Module version. |
inputs (Required) | The Module input bindings. |
outputs (Required) | The Module output bindings. |
params | The ModuleVersion params, as name-default_value pairs. Default value: None |
create_sequencing_ports | Indicates whether sequencing ports will be created for the Module. Default value: True |
degree_of_parallelism | The degree of parallelism to use for this job. Default value: None |
priority | The priority value to use for the current job. Default value: None |
runtime_version | The runtime version of the Azure Data Lake Analytics (ADLA) engine. Default value: None |
compute_target | The ADLA compute to use for this job. Default value: None |
version | The version of the module. Default value: None |
is_default | Indicates whether the published version is to be the default one. Default value: False |
source_directory | directory Default value: None |
hash_paths | hash_paths Default value: None |
category | The module version's category. Default value: None |
arguments | Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter). Default value: None |
Returns
Type | Description |
---|---|
publish_azure_batch
Create a ModuleVersion that uses Azure Batch and add it to the current Module.
publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None, arguments=None)
Parameters
Name | Description |
---|---|
description (Required) | The description of the Module version. |
compute_target (Required) | (BatchCompute or str) The BatchCompute compute target. |
inputs (Required) | The Module input bindings. |
outputs (Required) | The Module output bindings. |
params | The ModuleVersion params, as name-default_value pairs. Default value: None |
create_sequencing_ports | Indicates whether sequencing ports will be created for the Module. Default value: True |
version | The version of the Module. Default value: None |
is_default | Indicates whether the published version is to be the default one. Default value: False |
create_pool | Indicates whether to create the pool before running the jobs. Default value: False |
pool_id | (Mandatory) The ID of the Pool where the job will run. Default value: None |
delete_batch_job_after_finish | Indicates whether to delete the job from the Batch account after it's finished. Default value: False |
delete_batch_pool_after_finish | Indicates whether to delete the pool after the job finishes. Default value: False |
is_positive_exit_code_failure | Indicates whether the job fails if the task exits with a positive code. Default value: True |
vm_image_urn | If Default value: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter |
run_task_as_admin | Indicates whether the task should run with Admin privileges. Default value: False |
target_compute_nodes | If Default value: 1 |
vm_size | If Default value: standard_d1_v2 |
executable | The name of the command/executable that will be executed as part of the job. Default value: None |
source_directory | The source directory. Default value: None |
category | The module version's category. Default value: None |
arguments | Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter). Default value: None |
Returns
Type | Description |
---|---|
Exceptions
Type | Description |
---|---|
publish_python_script
Create a ModuleVersion that's based on a Python script and add it to the current Module.
publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None, runconfig=None)
Parameters
Name | Description |
---|---|
script_name (Required) | The name of a Python script, relative to |
description (Required) | The description of the Module version. |
inputs (Required) | The Module input bindings. |
outputs (Required) | The Module output bindings. |
params | The ModuleVersion params, as name-default_value pairs. Default value: None |
create_sequencing_ports | Indicates whether sequencing ports will be created for the Module. Default value: True |
version | The version of the Module. Default value: None |
is_default | Indicates whether the published version is to be the default one. Default value: False |
source_directory | directory Default value: None |
hash_paths | A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the Default value: None |
category | The module version's category. Default value: None |
arguments | Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter). Default value: None |
runconfig | An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image. Default value: None |
Returns
Type | Description |
---|---|
resolve
Resolve and return the right ModuleVersion.
resolve(version=None)
Parameters
Name | Description |
---|---|
version | Default value: None |
Returns
Type | Description |
---|---|
The Module version to use. |
set_default_version
Set the default ModuleVersion of the Module.
set_default_version(version_id)
Parameters
Name | Description |
---|---|
version_id (Required) | |
Returns
Type | Description |
---|---|
The default version. |
Exceptions
Type | Description |
---|---|
set_description
Set the description of the Module.
set_description(description)
Parameters
Name | Description |
---|---|
description (Required) | The description to set. |
Exceptions
Type | Description |
---|---|
set_name
Set the name of the Module.
set_name(name)
Parameters
Name | Description |
---|---|
name (Required) | The name to set. |
Exceptions
Type | Description |
---|---|