Module Class
Represents a computation unit used in an Azure Machine Learning pipeline.
A module is a collection of files that will run on a compute target, together with a description of an interface. The collection of files can be scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions without binding them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.
Initialize Module.
- Inheritance
- builtins.object
Module
Constructor
Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)
Parameters
- _module_provider
- <xref:azureml.pipeline.core._aeva_provider._AzureMLModuleProvider>
(Internal use only.) The Module provider.
- _module_version_provider
- <xref:azureml.pipeline.core._aeva_provider._AevaMlModuleVersionProvider>
(Internal use only.) The ModuleVersion provider.
Remarks
A Module acts as a container of its versions. In the following example, a ModuleVersion is created
from the publish_python_script method; it has two outputs. The created ModuleVersion is the
default version (is_default is set to True).

   out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name,
                           default_datastore_mode="mount", label="Sum of two numbers")
   out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name,
                            default_datastore_mode="mount", label="Product of two numbers")
   entry_version = module.publish_python_script("calculate.py", "initial",
                                                inputs=[], outputs=[out_sum, out_prod],
                                                params={"initialNum": 12},
                                                version="1", source_directory="./calc")
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb
This Module can be used in different steps when defining a pipeline, by using a ModuleStep.
The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:
   middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
   middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount",
                             is_directory=False)
   middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount",
                              is_directory=False)
   middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}
The mapping can then be used when creating the ModuleStep:
   middle_step = ModuleStep(module=module,
                            inputs_map=middle_step_input_wiring,
                            outputs_map=middle_step_output_wiring,
                            runconfig=RunConfiguration(), compute_target=aml_compute,
                            arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                       "--output_sum", middle_sum, "--output_product", middle_prod])
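The keys of inputs_map and outputs_map must match the port names of the resolved ModuleVersion. As a plain-Python illustration (not SDK code; the port and variable names are taken from the snippets above, with strings standing in for PipelineData objects), a wiring can be sanity-checked like this:

```python
def check_wiring(inputs_map, outputs_map, input_ports, output_ports):
    """Return True when every wired name matches a port on the module version."""
    missing_inputs = set(inputs_map) - set(input_ports)
    missing_outputs = set(outputs_map) - set(output_ports)
    return not missing_inputs and not missing_outputs

# Port names from the snippets above; plain strings stand in for PipelineData.
input_wiring = {"in1": "first_sum", "in2": "first_prod"}
output_wiring = {"out_sum": "middle_sum", "out_prod": "middle_prod"}

print(check_wiring(input_wiring, output_wiring,
                   input_ports=["in1", "in2"],
                   output_ports=["out_sum", "out_prod"]))  # True
```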
The resolution of which version of the module to use happens upon submission and follows this process:
- Remove all disabled versions.
- If a specific version was stated, use it.
- Otherwise, if a default version was defined for the Module, use it.
- Otherwise, if all versions follow semantic versioning without letters, use the highest value.
- Otherwise, use the version of the Module that was updated last.
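The resolution order above can be sketched in plain Python. This is a simplified illustration, not the SDK's actual implementation; the dict fields (version, is_default, disabled, updated) are hypothetical stand-ins for ModuleVersion metadata:

```python
def resolve_version(versions, requested=None):
    """Pick a version following the documented resolution order.

    Each version is a dict: {"version": str, "is_default": bool,
    "disabled": bool, "updated": int (last-update timestamp)}.
    """
    # 1. Remove all disabled versions.
    active = [v for v in versions if not v["disabled"]]
    # 2. If a specific version was stated, use it.
    if requested is not None:
        return next(v for v in active if v["version"] == requested)
    # 3. Otherwise, use the default version if one was defined.
    defaults = [v for v in active if v["is_default"]]
    if defaults:
        return defaults[0]
    # 4. Otherwise, if all versions are numeric semantic versions,
    #    take the highest value.
    def semver(v):
        parts = v["version"].split(".")
        return [int(p) for p in parts] if all(p.isdigit() for p in parts) else None
    if all(semver(v) is not None for v in active):
        return max(active, key=semver)
    # 5. Otherwise, take the version that was updated last.
    return max(active, key=lambda v: v["updated"])
```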
Note that because the mapping of a node's inputs and outputs to a module's inputs and outputs is defined upon pipeline creation, if the version resolved upon submission has a different interface from the one resolved upon pipeline creation, then the pipeline submission will fail.
The underlying module can be updated with new versions while keeping the default version the same.
Modules are uniquely named within a workspace.
Methods
create | Create the Module.
deprecate | Set the Module to 'Deprecated'.
disable | Set the Module to 'Disabled'.
enable | Set the Module to 'Active'.
get | Get the Module by name or by ID; throws an exception if neither is provided.
get_default | Get the default module version.
get_default_version | Get the default version of the Module.
get_versions | Get all the versions of the Module.
module_def_builder | Create the module definition object that describes the step.
module_version_list | Get the Module version list.
process_source_directory | Process the source directory for the step and check that the script exists.
publish | Create a ModuleVersion and add it to the current Module.
publish_adla_script | Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.
publish_azure_batch | Create a ModuleVersion that uses Azure Batch and add it to the current Module.
publish_python_script | Create a ModuleVersion that's based on a Python script and add it to the current Module.
resolve | Resolve and return the right ModuleVersion.
set_default_version | Set the default ModuleVersion of the Module.
set_description | Set the description of the Module.
set_name | Set the name of the Module.
create
Create the Module.
static create(workspace, name, description, _workflow_provider=None)
Parameters
- _workflow_provider
- <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>
(Internal use only.) The workflow provider.
Returns
Module object
Return type
deprecate
Set the Module to 'Deprecated'.
deprecate()
disable
Set the Module to 'Disabled'.
disable()
enable
Set the Module to 'Active'.
enable()
get
Get the Module by name or by ID; throws an exception if neither is provided.
static get(workspace, module_id=None, name=None, _workflow_provider=None)
Parameters
- _workflow_provider
- <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>
(Internal use only.) The workflow provider.
Returns
Module object
Return type
get_default
Get the default module version.
get_default()
Returns
The default module version.
Return type
get_default_version
Get the default version of the Module.
get_default_version()
Returns
The default version of the Module.
Return type
get_versions
Get all the versions of the Module.
static get_versions(workspace, name, _workflow_provider=None)
Parameters
- _workflow_provider
- <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>
(Internal use only.) The workflow provider.
Returns
The list of ModuleVersionDescriptor
Return type
module_def_builder
Create the module definition object that describes the step.
static module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None, step_type=None, arguments=None, runconfig=None, cloud_settings=None)
Parameters
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- step_type
- str
Type of step associated with this module, e.g. "PythonScriptStep", "HyperDriveStep", etc.
Returns
The Module def object.
Return type
Exceptions
module_version_list
Get the Module version list.
module_version_list()
Returns
The list of ModuleVersionDescriptor
Return type
process_source_directory
Process source directory for the step and check that the script exists.
static process_source_directory(name, source_directory, script_name)
Parameters
Returns
The source directory and hash paths.
Return type
Exceptions
publish
Create a ModuleVersion and add it to the current Module.
publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None, arguments=None, runconfig=None)
Parameters
- execution_type
- str
The execution type of the Module. Acceptable values are esCloud, adlcloud, and AzureBatchCloud.
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- is_default
- bool
Indicates whether the published version is to be the default one.
- hash_paths
- list
A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.
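For illustration, an .amlignore file uses the same syntax as .gitignore to exclude files from the snapshot and the content hash; the entries below are hypothetical examples:

```
# .amlignore -- files excluded from the snapshot and the content hash
.git/
*.pyc
data/
logs/
```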
- arguments
- list
Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).
- runconfig
- RunConfiguration
An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.
Return type
Exceptions
publish_adla_script
Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.
publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None)
Parameters
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- runtime_version
- str
The runtime version of the Azure Data Lake Analytics (ADLA) engine.
- is_default
- bool
Indicates whether the published version is to be the default one.
- arguments
- list
Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).
Return type
publish_azure_batch
Create a ModuleVersion that uses Azure Batch and add it to the current Module.
publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None, arguments=None)
Parameters
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- is_default
- bool
Indicates whether the published version is to be the default one.
- delete_batch_job_after_finish
- bool
Indicates whether to delete the job from Batch account after it's finished.
- delete_batch_pool_after_finish
- bool
Indicates whether to delete the pool after the job finishes.
- is_positive_exit_code_failure
- bool
Indicates whether the job fails if the task exits with a positive exit code.
- vm_image_urn
- str
If create_pool is True and the VM uses VirtualMachineConfiguration, then this parameter indicates the VM image to use. Value format: urn:publisher:offer:sku. Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.
- run_task_as_admin
- bool
Indicates whether the task should run with Admin privileges.
- target_compute_nodes
- int
If create_pool is True, indicates how many compute nodes will be added to the pool.
- vm_size
- str
If create_pool is True, indicates the virtual machine size of the compute nodes.
- executable
- str
The name of the command/executable that will be executed as part of the job.
- arguments
- list
Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).
Return type
Exceptions
publish_python_script
Create a ModuleVersion that's based on a Python script and add it to the current Module.
publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None, runconfig=None)
Parameters
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- is_default
- bool
Indicates whether the published version is to be the default one.
- hash_paths
- list
A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.
- arguments
- list
Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).
- runconfig
- RunConfiguration
An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.
Return type
resolve
Resolve and return the right ModuleVersion.
resolve(version=None)
Parameters
- version
Returns
The Module version to use.
Return type
set_default_version
Set the default ModuleVersion of the Module.
set_default_version(version_id)
Parameters
- version_id
Returns
The default version.
Return type
Exceptions
set_description
Set the description of the Module.
set_description(description)
Parameters
Exceptions
set_name
Set the name of the Module.
set_name(name)
Parameters
Exceptions
Attributes
default_version
description
id
name
status