Estimator Class

Represents a generic estimator for training a model with any supplied framework.

DEPRECATED. Use the ScriptRunConfig object with your own defined environment or an Azure ML curated environment. For an introduction to configuring experiment runs with ScriptRunConfig, see Configure and submit training runs.
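
As a rough migration sketch, the most commonly used Estimator arguments map onto ScriptRunConfig keywords roughly as shown here. The helper names `to_script_run_config_kwargs` and `build_script_run_config` are hypothetical, the paths are placeholders, and flattening `script_params` into an argument list is an assumption about how the flags should reach the script. The `ScriptRunConfig` call needs the azureml-core package, so it is kept in a function that is not executed in this example.

```python
def to_script_run_config_kwargs(source_directory, entry_script, script_params=None,
                                compute_target=None, environment_definition=None):
    """Hypothetical helper: translate common Estimator arguments to ScriptRunConfig keywords."""
    arguments = []
    for flag, value in (script_params or {}).items():
        arguments.append(flag)
        # Assumption: None or an empty string marks a flag that takes no value.
        if value not in (None, ""):
            arguments.append(str(value))
    return {
        "source_directory": source_directory,
        "script": entry_script,
        "arguments": arguments,
        "compute_target": compute_target,
        "environment": environment_definition,
    }


def build_script_run_config(**estimator_args):
    """Create a ScriptRunConfig; requires azureml-core to be installed."""
    # Imported lazily so the pure-Python helper above runs without the SDK.
    from azureml.core import ScriptRunConfig
    return ScriptRunConfig(**to_script_run_config_kwargs(**estimator_args))


kwargs = to_script_run_config_kwargs(
    source_directory="./src",        # placeholder directory
    entry_script="train.py",         # placeholder script name
    script_params={"--epochs": 10, "--verbose": ""},
    compute_target="local",
)
print(kwargs["arguments"])
```

Package installation (conda or pip dependencies) has no direct keyword here; with ScriptRunConfig it is described by the Environment object passed as `environment`.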

This class is designed for use with machine learning frameworks that do not already have an Azure Machine Learning pre-configured estimator. Pre-configured estimators exist for Chainer, PyTorch, TensorFlow, and SKLearn. To create an Estimator that is not preconfigured, see Train models with Azure Machine Learning using estimator.

The Estimator class wraps run configuration information to help simplify the tasks of specifying how a script is executed. It supports single-node as well as multi-node execution. Running the estimator produces a model in the output directory specified in your training script.
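
A minimal sketch of single-node usage, assuming the azureml-train-core and azureml-core packages are installed and a workspace object is already available. The function names `build_estimator` and `submit` are hypothetical wrappers written for this example; only the keyword arguments listed in the constructor signature are used.

```python
def build_estimator(source_directory, entry_script, compute_target="local", script_params=None):
    """Build an Estimator for a single-node run (requires azureml-train-core)."""
    # Imported lazily so that defining this function does not require the SDK.
    from azureml.train.estimator import Estimator
    return Estimator(
        source_directory=source_directory,
        compute_target=compute_target,
        entry_script=entry_script,
        script_params=script_params,
        node_count=1,                # single node
        process_count_per_node=1,    # single process, so no distributed backend is needed
    )


def submit(workspace, experiment_name, estimator):
    """Submit the estimator as an experiment run and wait for it to finish."""
    from azureml.core import Experiment
    run = Experiment(workspace, experiment_name).submit(estimator)
    run.wait_for_completion(show_output=True)
    return run
```

A call such as `submit(ws, "my-experiment", build_estimator("./src", "train.py"))` would then start the run, where `ws`, the experiment name, and the paths are placeholders.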

Initialize the estimator.

Inheritance: azureml.train.estimator._mml_base_estimator.MMLBaseEstimator -> Estimator

Constructor

Estimator(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, node_count=1, process_count_per_node=1, distributed_backend=None, distributed_training=None, use_gpu=False, use_docker=True, custom_docker_base_image=None, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, conda_dependencies_file=None, pip_requirements_file=None, environment_variables=None, environment_definition=None, inputs=None, source_directory_data_store=None, shm_size=None, resume_from=None, max_run_duration_seconds=None, _disable_validation=True, _show_lint_warnings=False, _show_package_warnings=False)

Parameters

Name	Description
source_directory Required	str A local directory containing experiment configuration and code files needed for a training job.
compute_target Optional	AbstractComputeTarget or str The compute target where training will happen. This can be either an object or the string "local".
vm_size Optional	str The VM size of the compute target that will be created for the training. Supported values: any Azure VM size.
vm_priority Optional	str The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used. Supported values: 'dedicated' and 'lowpriority'. This takes effect only when the `vm_size` parameter is specified in the input.
entry_script Optional	str The relative path to the file used to start training.
script_params Optional	dict A dictionary of command-line arguments to pass to the training script specified in `entry_script`.
node_count Optional	int The number of nodes in the compute target used for training. If greater than 1, an MPI distributed job will be run. Only the AmlCompute target is supported for distributed jobs.
process_count_per_node Optional	int The number of processes (or "workers") to run on each node. If greater than 1, an MPI distributed job will be run. Only the AmlCompute target is supported for distributed jobs.
distributed_backend Optional	str The communication backend for distributed training. DEPRECATED. Use the `distributed_training` parameter. Supported values: 'mpi'. 'mpi' represents MPI/Horovod. This parameter is required when `node_count` or `process_count_per_node` > 1. When `node_count` == 1 and `process_count_per_node` == 1, no backend will be used unless the backend is explicitly set. Only the AmlCompute target is supported for distributed training.
distributed_training Optional	Mpi Parameters for running a distributed training job. To run a distributed job with the MPI backend, use an Mpi object to specify `process_count_per_node`.
use_gpu Optional	bool Indicates whether the environment to run the experiment should support GPUs. If true, a GPU-based default Docker image will be used in the environment. If false, a CPU-based image will be used. Default Docker images (CPU or GPU) will be used only if the `custom_docker_image` parameter is not set. This setting is used only in Docker-enabled compute targets.
use_docker Optional	bool Specifies whether the environment to run the experiment should be Docker-based.
custom_docker_base_image Optional	str The name of the Docker image from which the image to use for training will be built. DEPRECATED. Use the `custom_docker_image` parameter. If not set, a default CPU-based image will be used as the base image.
custom_docker_image Optional	str The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image. Only specify images available in public Docker repositories (Docker Hub). To use an image from a private Docker repository, use the constructor's `environment_definition` parameter instead.
image_registry_details Optional	ContainerRegistry The details of the Docker image registry.
user_managed Optional	bool Specifies whether Azure ML reuses an existing Python environment. If false, a Python environment is created based on the conda dependencies specification.
conda_packages Optional	list A list of strings representing conda packages to be added to the Python environment for the experiment.
pip_packages Optional	list A list of strings representing pip packages to be added to the Python environment for the experiment.
conda_dependencies_file_path Optional	str The relative path to the conda dependencies YAML file. If specified, Azure ML will not install any framework-related packages. DEPRECATED. Use the `conda_dependencies_file` parameter. Specify either `conda_dependencies_file_path` or `conda_dependencies_file`. If both are specified, `conda_dependencies_file` is used.
pip_requirements_file_path Optional	str The relative path to the pip requirements text file. DEPRECATED. Use the `pip_requirements_file` parameter. This parameter can be specified in combination with the `pip_packages` parameter. Specify either `pip_requirements_file_path` or `pip_requirements_file`. If both are specified, `pip_requirements_file` is used.
conda_dependencies_file Optional	str The relative path to the conda dependencies YAML file. If specified, Azure ML will not install any framework-related packages.
pip_requirements_file Optional	str The relative path to the pip requirements text file. This parameter can be specified in combination with the `pip_packages` parameter.
environment_variables Optional	dict A dictionary of environment variable names and values. These environment variables are set on the process where the user script is executed.
environment_definition Optional	Environment The environment definition for the experiment. It includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters of the Estimator constructor can be set using this parameter. If this parameter is specified, it will take precedence over other environment-related parameters like `use_gpu`, `custom_docker_image`, `conda_packages`, or `pip_packages`. Errors will be reported on invalid combinations.
inputs Optional	list A list of DataReference or DatasetConsumptionConfig objects to use as input.
source_directory_data_store Optional	Datastore The backing data store for the project share.
shm_size Optional	str The size of the Docker container's shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used. For more information, see Docker run reference.
resume_from Optional	DataPath The data path containing the checkpoint or model files from which to resume the experiment.
max_run_duration_seconds Optional	int The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value.
_disable_validation Optional	bool Disable script validation before run submission. The default is True.
_show_lint_warnings Optional	bool Show script linting warnings. The default is False.
_show_package_warnings Optional	bool Show package validation warnings. The default is False.
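
Two of the rules stated in the table can be sketched in plain Python: the non-deprecated file parameter wins when both members of a pair are given, and a backend is needed once `node_count` or `process_count_per_node` exceeds 1. The functions `resolve_dependency_files` and `effective_backend` are hypothetical illustrations of the documented behavior, not the SDK's own validation code, and the file names are placeholders.

```python
def resolve_dependency_files(conda_dependencies_file=None, conda_dependencies_file_path=None,
                             pip_requirements_file=None, pip_requirements_file_path=None):
    """Return the effective (conda, pip) files; the non-deprecated name wins if both are set."""
    conda_file = (conda_dependencies_file if conda_dependencies_file is not None
                  else conda_dependencies_file_path)
    pip_file = (pip_requirements_file if pip_requirements_file is not None
                else pip_requirements_file_path)
    return conda_file, pip_file


def effective_backend(node_count=1, process_count_per_node=1, distributed_backend=None):
    """Return the backend implied by the table's rules for `distributed_backend`."""
    if distributed_backend is not None:
        # The table lists 'mpi' as the only supported value.
        if distributed_backend != "mpi":
            raise ValueError("unsupported distributed_backend: %r" % (distributed_backend,))
        return distributed_backend
    if node_count == 1 and process_count_per_node == 1:
        return None  # single node, single process: no backend is used
    raise ValueError("distributed_backend is required when node_count "
                     "or process_count_per_node is greater than 1")


print(resolve_dependency_files(conda_dependencies_file="env.yml",
                               conda_dependencies_file_path="old_env.yml"))
print(effective_backend(node_count=2, distributed_backend="mpi"))
```

The sketch covers only the deprecated `distributed_backend` parameter; how the `distributed_training` parameter interacts with these checks is not spelled out in the table and is left out here.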
