SKLearn Class
Creates an estimator for training in Scikit-learn experiments.
DEPRECATED. Use the ScriptRunConfig object with your own defined environment or the AzureML-Tutorial curated environment. For an introduction to configuring SKLearn experiment runs with ScriptRunConfig, see Train scikit-learn models at scale with Azure Machine Learning.
This estimator only supports single-node CPU training.
Supported versions: 0.20.3
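As a rough sketch of the migration the deprecation notice describes, the following shows one way the SKLearn arguments might map onto ScriptRunConfig. It assumes the v1 `azureml-core` package is installed and that a workspace object is already available. The folder `./src`, the script `train.py`, the compute target name `cpu-cluster`, and both helper names are hypothetical. ScriptRunConfig takes a flat `arguments` list rather than a `script_params` dictionary, so the first helper flattens the dictionary.

```python
def script_params_to_arguments(script_params):
    """Flatten an estimator-style script_params dict into an argument list.

    Assumption for this sketch: a value of None or "" marks a bare flag,
    so only the flag name is emitted.
    """
    arguments = []
    for name, value in script_params.items():
        arguments.append(name)
        if value is not None and value != "":
            arguments.append(value)
    return arguments


def build_script_run_config(workspace, script_params):
    """Build a ScriptRunConfig that stands in for an SKLearn estimator."""
    # Imported lazily so the helper above works without the SDK installed.
    from azureml.core import Environment, ScriptRunConfig

    # Curated environment named in the deprecation notice.
    env = Environment.get(workspace=workspace, name="AzureML-Tutorial")
    return ScriptRunConfig(
        source_directory="./src",      # hypothetical project folder
        script="train.py",             # hypothetical entry script
        arguments=script_params_to_arguments(script_params),
        compute_target="cpu-cluster",  # hypothetical compute target name
        environment=env,
    )
```

The resulting object can be passed to `Experiment.submit` in place of the estimator.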
Initialize a Scikit-learn estimator.
Inheritance
azureml.train.estimator._framework_base_estimator._FrameworkBaseEstimator -> SKLearn
Constructor
SKLearn(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, use_docker=True, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, conda_dependencies_file=None, pip_requirements_file=None, environment_variables=None, environment_definition=None, inputs=None, shm_size=None, resume_from=None, max_run_duration_seconds=None, framework_version=None, _enable_optimized_mode=False, _disable_validation=True, _show_lint_warnings=False, _show_package_warnings=False)
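A minimal sketch of calling this constructor, assuming the v1 `azureml-train-core` package (which provides `azureml.train.sklearn.SKLearn`) is installed. The folder, script name, script arguments, and extra package are hypothetical values chosen for illustration, and the helper names are not part of the SDK. The keyword arguments are built separately so they can be inspected without the SDK present.

```python
def sklearn_estimator_kwargs(compute_target="local"):
    """Return illustrative keyword arguments for the SKLearn constructor."""
    return {
        "source_directory": "./src",        # hypothetical project folder
        "compute_target": compute_target,   # an object or the string "local"
        "entry_script": "train.py",         # hypothetical, relative to source_directory
        "script_params": {"--alpha": 0.5},  # hypothetical training argument
        "pip_packages": ["pandas"],         # hypothetical extra dependency
        "framework_version": "0.20.3",      # the supported version listed above
        "max_run_duration_seconds": 3600,
    }


def build_sklearn_estimator(compute_target="local"):
    """Construct the (deprecated) estimator from the arguments above."""
    # Imported lazily so the kwargs helper works without the SDK installed.
    from azureml.train.sklearn import SKLearn

    return SKLearn(**sklearn_estimator_kwargs(compute_target))
```

The returned estimator would then be submitted with `Experiment.submit`. For new code, ScriptRunConfig is the recommended replacement.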
Parameters
Name | Description |
---|---|
source_directory (Required) | A local directory containing experiment configuration files. |
compute_target (Optional) | The compute target where training will happen. This can either be an object or the string "local". |
vm_size (Optional) | The VM size of the compute target that will be created for the training. Supported values: Any Azure VM size. |
vm_priority (Optional) | The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used. Supported values: 'dedicated' and 'lowpriority'. This takes effect only when the |
entry_script (Optional) | A string representing the relative path to the file used to start training. |
script_params (Optional) | A dictionary of command-line arguments to pass to your training script specified in |
use_docker (Optional) | A bool value indicating if the environment to run the experiment should be Docker-based. |
custom_docker_image (Optional) | The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image. |
image_registry_details (Optional) | The details of the Docker image registry. |
user_managed (Optional) | Specifies whether Azure ML reuses an existing Python environment. False means that Azure ML will create a Python environment based on the conda dependencies specification. |
conda_packages (Optional) | A list of strings representing conda packages to be added to the Python environment for the experiment. |
pip_packages (Optional) | A list of strings representing pip packages to be added to the Python environment for the experiment. |
conda_dependencies_file_path (Optional) | A string representing the relative path to the conda dependencies YAML file. If specified, Azure ML will not install any framework-related packages. This can be provided in combination with the |
pip_requirements_file_path (Optional) | A string representing the relative path to the pip requirements text file. This can be provided in combination with the |
conda_dependencies_file (Optional) | A string representing the relative path to the conda dependencies YAML file. If specified, Azure ML will not install any framework-related packages. This can be provided in combination with the |
pip_requirements_file (Optional) | A string representing the relative path to the pip requirements text file. This can be provided in combination with the |
environment_variables (Optional) | A dictionary of environment variable names and values. These environment variables are set on the process where the user script is executed. |
environment_definition (Optional) | The environment definition for an experiment includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters to the Estimator construction can be set using |
inputs (Optional) | A list of azureml.data.data_reference.DataReference or DatasetConsumptionConfig objects to use as input. |
shm_size (Optional) | The size of the Docker container's shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used. |
resume_from (Optional) | The data path containing the checkpoint or model files from which to resume the experiment. |
max_run_duration_seconds (Optional) | The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value. |
framework_version (Optional) | The Scikit-learn version to be used for executing training code. |
_enable_optimized_mode (Optional) | Enable incremental environment build with pre-built framework images for faster environment preparation. A pre-built framework image is built on top of Azure ML default CPU/GPU base images with framework dependencies pre-installed. |
_disable_validation (Optional) | Disable script validation before run submission. The default is True. |
_show_lint_warnings (Optional) | Show script linting warnings. The default is False. |
_show_package_warnings (Optional) | Show package validation warnings. The default is False. |
Remarks
When submitting a training job, Azure ML runs your script in a conda environment within a Docker container. SKLearn containers have the following dependencies installed.
Dependencies | Scikit-learn 0.20.3 |
---|---|
Python | 3.6.2 |
azureml-defaults | Latest |
IntelMpi | 2018.3.222 |
scikit-learn | 0.20.3 |
numpy | 1.16.2 |
miniconda | 4.5.11 |
scipy | 1.2.1 |
joblib | 0.13.2 |
git | 2.7.4 |
The Docker images extend Ubuntu 16.04.
If you need to install additional dependencies, you can either use the pip_packages or conda_packages parameters, or you can provide your own pip_requirements_file or conda_dependencies_file. Alternatively, you can build your own image and pass the custom_docker_image parameter to the estimator constructor.
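Before adding packages through pip_packages, it can help to check whether a requested pin differs from a version already installed in the container. The following is a minimal sketch of such a check, not part of the SDK: the helper name and the example package list are hypothetical, the version table is copied from the dependency list above, and only exact `name==version` pins are examined.

```python
# Python packages preinstalled in the SKLearn container (from the table above).
PREINSTALLED = {
    "scikit-learn": "0.20.3",
    "numpy": "1.16.2",
    "scipy": "1.2.1",
    "joblib": "0.13.2",
}


def find_version_conflicts(requirements, preinstalled=PREINSTALLED):
    """Return {package: (requested, preinstalled)} for exact pins that differ.

    Requirements without an "==" pin, or for packages that are not in the
    preinstalled table, are ignored.
    """
    conflicts = {}
    for requirement in requirements:
        name, separator, version = requirement.partition("==")
        name = name.strip().lower()
        version = version.strip()
        if separator and name in preinstalled and version != preinstalled[name]:
            conflicts[name] = (version, preinstalled[name])
    return conflicts


# Hypothetical list that would be passed as pip_packages.
extra_packages = ["pandas==0.25.3", "numpy==1.16.2", "scipy==1.4.1"]
print(find_version_conflicts(extra_packages))
# {'scipy': ('1.4.1', '1.2.1')}
```

A reported conflict does not necessarily mean the install will fail. It only flags pins worth reviewing before submitting the run.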
Attributes
DEFAULT_VERSION
DEFAULT_VERSION = '0.20.3'
FRAMEWORK_NAME
FRAMEWORK_NAME = 'SKLearn'