SKLearn Class
Creates an estimator for training in Scikit-learn experiments.
DEPRECATED. Use the ScriptRunConfig object with your own defined environment or the AzureML-Tutorial curated environment. For an introduction to configuring SKLearn experiment runs with ScriptRunConfig, see Train scikit-learn models at scale with Azure Machine Learning.
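As a rough migration sketch (not taken from the linked article), the snippet below shows how a script_params-style dictionary might be flattened into the `arguments` list that ScriptRunConfig accepts, and how a comparable run configuration could then be built. The helper names `to_arguments` and `build_run_config`, the script name `train.py`, and the argument names are hypothetical. The `azureml.core` import is deferred so that the flattening helper can be tried without the SDK installed.

```python
from typing import Any, Dict, List


def to_arguments(script_params: Dict[str, Any]) -> List[str]:
    """Flatten a script_params-style dict into a flat argument list.

    Hypothetical helper: a value of None is treated as a bare flag.
    """
    arguments: List[str] = []
    for name, value in script_params.items():
        arguments.append(name)
        if value is not None:
            arguments.append(str(value))
    return arguments


def build_run_config(workspace, compute_target="local"):
    """Build a ScriptRunConfig that stands in for an SKLearn estimator.

    Requires the azureml-core package, imported lazily on purpose.
    """
    from azureml.core import Environment, ScriptRunConfig

    # Curated environment named in the deprecation notice above.
    environment = Environment.get(workspace, name="AzureML-Tutorial")
    return ScriptRunConfig(
        source_directory=".",   # assumption: the script lives in the current folder
        script="train.py",      # hypothetical entry script
        arguments=to_arguments({"--kernel": "linear", "--penalty": 1.0}),
        compute_target=compute_target,
        environment=environment,
    )
```

With a workspace object in hand, the resulting configuration would typically be submitted through `Experiment(workspace, "some-experiment-name").submit(config)`.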
This estimator only supports single-node CPU training.
Supported versions: 0.20.3
Initialize a Scikit-learn estimator.
- Inheritance
- azureml.train.estimator._framework_base_estimator._FrameworkBaseEstimator
- SKLearn
Constructor
SKLearn(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, use_docker=True, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, conda_dependencies_file=None, pip_requirements_file=None, environment_variables=None, environment_definition=None, inputs=None, shm_size=None, resume_from=None, max_run_duration_seconds=None, framework_version=None, _enable_optimized_mode=False, _disable_validation=True, _show_lint_warnings=False, _show_package_warnings=False)
Parameters
- compute_target
- AbstractComputeTarget or str
The compute target where training will happen. This can either be an object or the string "local".
- vm_size
- str
The VM size of the compute target that will be created for the training.
Supported values: Any Azure VM size.
- vm_priority
- str
The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used.
Supported values: 'dedicated' and 'lowpriority'.
This takes effect only when the vm_size parameter is specified in the input.
- entry_script
- str
A string representing the relative path to the file used to start training.
- script_params
- dict
A dictionary of command-line arguments to pass to your training script specified in entry_script.
- custom_docker_image
- str
The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image.
- user_managed
- bool
Specifies whether Azure ML reuses an existing Python environment. False means that Azure ML will create a Python environment based on the conda dependencies specification.
- conda_packages
- list
A list of strings representing conda packages to be added to the Python environment for the experiment.
- pip_packages
- list
A list of strings representing pip packages to be added to the Python environment for the experiment.
- conda_dependencies_file_path
- str
A string representing the relative path to the conda dependencies YAML file.
If specified, Azure ML will not install any framework-related packages.
This can be provided in combination with the conda_packages parameter.
DEPRECATED. Use the conda_dependencies_file parameter.
- pip_requirements_file_path
- str
A string representing the relative path to the pip requirements text file.
This can be provided in combination with the pip_packages parameter.
DEPRECATED. Use the pip_requirements_file parameter.
- conda_dependencies_file
- str
A string representing the relative path to the conda dependencies YAML file.
If specified, Azure ML will not install any framework-related packages.
This can be provided in combination with the conda_packages parameter.
- pip_requirements_file
- str
A string representing the relative path to the pip requirements text file.
This can be provided in combination with the pip_packages parameter.
- environment_variables
- dict
A dictionary of environment variable names and values. These environment variables are set on the process where the user script is executed.
- environment_definition
- Environment
The environment definition for an experiment includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters to the Estimator construction can be set using the environment_definition parameter. If this parameter is specified, it takes precedence over other environment-related parameters like use_gpu, custom_docker_image, conda_packages, or pip_packages.
Errors will be reported for invalid combinations.
- shm_size
- str
The size of the Docker container's shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used.
- resume_from
- DataPath
The data path containing the checkpoint or model files from which to resume the experiment.
- max_run_duration_seconds
- int
The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value.
- framework_version
- str
The Scikit-learn version to be used for executing training code.
SKLearn.get_supported_versions() returns a list of the versions supported by the current SDK.
- use_docker
- bool
A bool value indicating if the environment to run the experiment should be Docker-based.
- _enable_optimized_mode
- bool
Enable incremental environment build with pre-built framework images for faster environment preparation. A pre-built framework image is built on top of Azure ML default CPU/GPU base images with framework dependencies pre-installed.
- _disable_validation
- bool
Disable script validation before run submission. The default is True.
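To illustrate how entry_script, script_params, and environment_variables fit together from the training script's side, here is a minimal sketch of an entry script. The argument names (`--kernel`, `--penalty`) and the environment variable `EXAMPLE_LOG_LEVEL` are hypothetical, and the sketch assumes that the script_params dictionary reaches the script as ordinary command-line arguments, as the script_params description states.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command-line arguments supplied through script_params."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry script")
    parser.add_argument("--kernel", type=str, default="linear")
    parser.add_argument("--penalty", type=float, default=1.0)
    # parse_known_args ignores arguments this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv: Optional[List[str]] = None) -> dict:
    args = parse_args(argv)
    # Values passed via environment_variables are visible in os.environ.
    log_level = os.environ.get("EXAMPLE_LOG_LEVEL", "INFO")
    settings = {"kernel": args.kernel, "penalty": args.penalty, "log_level": log_level}
    print(settings)
    # Model training with scikit-learn would start here.
    return settings


if __name__ == "__main__":
    main()
```

Under these assumptions, a matching estimator call would pass something like `script_params={"--kernel": "rbf", "--penalty": 0.5}` and `environment_variables={"EXAMPLE_LOG_LEVEL": "DEBUG"}`.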
Remarks
When submitting a training job, Azure ML runs your script in a conda environment within a Docker container. SKLearn containers have the following dependencies installed.
| Dependencies | Scikit-learn 0.20.3 |
| --- | --- |
| Python | 3.6.2 |
| azureml-defaults | Latest |
| IntelMpi | 2018.3.222 |
| scikit-learn | 0.20.3 |
| numpy | 1.16.2 |
| miniconda | 4.5.11 |
| scipy | 1.2.1 |
| joblib | 0.13.2 |
| git | 2.7.4 |
The Docker images extend Ubuntu 16.04.
If you need to install additional dependencies, you can either use the pip_packages or conda_packages parameters, or you can provide your pip_requirements_file or conda_dependencies_file file. Alternatively, you can build your own image and pass the custom_docker_image parameter to the estimator constructor.
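As an illustration of the package-based options, the sketch below writes a pip requirements file into the source directory and assembles keyword arguments for the estimator. The helper names, the script name `train.py`, the package choices, and the assumption that pip_requirements_file is resolved relative to source_directory are illustrative rather than taken from the SDK. The `azureml.train.sklearn` import is deferred so that the file-writing part can run without the SDK installed.

```python
import os
from typing import Dict, List


def write_requirements(source_directory: str, packages: List[str],
                       filename: str = "requirements.txt") -> str:
    """Write one requirement per line and return the relative file name."""
    path = os.path.join(source_directory, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(packages) + "\n")
    return filename


def estimator_kwargs(source_directory: str, requirements_file: str) -> Dict[str, object]:
    """Collect constructor arguments; all values here are illustrative."""
    return {
        "source_directory": source_directory,
        "entry_script": "train.py",        # hypothetical entry script
        "compute_target": "local",
        "pip_packages": ["pandas"],        # extra package listed inline
        "pip_requirements_file": requirements_file,
    }


def make_estimator(kwargs: Dict[str, object]):
    """Create the estimator; requires the azureml-train-core package."""
    from azureml.train.sklearn import SKLearn

    return SKLearn(**kwargs)
```

Keeping the argument dictionary separate from the constructor call makes it easy to inspect or log the configuration before anything is submitted.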
Attributes
DEFAULT_VERSION
DEFAULT_VERSION = '0.20.3'
FRAMEWORK_NAME
FRAMEWORK_NAME = 'SKLearn'