TensorFlow Class
Represents an estimator for training in TensorFlow experiments.
DEPRECATED. Use the ScriptRunConfig object with your own defined environment or one of the Azure ML TensorFlow curated environments. For an introduction to configuring TensorFlow experiment runs with ScriptRunConfig, see Train TensorFlow models at scale with Azure Machine Learning.
Supported versions: 1.10, 1.12, 1.13, 2.0, 2.1, 2.2
Initialize a TensorFlow estimator.
Inheritance
azureml.train.estimator._framework_base_estimator._FrameworkBaseEstimator
TensorFlow
Constructor
TensorFlow(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, node_count=1, process_count_per_node=1, worker_count=1, parameter_server_count=1, distributed_backend=None, distributed_training=None, use_gpu=False, use_docker=True, custom_docker_base_image=None, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, conda_dependencies_file=None, pip_requirements_file=None, environment_variables=None, environment_definition=None, inputs=None, source_directory_data_store=None, shm_size=None, resume_from=None, max_run_duration_seconds=None, framework_version=None, _enable_optimized_mode=False, _disable_validation=True, _show_lint_warnings=False, _show_package_warnings=False)
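As a rough illustration of the signature above, the following sketch collects a few keyword arguments in a plain dictionary and defers the actual construction to a helper. The directory `./src`, the script name `train.py`, and the `--learning-rate` argument are hypothetical placeholders, and the import path `azureml.train.dnn` is an assumption that should be checked against the installed SDK version.

```python
def build_estimator_kwargs(source_directory, entry_script, learning_rate):
    """Collect keyword arguments named after parameters in the constructor signature."""
    return {
        "source_directory": source_directory,
        "compute_target": "local",      # an object or the string "local"
        "entry_script": entry_script,   # relative path to the training script
        "script_params": {"--learning-rate": learning_rate},  # hypothetical argument
        "framework_version": "2.2",     # one of the listed supported versions
        "use_gpu": False,               # request the CPU-based default image
    }


def make_estimator(kwargs):
    """Build the estimator; this requires the Azure ML SDK to be installed."""
    # Imported lazily so the dictionary can be inspected without the SDK.
    from azureml.train.dnn import TensorFlow  # assumed import path
    return TensorFlow(**kwargs)


kwargs = build_estimator_kwargs("./src", "train.py", 0.01)
print(sorted(kwargs))
```

Keeping the arguments in a dictionary makes it easier to reuse the same settings when moving to ScriptRunConfig, which the deprecation notice recommends.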
Parameters
Name | Description |
---|---|
source_directory
Required
|
A local directory containing experiment configuration files. |
compute_target
Required
|
The compute target where training will happen. This can either be an object or the string "local". |
vm_size
Required
|
The VM size of the compute target that will be created for the training. Supported values: Any Azure VM size. |
vm_priority
Required
|
The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used. Supported values: 'dedicated' and 'lowpriority'. This takes effect only when the |
entry_script
Required
|
The relative path to the file containing the training script. |
script_params
Required
|
A dictionary of command-line arguments to pass to the training script specified in
|
node_count
Required
|
The number of nodes in the compute target used for training. Only the
AmlCompute target is supported for distributed training ( |
process_count_per_node
Required
|
When using MPI, the number of processes per node. |
worker_count
Required
|
When using Parameter Server for distributed training, the number of worker nodes. DEPRECATED. Specify as part of the |
parameter_server_count
Required
|
When using Parameter Server for distributed training, the number of parameter server nodes. |
distributed_backend
Required
|
The communication backend for distributed training. DEPRECATED. Use the Supported values: 'mpi' and 'ps'. 'mpi' represents MPI/Horovod and 'ps' represents Parameter Server. This parameter is required when any of When |
distributed_training
Required
|
Parameters for running a distributed training job. For running a distributed job with Parameter Server backend, use the
ParameterServer object to specify For running a distributed job with MPI backend, use the
Mpi object to specify |
use_gpu
Required
|
Specifies whether the environment to run the experiment should support GPUs.
If true, a GPU-based default Docker image will be used in the environment. If false, a CPU-based
image will be used. Default Docker images (CPU or GPU) will be used only if the |
use_docker
Required
|
Specifies whether the environment in which to run the experiment should be Docker-based. |
custom_docker_base_image
Required
|
The name of the Docker image from which the image to use for training will be built. DEPRECATED. Use the If not set, a default CPU-based image will be used as the base image. |
custom_docker_image
Required
|
The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image. |
image_registry_details
Required
|
The details of the Docker image registry. |
user_managed
Required
|
Specifies whether Azure ML reuses an existing Python environment. If false, Azure ML will create a Python environment based on the conda dependencies specification. |
conda_packages
Required
|
A list of strings representing conda packages to be added to the Python environment for the experiment. |
pip_packages
Required
|
A list of strings representing pip packages to be added to the Python environment for the experiment. |
conda_dependencies_file_path
Required
|
A string representing the relative path to the conda dependencies yaml file.
If specified, Azure ML will not install any framework related packages.
DEPRECATED. Use the |
pip_requirements_file_path
Required
|
A string representing the relative path to the pip requirements text file.
This can be provided in combination with the |
conda_dependencies_file
Required
|
A string representing the relative path to the conda dependencies yaml file. If specified, Azure ML will not install any framework related packages. |
pip_requirements_file
Required
|
A string representing the relative path to the pip requirements text file.
This can be provided in combination with the |
environment_variables
Required
|
A dictionary of environment variable names and values. These environment variables are set on the process where the user script is executed. |
environment_definition
Required
|
The environment definition for the experiment. It includes
PythonSection, DockerSection, and environment variables. Any environment option not directly
exposed through other parameters to the Estimator construction can be set using this
parameter. If this parameter is specified, it will take precedence over other environment related
parameters like |
inputs
Required
|
A list of DataReference or DatasetConsumptionConfig objects to use as input. |
source_directory_data_store
Required
|
The backing datastore for project share. |
shm_size
Required
|
The size of the Docker container's shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used. For more information, see Docker run reference. |
resume_from
Required
|
The data path containing the checkpoint or model files from which to resume the experiment. |
max_run_duration_seconds
Required
|
The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value. |
framework_version
Required
|
The TensorFlow version to be used for executing training code.
If no version is provided, the estimator will default to the latest version supported by Azure ML.
Use |
source_directory
Required
|
A local directory containing experiment configuration files. |
compute_target
Required
|
The compute target where training will happen. This can either be an object or the string "local". |
vm_size
Required
|
The VM size of the compute target that will be created for the training. Supported values: Any Azure VM size. |
vm_priority
Required
|
The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used. Supported values: 'dedicated' and 'lowpriority'. This takes effect only when the |
entry_script
Required
|
The relative path to the file containing the training script. |
script_params
Required
|
A dictionary of command-line arguments to pass to the training script specified in
|
node_count
Required
|
The number of nodes in the compute target used for training. Only the
AmlCompute target is supported for distributed training ( |
process_count_per_node
Required
|
When using MPI, the number of processes per node. |
worker_count
Required
|
When using Parameter Server, the number of worker nodes. DEPRECATED. Specify as part of the |
parameter_server_count
Required
|
When using Parameter Server, the number of parameter server nodes. |
distributed_backend
Required
|
The communication backend for distributed training. DEPRECATED. Use the Supported values: 'mpi' and 'ps'. 'mpi' represents MPI/Horovod and 'ps' represents Parameter Server. This parameter is required when any of When |
distributed_training
Required
|
Parameters for running a distributed training job. For running a distributed job with the Parameter Server backend, use
ParameterServer object to
specify For running a distributed job with MPI backend, use Mpi
object to specify |
use_gpu
Required
|
Specifies whether the environment to run the experiment should support GPUs.
If true, a GPU-based default Docker image will be used in the environment. If false, a CPU-based
image will be used. Default Docker images (CPU or GPU) will be used only if |
use_docker
Required
|
Specifies whether the environment in which to run the experiment should be Docker-based. |
custom_docker_base_image
Required
|
The name of the Docker image from which the image to use for training will be built. DEPRECATED. Use the If not set, a default CPU-based image will be used as the base image. |
custom_docker_image
Required
|
The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image. |
image_registry_details
Required
|
The details of the Docker image registry. |
user_managed
Required
|
Specifies whether Azure ML reuses an existing Python environment. If false, Azure ML will create a Python environment based on the conda dependencies specification. |
conda_packages
Required
|
A list of strings representing conda packages to be added to the Python environment for the experiment. |
pip_packages
Required
|
A list of strings representing pip packages to be added to the Python environment for the experiment. |
conda_dependencies_file_path
Required
|
The relative path to the conda dependencies
yaml file. If specified, Azure ML will not install any framework related packages.
DEPRECATED. Use the |
pip_requirements_file_path
Required
|
The relative path to the pip requirements text file.
This can be provided in combination with the |
conda_dependencies_file
Required
|
A string representing the relative path to the conda dependencies yaml file. If specified, Azure ML will not install any framework related packages. |
pip_requirements_file
Required
|
The relative path to the pip requirements text file.
This can be provided in combination with the |
environment_variables
Required
|
A dictionary of environment variable names and values. These environment variables are set on the process where the user script is executed. |
environment_definition
Required
|
The environment definition for the experiment. It includes
PythonSection, DockerSection, and environment variables. Any environment option not directly
exposed through other parameters to the Estimator construction can be set using this
parameter. If this parameter is specified, it will take precedence over other environment-related
parameters like |
inputs
Required
|
A list of azureml.data.data_reference.DataReference objects to use as input. |
source_directory_data_store
Required
|
The backing datastore for project share. |
shm_size
Required
|
The size of the Docker container's shared memory block. If not set, default is azureml.core.environment._DEFAULT_SHM_SIZE. For more information, see |
framework_version
Required
|
The TensorFlow version to be used for executing training code. If no version is provided, the estimator will default to the latest version supported by Azure ML. Use TensorFlow.get_supported_versions() to get a list of all versions supported by the current Azure ML SDK. |
_enable_optimized_mode
Required
|
Enable incremental environment build with pre-built framework images for faster environment preparation. A pre-built framework image is built on top of Azure ML default CPU/GPU base images with framework dependencies pre-installed. |
_disable_validation
Required
|
Disable script validation before run submission. The default is True. |
_show_lint_warnings
Required
|
Show script linting warnings. The default is False. |
_show_package_warnings
Required
|
Show package validation warnings. The default is False. |
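The script_params entry describes a dictionary of command-line arguments for the training script. The sketch below shows one way the receiving script might parse such arguments with `argparse`. The argument names `--learning-rate` and `--batch-size` are hypothetical, and the flattening of the dictionary into "key value" pairs is an assumption made for illustration, not a description of how Azure ML builds the command line.

```python
import argparse


def parse_args(argv):
    """Parse the (hypothetical) arguments a training script might accept."""
    parser = argparse.ArgumentParser(description="Example training script arguments")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser.parse_args(argv)


# A dictionary in the shape described for script_params.
script_params = {"--learning-rate": 0.01, "--batch-size": 64}

# Assumption: each key is followed by its value on the command line.
argv = [str(token) for pair in script_params.items() for token in pair]

args = parse_args(argv)
print(args.learning_rate, args.batch_size)
```

Giving every argument a default in the training script keeps it runnable on its own, outside of an experiment run.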
Remarks
When submitting a training job, Azure ML runs your script in a conda environment within a Docker container. The TensorFlow containers have the following dependencies installed.
Dependencies | TensorFlow 1.10/1.12 | TensorFlow 1.13 | TensorFlow 2.0/2.1/2.2 |
---|---|---|---|
Python | 3.6.2 | 3.6.2 | 3.6.2 |
CUDA (GPU image only) | 9.0 | 10.0 | 10.0 |
cuDNN (GPU image only) | 7.6.3 | 7.6.3 | 7.6.3 |
NCCL (GPU image only) | 2.4.8 | 2.4.8 | 2.4.8 |
azureml-defaults | Latest | Latest | Latest |
azureml-dataset-runtime[fuse,pandas] | Latest | Latest | Latest |
IntelMpi | 2018.3.222 | 2018.3.222 | ---- |
OpenMpi | ---- | ---- | 3.1.2 |
horovod | 0.15.2 | 0.16.1 | 0.18.1/0.19.1/0.19.5 |
miniconda | 4.5.11 | 4.5.11 | 4.5.11 |
tensorflow | 1.10.0/1.12.0 | 1.13.1 | 2.0.0/2.1.0/2.2.0 |
git | 2.7.4 | 2.7.4 | 2.7.4 |
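For scripts that need to reason about the container contents, the CUDA and MPI rows of the dependency table can be transcribed into a small lookup. This is only a convenience sketch: the dictionary and the `describe_image` helper are hypothetical names, and the values are copied from the table rather than queried from Azure ML.

```python
# Values transcribed from the dependency table (CUDA applies to GPU images only).
IMAGE_DEPENDENCIES = {
    "1.10": {"cuda": "9.0", "mpi": "IntelMpi 2018.3.222"},
    "1.12": {"cuda": "9.0", "mpi": "IntelMpi 2018.3.222"},
    "1.13": {"cuda": "10.0", "mpi": "IntelMpi 2018.3.222"},
    "2.0": {"cuda": "10.0", "mpi": "OpenMpi 3.1.2"},
    "2.1": {"cuda": "10.0", "mpi": "OpenMpi 3.1.2"},
    "2.2": {"cuda": "10.0", "mpi": "OpenMpi 3.1.2"},
}


def describe_image(framework_version):
    """Summarize the CUDA and MPI versions listed for a TensorFlow version."""
    deps = IMAGE_DEPENDENCIES[framework_version]
    return f"TensorFlow {framework_version}: CUDA {deps['cuda']}, {deps['mpi']}"


print(describe_image("1.13"))
```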
The v1 Docker images extend Ubuntu 16.04. The v2 Docker images extend Ubuntu 18.04.
To install additional dependencies, you can use either the pip_packages or conda_packages parameter, or you can specify the pip_requirements_file or conda_dependencies_file parameter. Alternatively, you can build your own image and pass the custom_docker_image parameter to the estimator constructor.
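The two dependency options described above might look as follows when prepared as keyword arguments. The package names and the file name `requirements.txt` are placeholders, and treating the requirements path as relative to the source directory is an assumption.

```python
import tempfile
from pathlib import Path


def write_requirements(source_directory, packages, filename="requirements.txt"):
    """Write one package specifier per line and return the relative file name."""
    path = Path(source_directory) / filename
    path.write_text("\n".join(packages) + "\n", encoding="utf-8")
    return filename


# Option 1: list the packages inline (placeholder package names).
inline_kwargs = {
    "pip_packages": ["pandas", "scikit-learn"],
    "conda_packages": ["matplotlib"],
}

# Option 2: point the estimator at a pip requirements file.
source_directory = tempfile.mkdtemp()  # stands in for a real project directory
requirements = write_requirements(source_directory, ["pandas", "scikit-learn"])
file_kwargs = {
    "source_directory": source_directory,
    "pip_requirements_file": requirements,  # assumed relative to source_directory
}

print(inline_kwargs)
print(file_kwargs["pip_requirements_file"])
```

Either dictionary could then be merged into the other constructor arguments. A requirements file is usually easier to keep in version control alongside the training script.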
For more information about Docker containers used in TensorFlow training, see https://github.com/Azure/AzureML-Containers.
The TensorFlow class supports two methods of distributed training:
- MPI-based distributed training using the Horovod framework
- Native distributed TensorFlow
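For the MPI-based option, a simple way to reason about job size is to multiply the number of nodes by the number of processes per node. The sketch below does that arithmetic and, separately, shows how the settings might be passed to the estimator. The helper names are hypothetical, and the `Mpi` import path and its `process_count_per_node` argument are assumptions to verify against the installed SDK.

```python
def total_mpi_processes(node_count, process_count_per_node):
    """Return the total number of MPI processes across all nodes."""
    if node_count < 1 or process_count_per_node < 1:
        raise ValueError("both counts must be at least 1")
    return node_count * process_count_per_node


def mpi_kwargs(node_count, process_count_per_node):
    """Build distributed-training keyword arguments; requires the Azure ML SDK."""
    # Imported lazily so the arithmetic above can run without the SDK.
    from azureml.train.dnn import Mpi  # assumed import path
    return {
        "node_count": node_count,
        "distributed_training": Mpi(process_count_per_node=process_count_per_node),
    }


# Two nodes with four processes each.
print(total_mpi_processes(node_count=2, process_count_per_node=4))  # 8
```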
For examples and more information about using TensorFlow in distributed training, see the tutorial Train and register TensorFlow models at scale with Azure Machine Learning.
Attributes
DEFAULT_VERSION
DEFAULT_VERSION = '1.13'
FRAMEWORK_NAME
FRAMEWORK_NAME = 'TensorFlow'
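When choosing a value for framework_version, it can help to check it against the supported versions listed at the top of this page before submitting a run. The following sketch hard-codes that list for illustration. `check_framework_version` is a hypothetical helper, and in practice `TensorFlow.get_supported_versions()` is the documented way to obtain the current list.

```python
# Copied from the "Supported versions" line of this page.
SUPPORTED_VERSIONS = ("1.10", "1.12", "1.13", "2.0", "2.1", "2.2")


def check_framework_version(version):
    """Return the version unchanged if it is in the supported list."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported TensorFlow version {version!r}; "
            f"expected one of {', '.join(SUPPORTED_VERSIONS)}"
        )
    return version


print(check_framework_version("2.2"))

# Versions should be written as strings: the float 1.10 loses its trailing zero.
print(str(1.10))  # 1.1
```

Because "1.10" and "1.1" differ only as strings, passing the version as a quoted string avoids selecting an unintended or invalid version.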