TensorFlow Class
Represents an estimator for training in TensorFlow experiments.
DEPRECATED. Use the ScriptRunConfig object with your own defined environment or one of the Azure ML TensorFlow curated environments. For an introduction to configuring TensorFlow experiment runs with ScriptRunConfig, see Train TensorFlow models at scale with Azure Machine Learning.
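As a hedged sketch of the recommended replacement (the environment name, compute target name, and script path below are placeholders; check your workspace for the curated environments actually available), a ScriptRunConfig-based run might look like:

```python
# Sketch of the recommended ScriptRunConfig approach. The environment name,
# compute target name, and paths are illustrative placeholders.
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()

# Pick an Azure ML curated TensorFlow environment (name is illustrative).
env = Environment.get(workspace=ws, name="AzureML-TensorFlow-2.2-GPU")

src = ScriptRunConfig(
    source_directory="./src",
    script="train.py",
    compute_target="gpu-cluster",  # name of an existing compute target
    environment=env,
)

run = Experiment(ws, "tf-training").submit(src)
```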
Supported versions: 1.10, 1.12, 1.13, 2.0, 2.1, 2.2
Initialize a TensorFlow estimator.
- Inheritance
- azureml.train.estimator._framework_base_estimator._FrameworkBaseEstimator
- TensorFlow
Constructor
TensorFlow(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, node_count=1, process_count_per_node=1, worker_count=1, parameter_server_count=1, distributed_backend=None, distributed_training=None, use_gpu=False, use_docker=True, custom_docker_base_image=None, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, conda_dependencies_file=None, pip_requirements_file=None, environment_variables=None, environment_definition=None, inputs=None, source_directory_data_store=None, shm_size=None, resume_from=None, max_run_duration_seconds=None, framework_version=None, _enable_optimized_mode=False, _disable_validation=True, _show_lint_warnings=False, _show_package_warnings=False)
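The entry_script receives the contents of script_params as ordinary command-line arguments. A minimal entry script might parse them as follows (the argument names are illustrative, not part of the estimator API):

```python
import argparse

# Hypothetical entry script: parses arguments passed via script_params,
# e.g. script_params={"--epochs": 30, "--batch-size": 64}.
def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Training entry script")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"training for {args.epochs} epochs, batch size {args.batch_size}")
```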
Parameters
- compute_target
- AbstractComputeTarget or str
The compute target where training will happen. This can either be an object or the string "local".
- vm_size
- str
The VM size of the compute target that will be created for the training. Supported values: Any Azure VM size.
- vm_priority
- str
The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used.
Supported values: 'dedicated' and 'lowpriority'.
This takes effect only when the vm_size parameter is specified in the input.
- script_params
- dict
A dictionary of command-line arguments to pass to the training script specified in entry_script.
- node_count
- int
The number of nodes in the compute target used for training. Only the AmlCompute target is supported for distributed training (node_count > 1).
- worker_count
- int
When using Parameter Server for distributed training, the number of worker nodes.
DEPRECATED. Specify as part of the distributed_training parameter.
- parameter_server_count
- int
When using Parameter Server for distributed training, the number of parameter server nodes.
- distributed_backend
- str
The communication backend for distributed training.
DEPRECATED. Use the distributed_training parameter.
Supported values: 'mpi' and 'ps'. 'mpi' represents MPI/Horovod and 'ps' represents Parameter Server.
This parameter is required when any of node_count, process_count_per_node, worker_count, or parameter_server_count > 1.
In case of 'ps', the sum of worker_count and parameter_server_count should be less than or equal to node_count * (number of CPUs or GPUs per node).
When node_count == 1 and process_count_per_node == 1, no backend will be used unless the backend is explicitly set. Only the AmlCompute target is supported for distributed training.
- distributed_training
- ParameterServer or Mpi
Parameters for running a distributed training job.
For running a distributed job with the Parameter Server backend, use the ParameterServer object to specify worker_count and parameter_server_count. The sum of the worker_count and parameter_server_count parameters should be less than or equal to node_count * (the number of CPUs or GPUs per node).
For running a distributed job with the MPI backend, use the Mpi object to specify process_count_per_node.
- use_gpu
- bool
Specifies whether the environment to run the experiment should support GPUs.
If true, a GPU-based default Docker image will be used in the environment. If false, a CPU-based image will be used. Default Docker images (CPU or GPU) will be used only if the custom_docker_image parameter is not set. This setting is used only in Docker-enabled compute targets.
- use_docker
- bool
Specifies whether the environment in which to run the experiment should be Docker-based.
- custom_docker_base_image
- str
The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image.
DEPRECATED. Use the custom_docker_image parameter.
- custom_docker_image
- str
The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image.
- user_managed
- bool
Specifies whether Azure ML reuses an existing Python environment. If false, Azure ML will create a Python environment based on the conda dependencies specification.
- conda_packages
- list
A list of strings representing conda packages to be added to the Python environment for the experiment.
- pip_packages
- list
A list of strings representing pip packages to be added to the Python environment for the experiment.
- conda_dependencies_file_path
- str
A string representing the relative path to the conda dependencies yaml file. If specified, Azure ML will not install any framework-related packages.
DEPRECATED. Use the conda_dependencies_file parameter.
- pip_requirements_file_path
- str
A string representing the relative path to the pip requirements text file. This can be provided in combination with the pip_packages parameter.
DEPRECATED. Use the pip_requirements_file parameter.
- conda_dependencies_file
- str
A string representing the relative path to the conda dependencies yaml file. If specified, Azure ML will not install any framework related packages.
- pip_requirements_file
- str
A string representing the relative path to the pip requirements text file.
This can be provided in combination with the pip_packages parameter.
- environment_variables
- dict
A dictionary of environment variable names and values. These environment variables are set on the process where the user script is being executed.
- environment_definition
- Environment
The environment definition for the experiment. It includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters to the Estimator construction can be set using this parameter. If this parameter is specified, it will take precedence over other environment-related parameters like use_gpu, custom_docker_image, conda_packages, or pip_packages. Errors will be reported on these invalid combinations.
- shm_size
- str
The size of the Docker container's shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used. For more information, see Docker run reference.
- resume_from
- DataPath
The data path containing the checkpoint or model files from which to resume the experiment.
- max_run_duration_seconds
- int
The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value.
- framework_version
- str
The TensorFlow version to be used for executing training code. If no version is provided, the estimator will default to the latest version supported by Azure ML. Use TensorFlow.get_supported_versions() to get a list of all versions supported by the current Azure ML SDK.
- _enable_optimized_mode
- bool
Enable incremental environment build with pre-built framework images for faster environment preparation. A pre-built framework image is built on top of Azure ML default CPU/GPU base images with framework dependencies pre-installed.
- _disable_validation
- bool
Disable script validation before run submission. The default is True.
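Putting the parameters above together, a minimal construction of this estimator might look like the following sketch (the compute target name, script path, and argument names are placeholders):

```python
# Illustrative estimator construction; names and paths are placeholders.
from azureml.core import Workspace, Experiment
from azureml.train.dnn import TensorFlow

ws = Workspace.from_config()

# script_params become command-line arguments to entry_script.
estimator = TensorFlow(
    source_directory="./src",
    entry_script="train.py",
    compute_target="gpu-cluster",  # name of an existing AmlCompute target
    script_params={"--epochs": 30, "--batch-size": 64},
    use_gpu=True,
    framework_version="2.2",
)

run = Experiment(ws, "tf-experiment").submit(estimator)
```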
Remarks
When submitting a training job, Azure ML runs your script in a conda environment within a Docker container. The TensorFlow containers have the following dependencies installed.
| Dependencies | TensorFlow 1.10/1.12 | TensorFlow 1.13 | TensorFlow 2.0/2.1/2.2 |
| --- | --- | --- | --- |
| Python | 3.6.2 | 3.6.2 | 3.6.2 |
| CUDA (GPU image only) | 9.0 | 10.0 | 10.0 |
| cuDNN (GPU image only) | 7.6.3 | 7.6.3 | 7.6.3 |
| NCCL (GPU image only) | 2.4.8 | 2.4.8 | 2.4.8 |
| azureml-defaults | Latest | Latest | Latest |
| azureml-dataset-runtime[fuse,pandas] | Latest | Latest | Latest |
| IntelMpi | 2018.3.222 | 2018.3.222 | N/A |
| OpenMpi | N/A | N/A | 3.1.2 |
| horovod | 0.15.2 | 0.16.1 | 0.18.1/0.19.1/0.19.5 |
| miniconda | 4.5.11 | 4.5.11 | 4.5.11 |
| tensorflow | 1.10.0/1.12.0 | 1.13.1 | 2.0.0/2.1.0/2.2.0 |
| git | 2.7.4 | 2.7.4 | 2.7.4 |
The v1 Docker images extend Ubuntu 16.04. The v2 Docker images extend Ubuntu 18.04.
To install additional dependencies, you can either use the pip_packages or conda_packages parameter, or specify the pip_requirements_file or conda_dependencies_file parameter. Alternatively, you can build your own image and pass it via the custom_docker_image parameter to the estimator constructor.
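For instance, extra dependencies could be supplied either inline or from files; this is a sketch, with package names and file paths chosen for illustration:

```python
# Two illustrative ways to add dependencies; package names and paths
# are placeholders.
from azureml.train.dnn import TensorFlow

# Option 1: inline package lists.
est_inline = TensorFlow(
    source_directory="./src",
    entry_script="train.py",
    compute_target="cpu-cluster",
    pip_packages=["scikit-learn==0.24.2"],
    conda_packages=["numpy"],
)

# Option 2: dependency files, relative to source_directory. When a conda
# dependencies file is given, Azure ML does not install framework packages.
est_files = TensorFlow(
    source_directory="./src",
    entry_script="train.py",
    compute_target="cpu-cluster",
    pip_requirements_file="requirements.txt",
    conda_dependencies_file="environment.yml",
)
```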
For more information about Docker containers used in TensorFlow training, see https://github.com/Azure/AzureML-Containers.
The TensorFlow class supports two methods of distributed training:
- MPI-based distributed training using the Horovod framework
- Native distributed TensorFlow
For examples and more information about using TensorFlow in distributed training, see the tutorial Train and register TensorFlow models at scale with Azure Machine Learning.
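As an illustrative sketch (compute target names and entry scripts are placeholders), the two methods map onto the distributed_training parameter like this:

```python
# Placeholder compute targets and scripts; counts chosen for illustration.
from azureml.train.dnn import TensorFlow, Mpi, ParameterServer

# MPI/Horovod: two nodes, two processes per node.
mpi_estimator = TensorFlow(
    source_directory="./src",
    entry_script="train_horovod.py",
    compute_target="gpu-cluster",
    node_count=2,
    distributed_training=Mpi(process_count_per_node=2),
    use_gpu=True,
)

# Parameter Server: worker_count + parameter_server_count should not
# exceed node_count * (CPUs or GPUs per node).
ps_estimator = TensorFlow(
    source_directory="./src",
    entry_script="train_ps.py",
    compute_target="cpu-cluster",
    node_count=3,
    distributed_training=ParameterServer(worker_count=2, parameter_server_count=1),
)
```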
Attributes
DEFAULT_VERSION
DEFAULT_VERSION = '1.13'
FRAMEWORK_NAME
FRAMEWORK_NAME = 'TensorFlow'