Migrating from Estimators to ScriptRunConfig

APPLIES TO: Python SDK azureml v1

Up until now, there have been multiple methods for configuring a training job in Azure Machine Learning via the SDK, including Estimators, ScriptRunConfig, and the lower-level RunConfiguration. To address this ambiguity and inconsistency, we are simplifying the job configuration process in Azure Machine Learning. You should now use ScriptRunConfig as the recommended option for configuring training jobs.

Estimators are deprecated as of the 1.19.0 release of the Python SDK. You should also generally avoid explicitly instantiating a RunConfiguration object yourself, and instead configure your job using the ScriptRunConfig class.

This article covers common considerations when migrating from Estimators to ScriptRunConfig.


To migrate to ScriptRunConfig from Estimators, make sure you are using version 1.15.0 or later of the Python SDK.
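As a quick sanity check of the version requirement above, the following minimal sketch compares an installed SDK version string against 1.15.0. The helper names parse_version and meets_minimum are hypothetical and not part of the SDK; the sketch assumes the installed version is exposed as azureml.core.VERSION and that version strings are dotted numbers such as 1.19.0.

```python
import re

MINIMUM_SDK_VERSION = "1.15.0"  # minimum version stated in this article


def parse_version(version):
    """Convert a dotted version such as '1.19.0' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_SDK_VERSION):
    """Return True if `installed` is at least `minimum` (hypothetical helper)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.15' and '1.15.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        import azureml.core  # only available where the SDK is installed
        print(azureml.core.VERSION, meets_minimum(azureml.core.VERSION))
    except ImportError:
        print("azureml-core is not installed in this environment")
```

Comparing integer tuples rather than raw strings avoids treating 1.9.0 as newer than 1.15.0.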

ScriptRunConfig documentation and samples

Azure Machine Learning documentation and samples have been updated to use ScriptRunConfig for job configuration and submission.

For information on using ScriptRunConfig, refer to the following documentation:

In addition, refer to the following samples & tutorials:

Defining the training environment

While the various framework estimators have preconfigured environments that are backed by Docker images, the Dockerfiles for these images are private. As a result, you have little visibility into what these environments contain. In addition, the estimators take environment-related configurations as individual parameters (such as pip_packages and custom_docker_image) on their respective constructors.

When using ScriptRunConfig, all environment-related configurations are encapsulated in the Environment object that gets passed into the environment parameter of the ScriptRunConfig constructor. To configure a training job, provide an environment that has all the dependencies required for your training script. If no environment is provided, Azure Machine Learning will use one of the Azure Machine Learning base images, specifically the one defined by azureml.core.environment.DEFAULT_CPU_IMAGE, as the default environment. There are a couple of ways to provide an environment:

  • Use a curated environment - curated environments are predefined environments available in your workspace by default. There is a corresponding curated environment for each of the preconfigured framework/version Docker images that backed each framework estimator.
  • Define your own custom environment

Here is an example of using the curated environment for training:

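from azureml.core import Workspace, ScriptRunConfig, Environment

ws = Workspace.from_config()  # assumes a local config.json for your workspace

curated_env_name = '<add Pytorch curated environment name here>'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)

compute_target = ws.compute_targets['my-cluster']
src = ScriptRunConfig(source_directory='.',
                      script='train.py',  # placeholder name for your training script
                      compute_target=compute_target,
                      environment=pytorch_env)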


For a list of curated environments, see curated environments.

If you want to specify environment variables that will get set on the process where the training script is executed, use the Environment object:

myenv.environment_variables = {"MESSAGE":"Hello from Azure Machine Learning"}
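On the training-script side, a variable set this way can be read like any other process environment variable. The following is a minimal sketch, assuming the MESSAGE variable from the example above; the read_message helper and its fallback text are hypothetical.

```python
import os


def read_message(default="(MESSAGE not set)"):
    """Return the MESSAGE environment variable, or a fallback when it is absent."""
    return os.environ.get("MESSAGE", default)


if __name__ == "__main__":
    # Inside the job this prints the value configured on the Environment object;
    # when run locally without the variable set, it prints the fallback.
    print(read_message())
```

Using a fallback keeps the script runnable locally, where the variable is usually not set.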

For information on configuring and managing Azure Machine Learning environments, see:

Using data for training


If you are using an Azure Machine Learning dataset for training, pass the dataset as an argument to your script using the arguments parameter. By doing so, you will get the data path (mounting point or download path) in your training script via arguments.

The following example configures a training job where the FileDataset, mnist_ds, will get mounted on the remote compute.

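src = ScriptRunConfig(source_directory='.',
                      script='train.py',  # placeholder name for your training script
                      arguments=['--data-folder', mnist_ds.as_mount()], # or mnist_ds.as_download() to download
                      compute_target=compute_target, environment=pytorch_env)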
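The training script then receives the resolved data path as an ordinary command-line argument. The following is a minimal sketch of that script side, assuming the --data-folder argument name used above; the parse_args and list_data_files helpers are hypothetical and use only the Python standard library.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the --data-folder argument passed through ScriptRunConfig's arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", dest="data_folder", type=str, default=None,
                        help="mount point or download path of the dataset")
    # parse_known_args ignores any extra arguments the launcher may add
    args, _unknown = parser.parse_known_args(argv)
    return args


def list_data_files(data_folder):
    """Return the files under data_folder as sorted paths relative to it."""
    found = []
    for root, _dirs, files in os.walk(data_folder):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), data_folder))
    return sorted(found)


if __name__ == "__main__":
    args = parse_args()
    if args.data_folder is None:
        print("No --data-folder argument was supplied")
    else:
        print("Data folder:", args.data_folder)
        print("Files:", list_data_files(args.data_folder))
```

Because the script only sees a path, the same code works whether the dataset was mounted or downloaded.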

DataReference (old)

While we recommend using Azure Machine Learning Datasets over the older DataReference approach, if you are still using DataReferences for any reason, you must configure your job as follows:

# if you want to pass a DataReference object, such as the below:
datastore = ws.get_default_datastore()
data_ref = datastore.path('./foo').as_mount()

src = ScriptRunConfig(source_directory='.',
                      script='train.py',  # placeholder name for your training script
                      arguments=['--data-folder', str(data_ref)], # cast the DataReference object to str
                      compute_target=compute_target, environment=pytorch_env)
# set the `data_references` attribute of the ScriptRunConfig's underlying RunConfiguration object to a dict of the DataReference(s) you want
src.run_config.data_references = {data_ref.data_reference_name: data_ref.to_config()}

For more information on using data for training, see:

Distributed training

If you need to configure a distributed job for training, do so by specifying the distributed_job_config parameter in the ScriptRunConfig constructor. Pass in an MpiConfiguration, PyTorchConfiguration, or TensorflowConfiguration for distributed jobs of the respective types.

The following example configures a PyTorch training job to use distributed training with MPI/Horovod:

from azureml.core.runconfig import MpiConfiguration

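src = ScriptRunConfig(source_directory='.',
                      script='train.py',  # placeholder name for your training script
                      compute_target=compute_target, environment=pytorch_env,
                      distributed_job_config=MpiConfiguration(node_count=2, process_count_per_node=2))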
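Inside the training script, each process launched by an MPI job usually needs to know its rank and the total number of processes. The following is a minimal sketch under the assumption that the job is launched with Open MPI, which sets the OMPI_COMM_WORLD_* environment variables; the mpi_context helper is hypothetical, and a Horovod script would more typically call Horovod's own rank and size functions instead.

```python
import os


def mpi_context(environ=None):
    """Read rank information from Open MPI environment variables.

    Falls back to a single-process layout when the variables are absent,
    for example when the script is run locally without an MPI launcher.
    """
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("OMPI_COMM_WORLD_RANK", 0)),              # global process index
        "local_rank": int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", 0)),  # index within the node
        "world_size": int(env.get("OMPI_COMM_WORLD_SIZE", 1)),        # total process count
    }


if __name__ == "__main__":
    print(mpi_context())
```

With node_count=2 and process_count_per_node=2 as configured above, four processes are started in total, so under this assumption each process would see a world_size of 4.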

For more information, see:


If you need to access the underlying RunConfiguration object for a ScriptRunConfig for any reason, you can do so through its run_config attribute:

src.run_config


Next steps