Using Estimator or ScriptRunConfig for a Pipeline with HyperDrive and XGBoost?

Michael Søegaard 111 Reputation points
2021-01-28T08:07:44.403+00:00

I'm learning Azure ML and trying to build a pipeline with HyperDrive and an XGBoost estimator, but I cannot figure out how to add my custom XGBoost environment to the HyperDriveConfig.

Estimator is apparently deprecated, and the SDK tells me to use ScriptRunConfig instead.

So I have created an Environment pointing to a YAML file with the dependencies, and a ScriptRunConfig pointing to that environment. But how should I use the ScriptRunConfig in HyperDriveConfig?
This is the code I'm trying at the moment:

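from azureml.core import Environment, ScriptRunConfig
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

# compute_target and the parameter sampling ps are defined earlier (not shown)
env = Environment.from_conda_specification("xgboost", "environment.yml")

src = ScriptRunConfig(
    source_directory='.',
    script='train.py',
    compute_target=compute_target,
    environment=env,
)

hyperdrive_run_config = HyperDriveConfig(
    hyperparameter_sampling=ps,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=100,
    max_concurrent_runs=4,
    # run_config=aml_run_config,
    policy=None,
    estimator=src,  # passing the ScriptRunConfig here raises the error below
)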

When I submit the pipeline, I get the following error:

AttributeError: 'ScriptRunConfig' object has no attribute '_get_script_run_config'


1 answer

  1. romungi-MSFT 47,531 Reputation points Microsoft Employee
    2021-01-28T11:34:05.093+00:00

    @Michael Søegaard I think the ScriptRunConfig should be passed to the run_config parameter instead of the estimator parameter. Here is a sample:

    from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

    # src is the ScriptRunConfig from the question; param_sampling and
    # early_termination_policy are defined elsewhere
    hd_config = HyperDriveConfig(run_config=src,
                                 hyperparameter_sampling=param_sampling,
                                 policy=early_termination_policy,
                                 primary_metric_name="accuracy",
                                 primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                 max_total_runs=100,
                                 max_concurrent_runs=4)
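
A minimal sketch of the train.py that the ScriptRunConfig points at may help tie this together. HyperDrive launches the script once per sampled configuration and passes the sampled values as command-line arguments, and the script has to log the primary metric under the same name given in primary_metric_name. Everything specific below is an assumption rather than something from this thread: the argument names --max_depth, --learning_rate and --n_estimators (hypothetical; they would have to match the names used in the parameter sampling), scikit-learn's breast-cancer dataset as stand-in data, and the fallback that prints the metric when azureml-core is not installed.

```python
# train.py: a minimal sketch of the script run by the ScriptRunConfig.
# Hypothetical hyperparameter names: --max_depth, --learning_rate, --n_estimators.
# They must match the names used in the HyperDrive parameter sampling.
import argparse


def parse_args(argv=None):
    """Parse the hyperparameter values HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Train an XGBoost classifier")
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--learning_rate", type=float, default=0.3)
    parser.add_argument("--n_estimators", type=int, default=100)
    return parser.parse_args(argv)


def log_metric(name, value):
    """Log a metric to the current Azure ML run, or print it if azureml-core is absent."""
    try:
        from azureml.core import Run
    except ImportError:
        print(f"{name}: {value}")
        return
    # Inside a submitted run this returns that run; locally it returns an offline run.
    Run.get_context().log(name, value)


def main(argv=None):
    # Training dependencies are imported here so that parse_args and log_metric
    # can be used without them installed.
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    args = parse_args(argv)

    # Stand-in data (assumption): scikit-learn's built-in breast-cancer dataset.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = XGBClassifier(
        max_depth=args.max_depth,
        learning_rate=args.learning_rate,
        n_estimators=args.n_estimators,
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # This name must be identical to primary_metric_name in HyperDriveConfig.
    log_metric("accuracy", float(accuracy))


if __name__ == "__main__":
    main()
```

The question sets primary_metric_name='Accuracy' while the answer's sample uses "accuracy"; whichever spelling is chosen, the script should log exactly the same string, since HyperDrive looks the metric up by name.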
    
