Module Not Found Error when launching parameter study

KOz 1

Hello,

I am new to Azure ML, and I would like to use the service to perform a parameter study for an ML model. I was able to launch a single job to test one parameter value (e.g. learning rate = 0.01), but I am having trouble launching multiple jobs to cover several values (e.g. learning rates = 0.1, 0.01, or 0.001).

I generally followed the hyperparameter tuning guide (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters), but when I run the code below, the jobs fail with the error "User program failed with ModuleNotFoundError: No module named 'sklearn'". Can someone help me identify what I am doing incorrectly? I tried to add the conda dependency (as shown) to fix this error, but it still did not work.

Thank you!

from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import ScriptRunConfig
from azureml.core.environment import CondaDependencies
from azureml.train.hyperdrive import HyperDriveConfig
from azureml.train.hyperdrive import choice
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, uniform, PrimaryMetricGoal
from azureml.core.compute import ComputeTarget

ws = Workspace.from_config()
env = Environment.get(workspace=ws, name="AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu")
curated_clone1 = env.clone("customize_curated")
conda_dep = CondaDependencies().add_conda_package("scikit-learn")
curated_clone1.python.conda_dependencies = conda_dep

curated_clone1.register(ws)
myvm = ComputeTarget(workspace=ws, name='cpu3')
param_sampling = RandomParameterSampling({
    'learning_rate': choice(0.001, 0.0001, 0.00001),
})

early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)

src = ScriptRunConfig(source_directory='./', script='loadv1.py', compute_target=myvm, environment=curated_clone1)
src.run_config.target = myvm
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             policy=early_termination_policy,
                             primary_metric_name="loss",
                             primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                             max_total_runs=100,
                             max_concurrent_runs=4)

experiment = Experiment(workspace=ws, name='day2-experiment-data')
#run = experiment.submit(src)
hyperdrive_run = experiment.submit(hd_config)

Rohit Mungi 49,131 Reputation points Microsoft Employee Moderator

2022-03-14T07:52:30.453+00:00
@KOz Have you added this line after your first run? I am curious to understand if your first run was successful without adding the conda dependency.

src.run_config.target = myvm

Adding a choice of learning rates should not by itself cause a module error, because the same compute target was used in your first run with the same setup.
I would recommend referring to this notebook to check whether you have missed a step. I think you could also use the following curated environment directly for your setup:

AzureML-sklearn-0.24-ubuntu18.04-py37-cpu
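
For reference, here is a minimal sketch of two ways to get scikit-learn into the run with the v1 SDK (azureml-core), reusing the names from the question. One thing worth checking in the original code: as far as I know, `add_conda_package` modifies the `CondaDependencies` object in place and returns `None`, so `conda_dep = CondaDependencies().add_conda_package("scikit-learn")` would leave `conda_dep` set to `None`, and assigning it to the clone would also discard whatever dependencies the curated environment already declared. The helper names `sklearn_curated_env` and `clone_with_sklearn` are hypothetical, and the second option assumes the cloned environment actually builds from its conda dependencies; if the curated image manages its own Python packages, an added conda package may have no effect.

```python
# Name of the curated environment suggested in the answer above.
CURATED_SKLEARN_ENV = "AzureML-sklearn-0.24-ubuntu18.04-py37-cpu"


def sklearn_curated_env(ws):
    """Option 1: use the curated scikit-learn environment unchanged."""
    # Imported lazily so this file can be loaded without azureml-core installed.
    from azureml.core import Environment

    return Environment.get(workspace=ws, name=CURATED_SKLEARN_ENV)


def clone_with_sklearn(ws, base_name, new_name="customize_curated"):
    """Option 2: clone a curated environment and add scikit-learn to it.

    Hypothetical helper. It extends the existing dependency list instead of
    replacing it, so packages already declared by the base environment
    (e.g. TensorFlow) are kept.
    """
    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies

    env = Environment.get(workspace=ws, name=base_name).clone(new_name)

    conda_dep = env.python.conda_dependencies
    if conda_dep is None:
        # Assumption: some environments may not carry a conda specification.
        conda_dep = CondaDependencies()

    # Call the method on its own line: it mutates conda_dep in place,
    # so its return value must not be assigned back to conda_dep.
    conda_dep.add_conda_package("scikit-learn")
    env.python.conda_dependencies = conda_dep
    return env
```

Either result can be passed as `environment=` to `ScriptRunConfig` in place of `curated_clone1`, for example `env = clone_with_sklearn(ws, "AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu")`. If loadv1.py also imports TensorFlow, the second option is probably the better fit, since the curated scikit-learn environment is not expected to include it.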