Module Not Found Error when launching parameter study

KOz 1 Reputation point
2022-03-12T23:02:34.127+00:00

Hello,

I am new to Azure ML, and I would like to use the service to run a parameter study for an ML model. I was able to launch a single job to test one parameter value (e.g. learning rate = 0.01), but I am having trouble launching multiple jobs to cover several values (e.g. learning rates = 0.1, 0.01, or 0.001).

I generally followed the hyperparameter tuning guide (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters), but when I run the code below, the jobs fail with the error "User program failed with ModuleNotFoundError: No module named 'sklearn'". Can someone help me identify what I am doing incorrectly? I tried adding scikit-learn as a conda dependency (as shown in the code) to fix this error, but the jobs still fail with the same message.

Thank you!

from azureml.core import Workspace  
from azureml.core import Experiment   
from azureml.core import Environment  
from azureml.core import ScriptRunConfig  
from azureml.core.environment import CondaDependencies  
from azureml.train.hyperdrive import HyperDriveConfig  
from azureml.train.hyperdrive import choice  
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, uniform, PrimaryMetricGoal  
from azureml.core.compute import ComputeTarget  
  
  
ws = Workspace.from_config()  
env = Environment.get(workspace=ws, name="AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu")  
curated_clone1 = env.clone("customize_curated")  
conda_dep = CondaDependencies().add_conda_package("scikit-learn")  
curated_clone1.python.conda_dependencies=conda_dep  
  
  
curated_clone1.register(ws)  
myvm = ComputeTarget(workspace=ws, name='cpu3')  
param_sampling = RandomParameterSampling({
    'learning_rate': choice(0.001, 0.0001, 0.00001),
})
  
early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)  
  
src = ScriptRunConfig(source_directory='./', script='loadv1.py', compute_target=myvm, environment=curated_clone1)
src.run_config.target = myvm  
hd_config = HyperDriveConfig(run_config=src,  
                             hyperparameter_sampling=param_sampling,  
                             policy=early_termination_policy,  
                             primary_metric_name="loss",  
                             primary_metric_goal=PrimaryMetricGoal.MINIMIZE,  
                             max_total_runs=100,  
                             max_concurrent_runs=4)  
  
  
experiment = Experiment(workspace=ws, name='day2-experiment-data')  
#run = experiment.submit(src)  
hyperdrive_run = experiment.submit(hd_config)  
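
One possible cause, offered as an assumption and not a confirmed diagnosis: if `add_conda_package` modifies the `CondaDependencies` object in place and returns nothing, then the chained call `CondaDependencies().add_conda_package("scikit-learn")` evaluates to `None`, and `None` is what ends up assigned to `conda_dependencies`. The sketch below uses a hypothetical stand-in class (`StandInCondaDependencies` is not part of the Azure ML SDK) to show only the Python behaviour involved. Whether the real method behaves this way should be checked against the installed SDK version.

```python
class StandInCondaDependencies:
    """Hypothetical stand-in for illustration; NOT the azureml class."""

    def __init__(self):
        self.conda_packages = []

    def add_conda_package(self, name):
        # Assumed behaviour: mutate in place, no explicit return value,
        # so the call expression itself evaluates to None.
        self.conda_packages.append(name)


# Pattern used in the question: the result of the method call is stored.
chained = StandInCondaDependencies().add_conda_package("scikit-learn")

# Two-step pattern: keep a reference to the object, then mutate it.
deps = StandInCondaDependencies()
deps.add_conda_package("scikit-learn")

print(chained)              # the method's return value, not the object
print(deps.conda_packages)  # the package list held by the object
```

If the assumption holds, the analogous change in the script above would be to create the `CondaDependencies` object on one line, call `add_conda_package("scikit-learn")` on it on the next, and then assign that object to `curated_clone1.python.conda_dependencies`. Printing `curated_clone1.python.conda_dependencies` before calling `register` is a quick way to see whether it is `None`.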
Azure Machine Learning
