Module Not Found Error when launching parameter study
Hello,
I am new to Azure ML, and I would like to use the service to run a parameter study for an ML model. I was able to launch a single job to test one parameter value (e.g. learning rate = 0.01), but I am having trouble launching multiple jobs to cover several values (e.g. learning rates = 0.1, 0.01, or 0.001).
I generally followed the hyperparameter tuning guide (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters), but when I run the code below, the jobs fail with the error "User program failed with ModuleNotFoundError: No module named 'sklearn'". Can someone help me identify what I am doing incorrectly? I tried to add the conda dependency (as shown) to fix this error, but it still did not work.
Thank you!
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import ScriptRunConfig
from azureml.core.environment import CondaDependencies
from azureml.train.hyperdrive import HyperDriveConfig
from azureml.train.hyperdrive import choice
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, uniform, PrimaryMetricGoal
from azureml.core.compute import ComputeTarget
ws = Workspace.from_config()
env = Environment.get(workspace=ws, name="AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu")
curated_clone1 = env.clone("customize_curated")
conda_dep = CondaDependencies().add_conda_package("scikit-learn")
curated_clone1.python.conda_dependencies=conda_dep
curated_clone1.register(ws)
myvm = ComputeTarget(workspace=ws, name='cpu3')
param_sampling = RandomParameterSampling({
    'learning_rate': choice(0.001, 0.0001, 0.00001),
})
early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)
src = ScriptRunConfig(source_directory='./', script='loadv1.py', compute_target = myvm, environment=curated_clone1)
src.run_config.target = myvm
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             policy=early_termination_policy,
                             primary_metric_name="loss",
                             primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                             max_total_runs=100,
                             max_concurrent_runs=4)
experiment = Experiment(workspace=ws, name='day2-experiment-data')
#run = experiment.submit(src)
hyperdrive_run = experiment.submit(hd_config)
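
One thing that may be worth checking, offered as a sketch and not a confirmed diagnosis: the line `conda_dep = CondaDependencies().add_conda_package("scikit-learn")` stores the return value of `add_conda_package`, not the `CondaDependencies` object. If that method modifies the object in place and returns nothing (an assumption to verify against the SDK), `conda_dep` would be `None` and the cloned environment would not receive scikit-learn. The snippet below reproduces that pattern with a hypothetical stand-in class, `FakeDependencies`, so it runs without the Azure ML SDK.

```python
# Stand-in for illustration only; this is NOT the azureml class.
class FakeDependencies:
    """Hypothetical class whose add method mutates in place and returns None."""

    def __init__(self):
        self.packages = []

    def add_conda_package(self, name):
        # Mutates the object; there is no return statement,
        # so the call itself evaluates to None.
        if name not in self.packages:
            self.packages.append(name)


# Chained form, as in the script above: the variable holds the method's
# return value (None), not the dependencies object.
chained = FakeDependencies().add_conda_package("scikit-learn")

# Two-step form: keep a reference to the object, then mutate it.
deps = FakeDependencies()
deps.add_conda_package("scikit-learn")

print(chained)        # None
print(deps.packages)  # ['scikit-learn']
```

If the real class behaves the same way, the two-step form would be `conda_dep = CondaDependencies()` followed by `conda_dep.add_conda_package("scikit-learn")`, and printing `conda_dep` before assigning it to the environment is a quick check. It may also be worth inspecting `curated_clone1.python.user_managed_dependencies`, since an environment that manages its own Python setup might not apply the conda specification at all.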