Local compute not found error when running a hyperparameter search

SydneyD 1 Reputation point
2022-03-14T16:00:14.647+00:00

I am new to Azure and am trying to run a hyperparameter search on my neural network. I can run my code fine when I'm submitting a single job to examine a parameter, but when I run a hyperparameter search with the same configurations I get the following error:

"ComputeTargetNotFound: Compute Target with name local not found in provided workspace"

Any help would be appreciated!

    from azureml.core import Workspace
    from azureml.core import Experiment 
    from azureml.core import Environment
    from azureml.core import ScriptRunConfig
    from azureml.core.environment import CondaDependencies
    from azureml.train.hyperdrive import HyperDriveConfig
    from azureml.train.hyperdrive import choice
    from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, uniform, PrimaryMetricGoal
    from azureml.core.compute import ComputeTarget


    ws = Workspace.from_config()
    env = Environment.get(workspace=ws, name="AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu")
    curated_clone1 = env.clone("customize_curated")
    conda_dep = CondaDependencies().add_conda_package("scikit-learn")
    curated_clone1.python.conda_dependencies=conda_dep


    curated_clone1.register(ws)

    param_sampling = RandomParameterSampling( {
            'learning_rate': choice(0.001, 0.0001, 0.00001),

        }
    )

    early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)

    src = ScriptRunConfig(source_directory='./', script='loadv1.py',  environment=curated_clone1)

    hd_config = HyperDriveConfig(run_config=src,
                                 hyperparameter_sampling=param_sampling,
                                 policy=early_termination_policy,
                                 primary_metric_name="loss",
                                 primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                                 max_total_runs=100,
                                 max_concurrent_runs=4)


    experiment = Experiment(workspace=ws, name='day3-experiment-data')
    #run = experiment.submit(src)
    hyperdrive_run = experiment.submit(hd_config)
Azure Machine Learning

1 answer

  1. romungi-MSFT 42,761 Reputation points Microsoft Employee
    2022-03-15T07:29:21.147+00:00

    @SydneyD The experiment setup you describe looks similar to one in another thread I recently commented on.
    The error in that thread was with the sklearn module; I am not sure whether the suggested steps fixed it for that user.

    In both this case and the other thread, the compute target is still not defined. It needs to be passed to your ScriptRunConfig(), for example:

    from azureml.core.compute import ComputeTarget, AmlCompute  
    from azureml.core.compute_target import ComputeTargetException  
      
    # choose a name for your cluster  
    cluster_name = "hd-cluster"  
      
    try:  
        compute_target = ComputeTarget(workspace=ws, name=cluster_name)  
        print('Found existing compute target.')  
    except ComputeTargetException:  
        print('Creating a new compute target...')  
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',   
                                                               max_nodes=4)  
      
        # create the cluster  
        compute_target = ComputeTarget.create(ws, cluster_name, compute_config)  
      
        compute_target.wait_for_completion(show_output=True)  
      
    # use get_status() to get a detailed status for the current cluster.
    print(compute_target.get_status().serialize())

    # pass the compute target explicitly so the run is not scheduled on "local"
    src = ScriptRunConfig(source_directory='./', script='loadv1.py', compute_target=compute_target, environment=curated_clone1)
    

    ScriptRunConfig() uses "local" as its default compute target. Because no compute target named "local" exists in your workspace, submitting this experiment to the workspace fails with the error you are seeing.

    For reference, this is how the compute_target parameter is described: "The compute target where training will happen. This can either be a ComputeTarget object, the name of an existing ComputeTarget, or the string "local". If no compute target is specified, your local machine will be used."
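
If it helps, a small guard can catch a missing compute target before the hyperparameter search is submitted. This is only a sketch: ensure_remote_target is a hypothetical helper, not part of the Azure ML SDK, and the commented usage assumes that src.run_config.target holds the target name, which is worth confirming against your SDK version.

```python
def ensure_remote_target(target_name):
    """Raise early if the run would be scheduled on the default 'local' target."""
    # Assumption: an unset target shows up as None, an empty string, or "local".
    if not target_name or str(target_name).strip().lower() == "local":
        raise ValueError(
            "No remote compute target is set; pass compute_target=... "
            "to ScriptRunConfig before building the HyperDriveConfig."
        )
    return target_name


# Hypothetical usage with the objects from this thread (requires the Azure ML SDK):
# ensure_remote_target(src.run_config.target)
# hyperdrive_run = experiment.submit(hd_config)
```

With the original ScriptRunConfig from the question, a check like this would be expected to fail locally with a readable message instead of ComputeTargetNotFound coming back from the service.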
