Parameters Tuning for Randomforest

Question

Tom-Zhou 1

I followed the samples from DP-100 lab 8A.

https://github.com/MicrosoftLearning/DP100/blob/master/08A%20-%20Tuning%20Hyperparameters.ipynb

I tried to tune the hyperparameters of a random forest regressor on the Boston data.

The code runs, but I am not able to get the metric or the output of the result.

What is the problem?

%%writefile $experiment_folder/diabetes_training.py
# Import libraries
import argparse
import joblib
import os
from azureml.core import Run
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Set regularization parameter
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
args = parser.parse_args()
reg = args.reg_rate

# Get the experiment run context
run = Run.get_context()

# load the diabetes dataset
print("Loading Data...")
diabetes = run.input_datasets['diabetes'].to_pandas_dataframe() # Get the training data from the estimator input

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate',  np.float(reg))
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', np.float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', np.float(auc))

os.makedirs('outputs', exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()


from azureml.core import Experiment
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive import GridParameterSampling, MedianStoppingPolicy, HyperDriveConfig, PrimaryMetricGoal, choice, normal
from azureml.widgets import RunDetails


# Sample a range of parameter values
params = GridParameterSampling(
    {
        # Tuning the Parameters

        '--max_depth':choice(70,100,130,160)
    }
)


# Get the training dataset
boston_ds = ws.datasets.get("boston dataset")

# Create an estimator that uses the remote compute
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[boston_ds.as_named_input('boston')], # Pass the dataset as an input...
                          pip_packages=['azureml-sdk'], # ...so we need azureml-dataprep (it's in the SDK!)
                          entry_script='train_boston.py',
                          compute_target = training_cluster,)


#early_termination_policy = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)


# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(estimator=hyper_estimator, 
                          hyperparameter_sampling=params, 
                          policy=None, 
                          primary_metric_name='MAE', 
                          primary_metric_goal= PrimaryMetricGoal.MINIMIZE, 
                          max_total_runs=6,
                          max_concurrent_runs=4)

# Run the experiment
experiment = Experiment(workspace = ws, name = 'boston_training_hyperdrive')
run = experiment.submit(config=hyperdrive)

# Show the status in the notebook as the experiment runs
RunDetails(run).show()
run.wait_for_completion()

Ramr-msft 17,826 Reputation points

2020-10-29T17:29:34.753+00:00

@Tom-Zhou Thanks for the question. A full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/ . Can you please add more details about the error that you are getting?
Tom-Zhou 1 Reputation point

2020-10-29T21:27:14.03+00:00

Hi, the problem is that for each experiment run, I am not able to get the metric that I set. I also can't access the link you shared; it shows 404 Not Found.

Answer 1

Ramr-msft 17,826

@Tom-Zhou Thanks for the details. Here are the Azure ML samples.

Please follow the doc below for Azure Machine Learning:
https://learn.microsoft.com/en-us/azure/machine-learning/
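
One possible cause, judging only from the code posted in the question: the script shown is diabetes_training.py, which accepts --regularization and logs Accuracy and AUC, while the HyperDrive configuration runs train_boston.py, samples --max_depth, and optimizes a metric named MAE. HyperDrive can only report a primary metric that the training script logs under exactly that name, so if train_boston.py does not parse --max_depth and call run.log('MAE', ...), no metric will appear for the child runs. The contents of train_boston.py are not shown, so the following is only a minimal sketch of what such a script might look like. The synthetic data, the helper names (parse_args, get_logger, train_and_score), and the n_estimators value are assumptions for illustration; a real script would load the data from run.input_datasets['boston'] instead.

```python
import argparse

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Must match primary_metric_name in HyperDriveConfig exactly (case-sensitive).
PRIMARY_METRIC = "MAE"


def parse_args(argv=None):
    """Parse the hyperparameter that HyperDrive samples and passes on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_depth", type=int, default=100, help="maximum tree depth")
    # parse_known_args ignores any extra arguments instead of failing on them.
    args, _ = parser.parse_known_args(argv)
    return args


def get_logger():
    """Return run.log when the Azure ML SDK is installed, else a print-based stand-in."""
    try:
        from azureml.core import Run
        return Run.get_context().log
    except ImportError:
        return lambda name, value: print(f"{name}: {value}")


def train_and_score(max_depth, log):
    """Train a random forest regressor and log the MAE on a held-out test split."""
    # Assumption: synthetic stand-in data so the sketch runs anywhere.
    X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    model = RandomForestRegressor(n_estimators=20, max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)

    mae = mean_absolute_error(y_test, model.predict(X_test))
    # Log under the same name that HyperDrive is configured to optimize.
    log(PRIMARY_METRIC, float(mae))
    return mae


if __name__ == "__main__":
    args = parse_args()
    train_and_score(args.max_depth, get_logger())
```

With a script along these lines saved as train_boston.py in the experiment folder, each child run should receive one of the sampled --max_depth values and log an MAE value that HyperDrive can compare across runs.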
