Optuna is an open-source Python library for hyperparameter tuning that can be scaled horizontally across multiple compute resources.
MLflow 3.0 introduces powerful new capabilities for hyperparameter optimization by integrating with Optuna.
- The MlflowStorage class allows Optuna to use the MLflow Tracking Server as its storage backend.
- The MlflowSparkStudy class enables launching parallel Optuna studies using PySpark executors.
Install Optuna
MLflow 3.0 is pre-installed in Databricks Runtime 17.0 ML and above. On older runtimes, use the following commands to install the latest versions of Optuna and MLflow.
%pip install mlflow --upgrade
%pip install optuna
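After installing, it can help to confirm which versions are actually active in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of MLflow or Optuna.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two libraries this page relies on.
for name in ("mlflow", "optuna"):
    print(name, installed_version(name) or "not installed")
```

If the reported MLflow version is below 3.0, the integration classes described on this page are probably not available.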
Run Optuna optimization in parallel
Here are the steps in an Optuna workflow:
- Define an objective function to optimize. Within the objective function, define the hyperparameter search space. For more details, see the Optuna documentation.
Below is an example of model selection and hyperparameter tuning with scikit-learn. The example defines the objective function objective, and calls the suggest_categorical, suggest_float, and suggest_int methods of the trial to define the search space: the regressor type, svr_c, and rf_max_depth.
import sklearn.datasets
import sklearn.ensemble
import sklearn.metrics
import sklearn.model_selection
import sklearn.svm

def objective(trial):
    # Invoke suggest methods of a Trial object to generate hyperparameters.
    regressor_name = trial.suggest_categorical('regressor', ['SVR', 'RandomForest'])
    if regressor_name == 'SVR':
        svr_c = trial.suggest_float('svr_c', 1e-10, 1e10, log=True)
        regressor_obj = sklearn.svm.SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int('rf_max_depth', 2, 32)
        regressor_obj = sklearn.ensemble.RandomForestRegressor(max_depth=rf_max_depth)

    # Load the data and hold out a validation split.
    X, y = sklearn.datasets.fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(X, y, random_state=0)

    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)

    error = sklearn.metrics.mean_squared_error(y_val, y_pred)
    return error  # An objective value linked with the Trial object
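The suggest_float call above passes log=True, which asks for values sampled on a logarithmic scale. That matters for a range as wide as 1e-10 to 1e10. The following sketch illustrates the idea with the standard library only. It is not Optuna's implementation, and sample_log_uniform is a hypothetical helper name.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if not 0 < low <= high:
        raise ValueError("log-scale bounds must satisfy 0 < low <= high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-10, 1e10, rng) for _ in range(10_000)]

# About half of the draws fall below 1.0, the midpoint of the range in log space.
# Uniform sampling on a linear scale would almost never produce a value that small.
share_below_one = sum(s < 1.0 for s in samples) / len(samples)
print(f"share of samples below 1.0: {share_below_one:.2f}")
```

Sampling in log space gives every order of magnitude roughly equal weight, which is usually what you want for parameters such as a regularization strength.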
- Create a shared storage for distributed optimization. With MlflowStorage, you can use the MLflow Tracking Server as the storage backend.
import mlflow
from mlflow.optuna.storage import MlflowStorage

# Look up the MLflow experiment that shares the current notebook's path.
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
experiment_id = mlflow.get_experiment_by_name(notebook_path).experiment_id
mlflow_storage = MlflowStorage(experiment_id=experiment_id)
- Create an Optuna Study object, and run the tuning algorithm by calling the optimize function of the Study object. MlflowSparkStudy can launch parallel Optuna studies using PySpark executors.
Below is an example from the Optuna documentation.
- Create a Study, and optimize the objective function with 8 trials (8 calls of the objective function with different hyperparameter values).
- Get the best parameters of the Study.
from mlflow.pyspark.optuna.study import MlflowSparkStudy

mlflow_study = MlflowSparkStudy(
    study_name="spark-mlflow-tuning",
    storage=mlflow_storage,
)
# Run 8 trials in total, with up to 4 running in parallel.
mlflow_study.optimize(objective, n_trials=8, n_jobs=4)

best_params = mlflow_study.best_params
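To make the roles of n_trials and n_jobs concrete, here is a simplified, standard-library sketch of running 8 trials with at most 4 in flight at once and then picking the best result. It is only an illustration under stated assumptions: a thread pool stands in for the PySpark executors, independent random sampling stands in for Optuna's sampler, and toy_objective and run_trial are hypothetical names. It does not reflect how MlflowSparkStudy is implemented.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def toy_objective(x: float) -> float:
    """Hypothetical objective with its minimum at x = 2."""
    return (x - 2.0) ** 2


def run_trial(trial_number: int) -> Tuple[float, Dict[str, float]]:
    """Sample one candidate and evaluate it, as a single trial would."""
    rng = random.Random(trial_number)  # per-trial seed keeps the sketch reproducible
    params = {"x": rng.uniform(-10.0, 10.0)}
    return toy_objective(params["x"]), params


n_trials, n_jobs = 8, 4

# At most n_jobs trials run concurrently until all n_trials have finished.
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    results = list(pool.map(run_trial, range(n_trials)))

# The best trial is the one with the smallest objective value.
best_value, best_params = min(results, key=lambda r: r[0])
print(f"best value {best_value:.3f} with params {best_params}")
```

In a real study the sampler also uses the results of finished trials to choose later candidates, which is why the trials need a shared storage backend.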
Notebook example
This notebook provides an example of using Optuna to select a scikit-learn model and a set of hyperparameters for the Iris dataset.
Scaling up hyperparameter tuning with Optuna and MLflow
MLflow Optuna integration API
MlflowStorage
MlflowStorage is an MLflow-based storage class for Optuna with batch processing to avoid REST API throttling.
Parameter name | Type | Description |
---|---|---|
experiment_id | str | MLflow experiment ID for the storage |
name | str | Name of the storage |
batch_flush_interval | float | Time in seconds between automatic batch flushes (default: 1.0) |
batch_size_threshold | int | Maximum number of items in a batch before a flush is triggered (default: 100) |
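The two batching parameters interact as follows: a batch is sent when it reaches the size threshold or when the flush interval has elapsed, whichever comes first. The sketch below illustrates that policy with the standard library only. It is not MlflowStorage's actual implementation, which may differ (for example, by flushing from a background thread), and BatchBuffer is a hypothetical class name.

```python
import time
from typing import Any, Callable, List


class BatchBuffer:
    """Collects items and flushes them in batches by size or by elapsed time."""

    def __init__(
        self,
        flush_fn: Callable[[List[Any]], None],
        batch_flush_interval: float = 1.0,
        batch_size_threshold: int = 100,
    ) -> None:
        self._flush_fn = flush_fn
        self._interval = batch_flush_interval
        self._threshold = batch_size_threshold
        self._items: List[Any] = []
        self._last_flush = time.monotonic()

    def add(self, item: Any) -> None:
        """Queue an item and flush if either limit has been reached."""
        self._items.append(item)
        size_reached = len(self._items) >= self._threshold
        interval_elapsed = time.monotonic() - self._last_flush >= self._interval
        if size_reached or interval_elapsed:
            self.flush()

    def flush(self) -> None:
        """Send all queued items as one batch and restart the interval timer."""
        if self._items:
            self._flush_fn(list(self._items))
            self._items.clear()
        self._last_flush = time.monotonic()


sent_batches: List[List[int]] = []
# A long interval means only the size threshold triggers flushes in this demo.
buffer = BatchBuffer(sent_batches.append, batch_flush_interval=60.0, batch_size_threshold=3)
for i in range(7):
    buffer.add(i)
buffer.flush()  # send whatever is left over
print(sent_batches)
```

Seven items with a threshold of 3 result in three requests instead of seven, which is the point of batching when the backend throttles REST calls.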
MlflowSparkStudy
MlflowSparkStudy is a wrapper around the optuna.study.Study class that integrates Optuna with Spark through an MLflow experiment.
Parameter name | Type | Description |
---|---|---|
study_name | str | Name of the study |
storage | mlflow.optuna.storage.MlflowStorage | MLflow-based storage class |
sampler | optuna.samplers.BaseSampler | A sampler object that implements the background algorithm for value suggestion. optuna.samplers.TPESampler is used as the default. |
pruner | optuna.pruners.BasePruner | A pruner object that decides early stopping of unpromising trials. optuna.pruners.MedianPruner is used as the default. |
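As a rough illustration of what a median-based pruner decides, the sketch below stops a trial when its intermediate value at a given step is worse than the median of earlier trials at that step. It is a simplified model for a minimization problem, not Optuna's MedianPruner code. The function should_prune and the layout of history are assumptions made for this example.

```python
from statistics import median
from typing import Dict, List


def should_prune(
    current_value: float,
    step: int,
    history: List[Dict[int, float]],
    n_startup_trials: int = 5,
) -> bool:
    """Return True if the running trial looks worse than the median at this step.

    history holds one {step: intermediate_value} dict per earlier trial.
    Lower values are better.
    """
    values_at_step = [h[step] for h in history if step in h]
    if len(values_at_step) < n_startup_trials:
        return False  # too little evidence to stop anything early
    return current_value > median(values_at_step)


# Intermediate validation errors reported by five earlier trials at steps 0 and 1.
history = [
    {0: 1.0, 1: 0.8},
    {0: 0.9, 1: 0.7},
    {0: 1.2, 1: 1.1},
    {0: 0.7, 1: 0.5},
    {0: 1.1, 1: 0.9},
]

print(should_prune(1.0, step=1, history=history))  # worse than the median of 0.8
print(should_prune(0.6, step=1, history=history))  # better than the median
```

Pruning only has an effect if the objective function reports intermediate values during training, so single-shot objectives like the one earlier on this page run to completion regardless of the pruner.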