Models in Unity Catalog example
This example illustrates how to use Models in Unity Catalog to build a machine learning application that forecasts the daily power output of a wind farm. The example shows how to:
- Track and log models with MLflow.
- Register models to Unity Catalog.
- Describe models and deploy them for inference using aliases.
- Integrate registered models with production applications.
- Search and discover models in Unity Catalog.
- Delete models.
The article describes how to perform these steps using the MLflow Tracking and Models in Unity Catalog UIs and APIs.
Requirements
Make sure you meet all the requirements in Requirements. In addition, the code examples in this article assume that you have the following privileges:
- USE CATALOG privilege on the main catalog.
- CREATE MODEL and USE SCHEMA privileges on the main.default schema.
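If you manage these permissions yourself, the privileges above correspond to Unity Catalog GRANT statements. The following sketch assembles them for a three-level model name. The helper required_grants and the principal someone@example.com are hypothetical; review the generated statements and have someone with sufficient rights run them in a SQL editor or notebook.

```python
def required_grants(model_name: str, principal: str) -> list:
    """Hypothetical helper: list the GRANT statements needed to register
    models under the catalog and schema of a three-level model name."""
    parts = model_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected <catalog>.<schema>.<model>, got %r" % model_name)
    catalog, schema, _model = parts
    who = "`%s`" % principal  # backticks quote principals such as email addresses
    return [
        "GRANT USE CATALOG ON CATALOG %s TO %s" % (catalog, who),
        "GRANT USE SCHEMA ON SCHEMA %s.%s TO %s" % (catalog, schema, who),
        "GRANT CREATE MODEL ON SCHEMA %s.%s TO %s" % (catalog, schema, who),
    ]


for statement in required_grants("main.default.wind_forecasting", "someone@example.com"):
    print(statement)
```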
Notebook
All of the code in this article is provided in the following notebook.
Models in Unity Catalog example notebook
Install MLflow Python client
This example requires the MLflow Python client version 2.5.0 or above and TensorFlow. Add the following commands at the top of your notebook to install these dependencies.
%pip install --upgrade "mlflow-skinny[databricks]>=2.5.0" tensorflow
dbutils.library.restartPython()
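To fail fast on a cluster that still has an older client, you could compare the installed version against the 2.5.0 minimum before running anything else. This is a simplified sketch: meets_minimum and release_tuple are hypothetical helpers that compare only the numeric release part, and in the notebook you would pass mlflow.__version__ to meets_minimum. For full PEP 440 semantics, the Version class from the packaging library is the usual choice.

```python
import re


def release_tuple(version: str) -> tuple:
    """Numeric release part of a version string: "2.9.2" -> (2, 9, 2).

    Simplification: suffixes such as "rc1" or ".dev0" are ignored.
    """
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, required: str = "2.5.0") -> bool:
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.5" and "2.5.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# In the notebook, something like: assert meets_minimum(mlflow.__version__), "upgrade MLflow"
print(meets_minimum("2.9.2"), meets_minimum("2.4.1"))
```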
Load dataset, train model, and register to Unity Catalog
This section shows how to load the wind farm dataset, train a model, and register the model to Unity Catalog. The model training run and metrics are tracked in an experiment run.
Load dataset
The following code loads a dataset containing weather data and power output information for a wind farm in the United States. The dataset contains wind direction, wind speed, and air temperature features sampled every eight hours (once at 00:00, once at 08:00, and once at 16:00), as well as daily aggregate power output (power), over several years.
import pandas as pd

# Daily weather features and aggregate power output, indexed by date.
wind_farm_data = pd.read_csv("https://github.com/dbczumar/model-registry-demo-notebook/raw/master/dataset/windfarm_data.csv", index_col=0)

def get_training_data():
    # Rows from 2014-01-01 through 2018-01-01 (label slicing includes both endpoints).
    training_data = pd.DataFrame(wind_farm_data["2014-01-01":"2018-01-01"])
    X = training_data.drop(columns="power")
    y = training_data["power"]
    return X, y

def get_validation_data():
    # Rows from 2018 are held out for validation.
    validation_data = pd.DataFrame(wind_farm_data["2018-01-01":"2019-01-01"])
    X = validation_data.drop(columns="power")
    y = validation_data["power"]
    return X, y

def get_weather_and_forecast():
    format_date = lambda pd_date: pd_date.date().strftime("%Y-%m-%d")
    today = pd.Timestamp("today").normalize()
    five_days_ago = today - pd.Timedelta(days=5)
    five_days_later = today + pd.Timedelta(days=5)
    # Observed power output up to today, and weather features for the surrounding window.
    past_power_output = pd.DataFrame(wind_farm_data)[format_date(five_days_ago):format_date(today)]
    weather_and_forecast = pd.DataFrame(wind_farm_data)[format_date(five_days_ago):format_date(five_days_later)]
    # If the dataset doesn't cover the current dates, fall back to its last 10 days.
    if len(weather_and_forecast) < 10:
        past_power_output = pd.DataFrame(wind_farm_data).iloc[-10:-5]
        weather_and_forecast = pd.DataFrame(wind_farm_data).iloc[-10:]
    return weather_and_forecast.drop(columns="power"), past_power_output["power"]
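The date handling in this section can be checked without loading the dataset. The sketch below assumes the index holds ISO-8601 date strings such as "2018-12-15" (the forecasting code later converts it with pd.to_datetime). The hypothetical sample_times function shows that three samples a day at 00:00, 08:00, and 16:00 are eight hours apart. The hypothetical forecast_window function mirrors get_weather_and_forecast() in plain Python: ISO dates sort chronologically, so inclusive string comparison behaves like pandas label slicing.

```python
from datetime import date, datetime, timedelta


def sample_times(day: date, samples_per_day: int = 3) -> list:
    """Evenly spaced sampling times within one day, starting at midnight."""
    midnight = datetime(day.year, day.month, day.day)
    interval = timedelta(hours=24) / samples_per_day
    return [midnight + i * interval for i in range(samples_per_day)]


def forecast_window(dates: list, today: date):
    """Plain-Python mirror of get_weather_and_forecast(), returning date labels
    instead of DataFrame rows. `dates` must be sorted ISO-8601 strings."""
    start = (today - timedelta(days=5)).isoformat()
    end = (today + timedelta(days=5)).isoformat()
    current = today.isoformat()
    # Inclusive on both ends, like label slicing wind_farm_data[start:end].
    history = [d for d in dates if start <= d <= current]
    window = [d for d in dates if start <= d <= end]
    if len(window) < 10:
        # Not enough rows near today: use the last 10 days, first 5 as history.
        history = dates[-10:-5]
        window = dates[-10:]
    return history, window


print([t.strftime("%H:%M") for t in sample_times(date(2014, 1, 1))])
december = [(date(2018, 12, 1) + timedelta(days=i)).isoformat() for i in range(31)]
history, window = forecast_window(december, today=date(2018, 12, 15))
print(history[0], history[-1], window[-1])
```

With a date inside the data, the window spans 11 days (today plus five days on each side). With a date past the end of the data, the fallback returns the last ten days, matching the iloc[-10:] branch.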
Configure MLflow client to access models in Unity Catalog
By default, the MLflow Python client creates models in the workspace model registry on Azure Databricks. To use models in Unity Catalog instead, configure the client to access models in Unity Catalog:
import mlflow
mlflow.set_registry_uri("databricks-uc")
Train and register model
The following code trains a neural network using TensorFlow Keras to predict power output based on the weather features in the dataset and uses MLflow APIs to register the fitted model to Unity Catalog.
import mlflow.tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

MODEL_NAME = "main.default.wind_forecasting"

def train_and_register_keras_model(X, y):
    with mlflow.start_run():
        # Feed-forward network: one hidden ReLU layer and a single linear output unit.
        model = Sequential()
        model.add(Dense(100, input_shape=(X.shape[-1],), activation="relu", name="hidden_layer"))
        model.add(Dense(1))
        model.compile(loss="mse", optimizer="adam")
        model.fit(X, y, epochs=100, batch_size=64, validation_split=.2)
        # An input example lets MLflow infer the model signature, which Unity Catalog requires.
        example_input = X[:10].to_numpy()
        mlflow.tensorflow.log_model(
            model,
            artifact_path="model",
            input_example=example_input,
            registered_model_name=MODEL_NAME
        )
    return model

X_train, y_train = get_training_data()
model = train_and_register_keras_model(X_train, y_train)
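To make the architecture concrete, the sketch below re-implements the forward pass of this network in plain Python (a 100-unit ReLU hidden layer followed by one linear output unit) and counts its trainable parameters. The weights in the usage example are made up for illustration. The weights[j][i] layout (unit j, input i) is a choice of this sketch; Keras stores each kernel as an (inputs, units) matrix, the transpose of this layout.

```python
def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weights[j][i] connects input i to unit j."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def relu(z):
    return max(0.0, z)


def predict(features, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: ReLU hidden layer, then a single linear output (power)."""
    hidden = dense(features, hidden_w, hidden_b, activation=relu)
    return dense(hidden, out_w, out_b)[0]


def count_params(n_features, hidden_units=100):
    """Trainable parameters: one weight per connection plus one bias per unit."""
    return (n_features * hidden_units + hidden_units) + (hidden_units * 1 + 1)


# Tiny illustrative network: 2 features, 2 hidden units, made-up weights.
print(predict([1.0, 2.0], [[1.0, 1.0], [-1.0, 0.0]], [0.0, 0.0], [[2.0, 5.0]], [1.0]))
print(count_params(9))
```

The first call returns 7.0: the hidden activations are 3.0 and 0.0 (ReLU clips -1.0), so the output is 2 * 3.0 + 5 * 0.0 + 1.0. With nine input columns, for example, the network has 1,101 trainable parameters; the real count depends on how many feature columns the dataset has.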
View the model in the UI
You can view and manage registered models and model versions in Unity Catalog using Catalog Explorer. Look for the model you just created under the main catalog and default schema.
Deploy a model version for inference
Models in Unity Catalog support aliases for model deployment. Aliases provide mutable, named references (for example, “Champion” or “Challenger”) to a particular version of a registered model. You can reference and target model versions using these aliases in downstream inference workflows.
Once you’ve navigated to the registered model in Catalog Explorer, click under the Aliases column to assign the “Champion” alias to the latest model version, and press “Continue” to save changes.
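The behavior of aliases can be modeled with a small in-memory stand-in: an alias names exactly one version at a time, a version can carry several aliases, and reassigning an alias simply moves it. The ToyModelRegistry class below is hypothetical and only illustrates these semantics; in practice you assign aliases through Catalog Explorer or MlflowClient.set_registered_model_alias(), as later sections show.

```python
class ToyModelRegistry:
    """Hypothetical in-memory model: version numbers plus alias -> version pointers."""

    def __init__(self):
        self.versions = []
        self.aliases = {}

    def create_version(self) -> int:
        # New versions are numbered sequentially, starting at 1.
        version = len(self.versions) + 1
        self.versions.append(version)
        return version

    def set_alias(self, alias: str, version: int) -> None:
        if version not in self.versions:
            raise KeyError("no such version: %r" % version)
        self.aliases[alias] = version  # overwrites any previous assignment

    def resolve(self, alias: str) -> int:
        return self.aliases[alias]


registry = ToyModelRegistry()
v1 = registry.create_version()
registry.set_alias("Champion", v1)
v2 = registry.create_version()
registry.set_alias("Challenger", v2)
registry.set_alias("Champion", v2)  # promote: consumers of "Champion" now get v2
print(registry.resolve("Champion"), registry.resolve("Challenger"))
```

Code that looks up "Champion" never hard-codes a version number, so moving the alias changes what it loads without any code change.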
Load model versions using the API
The MLflow Models component defines functions for loading models from several machine learning frameworks. For example, mlflow.tensorflow.load_model() is used to load TensorFlow models that were saved in MLflow format, and mlflow.sklearn.load_model() is used to load scikit-learn models that were saved in MLflow format.
These functions can also load models registered in Unity Catalog. The following code uses the framework-agnostic mlflow.pyfunc.load_model() to load version 1 by its version number, and then the version that holds the “Champion” alias.
import mlflow.pyfunc
model_version_uri = "models:/{model_name}/1".format(model_name=MODEL_NAME)
print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_version_uri))
model_version_1 = mlflow.pyfunc.load_model(model_version_uri)
model_champion_uri = "models:/{model_name}@Champion".format(model_name=MODEL_NAME)
print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_champion_uri))
champion_model = mlflow.pyfunc.load_model(model_champion_uri)
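The two URIs above differ only in how they select a version: models:/<name>/<version> pins a version number, while models:/<name>@<alias> follows whichever version the alias points to at load time. The hypothetical parser below spells out that structure. MLflow resolves these URIs itself, so this is for illustration only.

```python
def parse_models_uri(uri: str) -> dict:
    """Hypothetical parser for models:/<name>/<version> and models:/<name>@<alias>."""
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError("not a models:/ URI: %r" % uri)
    rest = uri[len(prefix):]
    if "@" in rest:
        name, alias = rest.rsplit("@", 1)
        if not name or not alias:
            raise ValueError("malformed alias URI: %r" % uri)
        return {"name": name, "alias": alias}
    name, _, version = rest.rpartition("/")
    if not name or not version.isdigit():
        raise ValueError("malformed version URI: %r" % uri)
    return {"name": name, "version": int(version)}


print(parse_models_uri("models:/main.default.wind_forecasting/1"))
print(parse_models_uri("models:/main.default.wind_forecasting@Champion"))
```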
Forecast power output with the champion model
In this section, the champion model is used to evaluate weather forecast data for the wind farm. The forecast_power() application loads the model version assigned to the specified alias and uses it to forecast power production over the next five days.
from mlflow.tracking import MlflowClient

def plot(model_name, model_alias, model_version, power_predictions, past_power_output):
    import matplotlib.dates as mdates
    from matplotlib import pyplot as plt
    index = power_predictions.index
    fig = plt.figure(figsize=(11, 7))
    ax = fig.add_subplot(111)
    ax.set_xlabel("Date", size=20, labelpad=20)
    ax.set_ylabel("Power\noutput\n(MW)", size=20, labelpad=60, rotation=0)
    ax.tick_params(axis='both', which='major', labelsize=17)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    # Observed output covers only the first part of the window; predictions cover all of it.
    ax.plot(index[:len(past_power_output)], past_power_output, label="True", color="red", alpha=0.5, linewidth=4)
    ax.plot(index, power_predictions.squeeze(), "--", label="Predicted by '%s'\nwith alias '%s' (Version %d)" % (model_name, model_alias, model_version), color="blue", linewidth=3)
    ax.set_ylim(bottom=0, top=max(3500, int(power_predictions.values.max() * 1.3)))
    ax.legend(fontsize=14)
    plt.title("Wind farm power output and projections", size=24, pad=20)
    plt.tight_layout()
    plt.show()

def forecast_power(model_name, model_alias):
    import pandas as pd
    client = MlflowClient()
    # Resolve the alias to a concrete version number (used for the plot label).
    model_version = client.get_model_version_by_alias(model_name, model_alias).version
    # Load whichever version the alias currently points to.
    model_uri = "models:/{model_name}@{model_alias}".format(model_name=model_name, model_alias=model_alias)
    model = mlflow.pyfunc.load_model(model_uri)
    weather_data, past_power_output = get_weather_and_forecast()
    power_predictions = pd.DataFrame(model.predict(weather_data))
    power_predictions.index = pd.to_datetime(weather_data.index)
    print(power_predictions)
    plot(model_name, model_alias, int(model_version), power_predictions, past_power_output)

forecast_power(MODEL_NAME, "Champion")
Add model and model version descriptions using the API
The code in this section shows how you can add model and model version descriptions using the MLflow API.
client = MlflowClient()
client.update_registered_model(
name=MODEL_NAME,
description="This model forecasts the power output of a wind farm based on weather data. The weather data consists of three features: wind speed, wind direction, and air temperature."
)
client.update_model_version(
name=MODEL_NAME,
version=1,
description="This model version was built using TensorFlow Keras. It is a feed-forward neural network with one hidden layer."
)
Create a new model version
Classical machine learning techniques are also effective for power forecasting. The following code trains a random forest model using scikit-learn and registers it to Unity Catalog using the mlflow.sklearn.log_model() function.
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

with mlflow.start_run():
    n_estimators = 300
    mlflow.log_param("n_estimators", n_estimators)

    rand_forest = RandomForestRegressor(n_estimators=n_estimators)
    rand_forest.fit(X_train, y_train)

    val_x, val_y = get_validation_data()
    mse = mean_squared_error(val_y, rand_forest.predict(val_x))
    print("Validation MSE: %.2f" % mse)
    mlflow.log_metric("mse", mse)

    example_input = val_x.iloc[[0]]

    # Specify the `registered_model_name` parameter of the `mlflow.sklearn.log_model()`
    # function to register the model to Unity Catalog. This automatically
    # creates a new model version.
    mlflow.sklearn.log_model(
        sk_model=rand_forest,
        artifact_path="sklearn-model",
        input_example=example_input,
        registered_model_name=MODEL_NAME
    )
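The validation metric logged above is the mean of the squared differences between actual and predicted power output, and a random forest regressor's prediction is the average of its individual trees' predictions. The plain-Python sketch below shows both. The per-tree predictions in the example are made up rather than taken from the trained forest, and plain_mse and forest_predict are hypothetical names chosen so they don't shadow the scikit-learn import.

```python
def plain_mse(y_true, y_pred):
    """Average squared difference; symmetric in its two arguments."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def forest_predict(per_tree_predictions):
    """Average the trees' predictions for each input row.

    per_tree_predictions[k][i] is tree k's prediction for row i.
    """
    n_trees = len(per_tree_predictions)
    return [sum(row_preds) / n_trees for row_preds in zip(*per_tree_predictions)]


# Three hypothetical trees predicting power for two days.
trees = [[1000.0, 2000.0], [1200.0, 2100.0], [1100.0, 1900.0]]
predictions = forest_predict(trees)
print(predictions)
print(plain_mse([1100.0, 2200.0], predictions))
```

Because the squared difference is symmetric, the argument order does not change the value, but passing actual values first matches the (y_true, y_pred) convention of sklearn.metrics.mean_squared_error.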
Fetch the new model version number
The following code shows how to retrieve the latest model version number for a model name.
client = MlflowClient()
model_version_infos = client.search_model_versions("name = '%s'" % MODEL_NAME)
# Compare versions numerically; some registry backends return them as strings.
new_model_version = max(int(model_version_info.version) for model_version_info in model_version_infos)
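Converting to int before taking the maximum matters once a model has ten or more versions: if the version numbers arrive as strings (some registry backends return them that way), a plain max() compares them alphabetically. The sketch below shows the difference with made-up version numbers; latest_version is a hypothetical helper.

```python
def latest_version(versions) -> int:
    """Highest version number, compared numerically whether given as str or int."""
    return max(int(v) for v in versions)


versions = ["1", "2", "10"]  # hypothetical version numbers returned as strings

print(max(versions))             # alphabetical comparison: "2" sorts after "10"
print(latest_version(versions))  # numeric comparison
```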
Add a description to the new model version
client.update_model_version(
name=MODEL_NAME,
version=new_model_version,
description="This model version is a random forest containing 300 decision trees that was trained in scikit-learn."
)
Mark new model version as Challenger and test the model
Before deploying a model to serve production traffic, it is a best practice to test it on a sample of production data. Previously, you used the “Champion” alias to denote the model version serving the majority of production workloads. The following code assigns the “Challenger” alias to the new model version and evaluates its performance.
client.set_registered_model_alias(
name=MODEL_NAME,
alias="Challenger",
version=new_model_version
)
forecast_power(MODEL_NAME, "Challenger")
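Beyond inspecting the forecast plot, you might want an explicit pass/fail criterion before promoting the Challenger. One simple, hypothetical rule is to promote only if the Challenger's error on the same evaluation data beats the Champion's by a chosen relative margin. The should_promote function and the MSE values below are illustrative, not results from this example.

```python
def should_promote(champion_mse: float, challenger_mse: float, min_relative_gain: float = 0.0) -> bool:
    """Hypothetical promotion rule: require the challenger's MSE to be lower than
    the champion's by at least min_relative_gain (0.05 means 5% lower)."""
    if not 0.0 <= min_relative_gain < 1.0:
        raise ValueError("min_relative_gain must be in [0, 1)")
    return challenger_mse < champion_mse * (1.0 - min_relative_gain)


print(should_promote(champion_mse=250000.0, challenger_mse=200000.0, min_relative_gain=0.1))
```

A margin above zero guards against promoting a version whose apparent improvement is within run-to-run noise; the right margin depends on your data and tolerance for risk.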
Deploy the new model version as the Champion model version
After verifying that the new model version performs well in tests, the following code assigns the “Champion” alias to the new model version and uses the exact same application code from the Forecast power output with the champion model section to produce a power forecast.
client.set_registered_model_alias(
name=MODEL_NAME,
alias="Champion",
version=new_model_version
)
forecast_power(MODEL_NAME, "Champion")
There are now two model versions of the forecasting model: the version trained with Keras and the version trained with scikit-learn. The “Challenger” alias remains assigned to the new scikit-learn model version, so any downstream workloads that target the “Challenger” model version continue to run successfully.
Delete models
When a model version is no longer being used, you can delete it. You can also delete an entire registered model; this removes all associated model versions. Note that deleting a model version clears any aliases assigned to the model version.
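A minimal in-memory sketch of these deletion rules, assuming a hypothetical ToyRegisteredModel class: deleting a version also drops every alias that pointed to it, and deleting the registered model (here, removing it from a dictionary of models) discards all of its versions at once. The "Baseline" alias is made up to show an alias being cleared.

```python
class ToyRegisteredModel:
    """Hypothetical stand-in for a registered model's versions and aliases."""

    def __init__(self, versions):
        self.versions = set(versions)
        self.aliases = {}

    def set_alias(self, alias, version):
        if version not in self.versions:
            raise KeyError(version)
        self.aliases[alias] = version

    def delete_version(self, version):
        self.versions.remove(version)  # KeyError if the version doesn't exist
        # Any alias assigned to the deleted version is cleared with it.
        self.aliases = {a: v for a, v in self.aliases.items() if v != version}


models = {"main.default.wind_forecasting": ToyRegisteredModel([1, 2])}
wind_model = models["main.default.wind_forecasting"]
wind_model.set_alias("Baseline", 1)  # hypothetical alias on the old version
wind_model.set_alias("Champion", 2)
wind_model.set_alias("Challenger", 2)
wind_model.delete_version(1)
print(sorted(wind_model.versions), wind_model.aliases)
del models["main.default.wind_forecasting"]  # deleting the model removes all its versions
print(models)
```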
Delete version 1 using the MLflow API
client.delete_model_version(
name=MODEL_NAME,
version=1,
)
Delete the model using the MLflow API
client = MlflowClient()
client.delete_registered_model(name=MODEL_NAME)