Use custom Python libraries with Model Serving

In this article, you learn how to include custom libraries or libraries from a private mirror server when you log your model, so that you can use them with Model Serving model deployments. You should complete the steps detailed in this guide after you have a trained ML model ready to deploy but before you create an Azure Databricks Model Serving endpoint.

Model development often requires the use of custom Python libraries that contain functions for pre- or post-processing, custom model definitions, and other shared utilities. In addition, many enterprise security teams encourage the use of private PyPI mirrors, such as Nexus or Artifactory, to reduce the risk of supply-chain attacks. Azure Databricks offers native support for installation of custom libraries and libraries from a private mirror in the Azure Databricks workspace.

Requirements

  • MLflow 1.29 or higher

Step 1: Upload dependency file

Databricks recommends that you upload your dependency file to Unity Catalog volumes. Alternatively, you can upload it to Databricks File System (DBFS) using the Azure Databricks UI.

To ensure your library is available to your notebook, you need to install it using %pip install. Using %pip installs the library in the current notebook and downloads the dependency to the cluster.
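For example, if you uploaded the wheel file to a Unity Catalog volume, a notebook cell such as the following installs it (the path is a placeholder for your own volume path):

```
%pip install /Volumes/path/to/dependency.whl
```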

Step 2: Log the model with a custom library

Important

The guidance in this section is not required if you install the private library by pointing to a custom PyPI mirror.

After you install the library and upload the Python wheel file to either Unity Catalog volumes or DBFS, include the following code in your script. In extra_pip_requirements, specify the path of your dependency file.

mlflow.sklearn.log_model(model, "sklearn-model", extra_pip_requirements=["/Volumes/path/to/dependency.whl"])

For DBFS, use the following:

mlflow.sklearn.log_model(model, "sklearn-model", extra_pip_requirements=["/dbfs/path/to/dependency.whl"])

If you have a custom library, you must specify all custom Python libraries associated with your model when you configure logging. You can do so with the extra_pip_requirements or conda_env parameters in log_model().
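When several custom wheels are involved, it can help to fail fast on malformed paths before they reach extra_pip_requirements. The following is a hypothetical helper, not part of MLflow or Databricks, sketched only to illustrate the expected inputs (absolute paths to .whl files):

```python
# Hypothetical helper: collect wheel paths and sanity-check them before
# passing the list to extra_pip_requirements in log_model().
def build_extra_pip_requirements(wheel_paths):
    reqs = []
    for path in wheel_paths:
        if not path.endswith(".whl"):
            raise ValueError(f"expected a wheel file, got: {path}")
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got: {path}")
        reqs.append(path)
    return reqs

# Placeholder paths; use your own volume or DBFS locations.
reqs = build_extra_pip_requirements(["/Volumes/path/to/dependency.whl"])
print(reqs)
```

The resulting list can be passed directly as the extra_pip_requirements argument shown above.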

Important

If using DBFS, be sure to include a forward slash, /, before your dbfs path when logging extra_pip_requirements. Learn more about DBFS paths in Work with files on Azure Databricks.
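The note above concerns the path form MLflow expects: a POSIX-style path beginning with /dbfs/, not a dbfs:/ URI. A minimal sketch of a hypothetical helper (not part of MLflow or Databricks) that normalizes the URI form:

```python
# Hypothetical helper: convert a dbfs:/ URI into the /dbfs/ local path
# form that extra_pip_requirements expects. Paths already in /dbfs/ form
# are returned unchanged.
def to_local_dbfs_path(path):
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path

print(to_local_dbfs_path("dbfs:/path/to/dependency.whl"))  # /dbfs/path/to/dependency.whl
```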

from mlflow.utils.environment import _mlflow_conda_env

conda_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=["/Volumes/path/to/dependency.whl"],
    additional_conda_channels=None,
)
mlflow.pyfunc.log_model(..., conda_env=conda_env)
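Note that _mlflow_conda_env is a private MLflow helper; what it returns is an ordinary conda environment specification, so you can also pass log_model an explicit dict. A sketch under that assumption, with the wheel path and Python version as placeholders:

```python
# An explicit conda environment specification, equivalent in shape to
# what the helper above produces. The wheel path and Python version are
# placeholders; substitute your own.
conda_env = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["mlflow", "/Volumes/path/to/dependency.whl"]},
    ],
}
print(conda_env["dependencies"][-1]["pip"])
```

The dict is passed the same way: mlflow.pyfunc.log_model(..., conda_env=conda_env).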

Step 3: Update MLflow model with Python wheel files

MLflow provides the add_libraries_to_model() utility to log your model with all of its dependencies pre-packaged as Python wheel files. This packages your custom libraries alongside the model in addition to all other libraries that are specified as dependencies of your model. This guarantees that the libraries used by your model are exactly the ones accessible from your training environment.

In the following example, model_uri references the model registry using the syntax models:/<model-name>/<model-version>.

When you use the model registry URI, this utility generates a new version under your existing registered model.

import mlflow.models.utils

mlflow.models.utils.add_libraries_to_model("models:/<model-name>/<model-version>")

Step 4: Serve your model

When a new model version with the packages included is available in the model registry, you can add this model version to an endpoint with Model Serving.