Logging MLflow models

This article explains how to log your trained models (or artifacts) as MLflow models. It explores the different ways to customize how MLflow packages your models and, as a result, how it runs them.

Why log models instead of artifacts?

If you are not familiar with MLflow, you may not be aware of the difference between logging artifacts or files vs. logging MLflow models. We recommend reading the article From artifacts to models in MLflow for an introduction to the topic.

A model in MLflow is also an artifact, but with a specific structure that serves as a contract between the person who created the model and the person who intends to use it. This contract helps bridge the gap between the artifacts themselves and what they mean.

Logging models has the following advantages:

  • Models can be loaded directly for inference using mlflow.<flavor>.load_model and then scored with the predict function.
  • Models can be used as pipeline inputs directly.
  • Models can be deployed without specifying a scoring script or an environment.
  • Swagger is enabled in deployed endpoints automatically and the Test feature can be used in Azure ML studio.
  • You can use the Responsible AI dashboard.

There are different ways to start using the concept of models in Azure Machine Learning with MLflow, as explained in the following sections:

Logging models using autolog

One of the simplest ways to start using this approach is the MLflow autolog functionality. Autolog lets MLflow instruct the framework you are using to log all the metrics, parameters, artifacts, and models that the framework considers relevant. By default, most models are logged if autolog is enabled. Some flavors may decide not to do that in specific situations. For instance, the PySpark flavor doesn't log models if they exceed a certain size.

You can turn on autologging by using either mlflow.autolog() or mlflow.<flavor>.autolog(). The following example uses autolog() for logging a classifier model trained with XGBoost:

import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Enable autologging before training so the model is logged when fit() runs
mlflow.autolog()

model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)


If you are using machine learning pipelines, such as Scikit-Learn pipelines, use the autolog functionality of that flavor to log models. Models are logged automatically when the fit() method is called on the pipeline object. The notebook Training and tracking an XGBoost classifier with MLflow demonstrates how to log a model with preprocessing using pipelines.

Logging models with a custom signature, environment or samples

You can log models manually using the method mlflow.<flavor>.log_model in MLflow. This workflow has the advantage of giving you control over different aspects of how the model is logged.

Use this method when:

  • You want to specify pip packages or a conda environment different from the ones that are automatically detected.
  • You want to include input examples.
  • You want to include specific artifacts that the model needs in the package.
  • Your signature is not correctly inferred by autolog. This is especially important when you deal with tensor inputs, where the signature needs specific shapes.
  • The default behavior of autolog doesn't suit your purpose.

The following example code logs a model for an XGBoost classifier:

import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature
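from mlflow.utils.environment import _mlflow_conda_env

# Keep autologging metrics and parameters, but log the model manually below
mlflow.autolog(log_models=False)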

model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

# Signature
signature = infer_signature(X_test, y_test)

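# Conda environment
# The pip dependency below is illustrative; pin the versions you trained with.
custom_env = _mlflow_conda_env(
    additional_pip_deps=["xgboost"],
)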

# Sample
input_example = X_train.sample(n=1)

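# Log the model manually
# The artifact path "classifier" is an illustrative name.
mlflow.xgboost.log_model(
    model,
    artifact_path="classifier",
    conda_env=custom_env,
    signature=signature,
    input_example=input_example,
)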


  • log_models=False is configured in autolog. This prevents MLflow from logging the model automatically, because the model is logged manually later.
  • infer_signature is a convenience method that tries to infer the signature directly from inputs and outputs.
  • mlflow.utils.environment._mlflow_conda_env is a private method in the MLflow SDK and may change in the future. This example uses it for the sake of simplicity. Use it with caution, or generate the YAML definition manually as a Python dictionary instead.
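
As a sketch of the dictionary alternative mentioned in the last point, the environment can be written out by hand in the same structure as a conda YAML file. The helper name build_conda_env, the environment name, the channel, and the Python version below are all assumptions made for illustration.

```python
def build_conda_env(pip_packages, python_version="3.10"):
    """Hypothetical helper: return a conda environment definition as a dict."""
    return {
        "name": "mlflow-env",
        "channels": ["conda-forge"],
        "dependencies": [
            f"python={python_version}",
            "pip",
            # pip packages are nested under a "pip" key, as in a conda YAML file
            {"pip": ["mlflow", *pip_packages]},
        ],
    }


# Pin exact versions in practice; unpinned names are used here for brevity
custom_env = build_conda_env(["xgboost"])
```

The resulting dictionary can be passed as conda_env=custom_env to mlflow.<flavor>.log_model, which avoids relying on the private helper.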

Logging models with a different behavior in the predict method

When you log a model using either mlflow.autolog or mlflow.<flavor>.log_model, the flavor used for the model decides how inference is executed and what the model returns. MLflow doesn't enforce any specific behavior in how the predict method generates results. However, there are scenarios where you want to do some preprocessing before your model is executed or some postprocessing after it.

One solution is to implement machine learning pipelines that move from inputs to outputs directly. Although this is possible (and sometimes encouraged for performance reasons), it may be challenging to achieve. In those cases, you probably want to customize how your model does inference by using a custom model, as explained in the following section.

Logging custom models

MLflow supports a variety of machine learning frameworks, including FastAI, MXNet Gluon, PyTorch, TensorFlow, XGBoost, CatBoost, h2o, Keras, LightGBM, MLeap, ONNX, Prophet, spaCy, Spark MLlib, Scikit-Learn, and statsmodels. However, there may be times when you need to change how a flavor works, log a model that MLflow doesn't support natively, or even log a model that uses multiple elements from different frameworks. In those cases, you may need to create a custom model flavor.

For this type of model, MLflow provides a flavor called pyfunc (short for Python function). This flavor allows you to log any object you want as a model, as long as it satisfies two conditions:

  • You implement the method predict (at least).
  • The Python object inherits from mlflow.pyfunc.PythonModel.


Serializable models that implement the Scikit-learn API can use the Scikit-learn flavor to log the model, regardless of whether the model was built with Scikit-learn. If your model can be persisted in Pickle format and the object has the methods predict() and predict_proba() (at least), you can use mlflow.sklearn.log_model() to log it inside an MLflow run.

The simplest way to create a flavor for your custom model is to create a wrapper around your existing model object. MLflow serializes and packages it for you. Python objects are serializable when the object can be stored in the file system as a file (generally in Pickle format). At runtime, the object can be materialized from that file, and all the values, properties, and methods that were available when it was saved are restored.

Use this method when:

  • Your model can be serialized in Pickle format.
  • You want to retain the model's state as it was just after training.
  • You want to customize the way the predict function works.
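
The serialization requirement above can be checked with a quick Pickle round trip before logging anything. The sketch below uses statistics.NormalDist from the Python standard library as a stand-in for a trained model, which is an assumption made to keep the example free of external dependencies.

```python
import pickle
from statistics import NormalDist

# "Train" a stand-in model: its learned state is a mean and a standard deviation
model = NormalDist.from_samples([2.0, 4.0, 4.0, 5.0, 7.0])

# Serialize to bytes and restore the object (a file on disk works the same way)
restored = pickle.loads(pickle.dumps(model))

# The restored object keeps both its state and its methods
same_state = restored == model
same_behavior = restored.cdf(5.0) == model.cdf(5.0)
```

If the round trip fails for your own object, for example because it holds an open connection or a lock, the wrapper approach is likely not a good fit for it.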

The following sample wraps a model created with XGBoost to make it behave differently from the default implementation of the XGBoost flavor (it returns the probabilities instead of the classes):

from mlflow.pyfunc import PythonModel, PythonModelContext

class ModelWrapper(PythonModel):
    def __init__(self, model):
        self._model = model

    def predict(self, context: PythonModelContext, data):
        # You don't have to keep the semantic meaning of `predict`. You can use here model.recommend(), model.forecast(), etc
        return self._model.predict_proba(data)

    # You can even add extra functions if you need to. Since the model is serialized,
    # all of them will be available when you load your model back.
    def predict_batch(self, data):
        # Placeholder body; add your own batch-scoring logic here
        pass

Then, a custom model can be logged in the run like this:

import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature


model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_probs = model.predict_proba(X_test)

accuracy = accuracy_score(y_test, y_probs.argmax(axis=1))
mlflow.log_metric("accuracy", accuracy)

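signature = infer_signature(X_test, y_probs)

# Log the wrapped model with the pyfunc flavor ("classifier" is an illustrative name)
mlflow.pyfunc.log_model(
    artifact_path="classifier",
    python_model=ModelWrapper(model),
    signature=signature,
)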


Note how infer_signature now uses y_probs to infer the signature. The target column holds the target class, but the model now returns the two class probabilities for each sample.

Next steps