Logging MLflow models

This article describes how to log your trained models (or artifacts) as MLflow models. It explores the different ways to customize how MLflow packages your models, and how it runs those models.

Why log models instead of artifacts?

The article From artifacts to models in MLflow describes the difference between logging artifacts (or files) and logging MLflow models.

An MLflow model is also an artifact. However, that model has a specific structure that serves as a contract between the person who created the model and the person who intends to use it. This contract helps build a bridge between the artifacts themselves and their meanings.

Model logging has these advantages:

  • You can load models directly for inference with mlflow.<flavor>.load_model, and then call the predict function
  • Pipeline inputs can use models directly
  • You can deploy models without specifying a scoring script or an environment
  • Swagger is automatically enabled in deployed endpoints, and you can use the Test feature in Azure Machine Learning studio
  • You can use the Responsible AI dashboard

The following sections describe how to use the concept of a model in Azure Machine Learning with MLflow.

Logging models using autolog

You can use MLflow autolog functionality. Autolog allows MLflow to instruct the framework in use to log all the metrics, parameters, artifacts, and models that the framework considers relevant. By default, if autolog is enabled, most models are logged. In some situations, some flavors might not log a model. For instance, the PySpark flavor doesn't log models that exceed a certain size.

Use either mlflow.autolog() or mlflow.<flavor>.autolog() to activate autologging. This example uses autolog() to log a classifier model trained with XGBoost:

import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Enable autologging before training so that the fit call is captured
mlflow.autolog()

model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)


If you use machine learning pipelines, for example Scikit-Learn pipelines, use the autolog functionality of that pipeline flavor to log models. Model logging happens automatically when the fit() method is called on the pipeline object. The Training and tracking an XGBoost classifier with MLflow notebook demonstrates how to log a model with preprocessing, using pipelines.

Logging models with a custom signature, environment or samples

You can log models manually with the mlflow.<flavor>.log_model method. This workflow lets you control different aspects of how the model is logged.

Use this method when:

  • You want to indicate pip packages or a conda environment that differ from those that are automatically detected
  • You want to include input examples
  • You want to include specific artifacts in the model package
  • autolog does not correctly infer your signature. This matters when you deal with tensor inputs, where the signature needs specific shapes
  • The autolog behavior does not cover your purpose for some reason

The following code example logs a model for an XGBoost classifier:

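import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature
from mlflow.utils.environment import _mlflow_conda_env

# Keep autologging for metrics and parameters, but log the model manually
mlflow.autolog(log_models=False)

model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

# Signature
signature = infer_signature(X_test, y_test)

# Conda environment (the pip dependency is illustrative; pin the version you trained with)
custom_env = _mlflow_conda_env(
    additional_pip_deps=["xgboost"],
)

# Sample
input_example = X_train.sample(n=1)

# Log the model manually (the artifact path "classifier" is an illustrative name)
mlflow.xgboost.log_model(
    model,
    artifact_path="classifier",
    conda_env=custom_env,
    signature=signature,
    input_example=input_example,
)
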

  • autolog has the log_models=False configuration. This prevents automatic MLflow model logging, so that the model can be logged later as a manual step
  • Use the infer_signature method to try to infer the signature directly from inputs and outputs
  • The mlflow.utils.environment._mlflow_conda_env method is a private method in the MLflow SDK. In this example, it makes the code simpler, but use it with caution. It may change in the future. As an alternative, you can generate the YAML definition manually as a Python dictionary.
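
As a sketch of the dictionary alternative mentioned in the last point, the environment can be written out by hand in the same shape as a conda YAML file. The environment name, channel, Python version, and package list below are placeholders, not recommendations.

```python
# Conda environment expressed as a plain dictionary that mirrors a conda YAML file.
# All values here are placeholders; replace them with what the model really needs.
custom_env = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["mlflow", "xgboost"]},
    ],
}

# Pull out the pip section to confirm which packages would be installed with pip
pip_packages = next(
    dep["pip"] for dep in custom_env["dependencies"] if isinstance(dep, dict)
)
```

A dictionary like this can be passed as the conda_env argument of mlflow.<flavor>.log_model, which avoids the dependency on the private helper.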

Logging models with a different behavior in the predict method

When you log a model with either mlflow.autolog or mlflow.<flavor>.log_model, the model flavor determines how inference runs and what the model returns. MLflow doesn't enforce any specific behavior for how predict generates its results. In some scenarios, you might want to do some preprocessing or post-processing before and after your model executes.

In this situation, implement machine learning pipelines that move directly from inputs to outputs. Although this implementation is possible, and sometimes encouraged to improve performance, it can be challenging to achieve. In those cases, it can help to customize how your model handles inference, as explained in the next section.

Logging custom models

MLflow supports many machine learning frameworks, including:

  • CatBoost
  • FastAI
  • h2o
  • Keras
  • LightGBM
  • MLeap
  • MXNet Gluon
  • ONNX
  • Prophet
  • PyTorch
  • Scikit-Learn
  • spaCy
  • Spark MLLib
  • statsmodels
  • TensorFlow
  • XGBoost

However, you might need to change the way a flavor works, log a model that MLflow doesn't natively support, or even log a model that uses multiple elements from different frameworks. In these cases, you might need to create a custom model flavor.

To solve the problem, MLflow introduces the pyfunc flavor (starting from a Python function). This flavor can log any object as a model, as long as that object satisfies two conditions:

  • You implement at least the predict method
  • The Python object inherits from mlflow.pyfunc.PythonModel


Serializable models that implement the Scikit-learn API can use the Scikit-learn flavor to log the model, regardless of whether the model was built with Scikit-learn. If you can persist your model in Pickle format, and the object has at least the predict() and predict_proba() methods, you can use mlflow.sklearn.log_model() to log the model inside an MLflow run.

Creating a wrapper around your existing model object is the simplest way to create a flavor for your custom model. MLflow serializes and packages it for you. A Python object is serializable when it can be stored in the file system as a file, generally in Pickle format. At runtime, the object can be materialized from that file, which restores all the values, properties, and methods that were available when it was saved.

Use this method when:

  • You can serialize your model in Pickle format
  • You want to retain the state of the model, as it was just after training
  • You want to customize how the predict function works.
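
To make the serialization requirement concrete, the sketch below round-trips a trained object through Pickle and checks that its state survives. It uses a small scikit-learn classifier and toy data as stand-ins; any picklable model object would behave similarly.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data and model, purely illustrative
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize the trained object to bytes, then materialize it again
restored = pickle.loads(pickle.dumps(model))

# The restored object carries the same learned state and the same methods
same_coefficients = np.array_equal(model.coef_, restored.coef_)
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
```

If a round trip like this fails for your model object, the wrapper approach is unlikely to work as is, because MLflow relies on the same kind of serialization.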

The following code sample wraps a model created with XGBoost so that it behaves differently from the default implementation of the XGBoost flavor. It returns the probabilities instead of the classes:

from mlflow.pyfunc import PythonModel, PythonModelContext

class ModelWrapper(PythonModel):
    def __init__(self, model):
        self._model = model

    def predict(self, context: PythonModelContext, data):
        # You don't have to keep the semantic meaning of `predict`. You can use here model.recommend(), model.forecast(), etc
        return self._model.predict_proba(data)

    # You can even add extra functions if you need to. Since the model is serialized,
    # all of them will be available when you load your model back.
    def predict_batch(self, data):
        # Placeholder body; add batch-specific logic here
        pass

Log a custom model in the run:

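import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature

# Disable automatic model logging; the wrapped model is logged manually below
mlflow.xgboost.autolog(log_models=False)

model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_probs = model.predict_proba(X_test)

accuracy = accuracy_score(y_test, y_probs.argmax(axis=1))
mlflow.log_metric("accuracy", accuracy)

signature = infer_signature(X_test, y_probs)

# Log the wrapper with the pyfunc flavor (the artifact path "classifier" is an illustrative name)
mlflow.pyfunc.log_model(
    artifact_path="classifier",
    python_model=ModelWrapper(model),
    signature=signature,
)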

Here, the infer_signature method uses y_probs to infer the signature. The target column holds the target class, but the model now returns a probability for each of the two classes.

Next steps