Guidelines for deploying MLflow models

APPLIES TO: Azure CLI ml extension v2 (current)

In this article, learn about deployment of MLflow models to Azure Machine Learning for both real-time and batch inference, and about different tools you can use to manage the deployments.

No-code deployment

When you deploy MLflow models to Azure Machine Learning, unlike with custom model deployment, you don't have to provide a scoring script or an environment. Azure Machine Learning automatically generates the scoring script and environment for you. This functionality is called no-code deployment.

For no-code deployment, Azure Machine Learning:

  • Ensures that all the package dependencies indicated in the MLflow model are satisfied.
  • Provides an MLflow base image or curated environment that contains the following items:
    • Packages required for Azure Machine Learning to perform inference, including mlflow-skinny.
    • A scoring script to perform inference.
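For example, the following sketch shows a no-code deployment by using the Azure Machine Learning SDK for Python (azure-ai-ml). The workspace details, endpoint name, and registered model name are hypothetical placeholders.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

# Connect to the workspace. Replace the placeholder identifiers.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE>",
)

# Create an endpoint to host the deployment.
ml_client.online_endpoints.begin_create_or_update(
    ManagedOnlineEndpoint(name="sklearn-diabetes")
).result()

# Because the model is an MLflow model, no scoring script or environment
# is specified; Azure Machine Learning generates both automatically.
deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name="sklearn-diabetes",
    model="azureml:sklearn-diabetes-model:1",  # hypothetical registered model
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()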

Tip

Workspaces without public network access: Before you can deploy MLflow models to online endpoints without egress connectivity, you have to package the models (preview). By using model packaging, you can avoid the need for an internet connection, which Azure Machine Learning would otherwise require to dynamically install necessary Python packages for the MLflow models.

Packages and dependencies

Azure Machine Learning automatically generates environments to run inference on MLflow models. To build the environments, Azure Machine Learning reads the conda dependencies that are specified in the MLflow model and adds any packages that are required to run the inferencing server. These extra packages vary depending on deployment type.

The following example conda.yaml file shows conda dependencies specified in an MLflow model.

channels:
- conda-forge
dependencies:
- python=3.10.11
- pip<=23.1.2
- pip:
  - mlflow==2.7.1
  - cloudpickle==1.6.0
  - dataclasses==0.6
  - lz4==4.0.0
  - numpy==1.23.5
  - packaging==23.0
  - psutil==5.9.0
  - pyyaml==6.0
  - scikit-learn==1.1.2
  - scipy==1.10.1
  - uuid==1.30
name: mlflow-env

Important

MLflow automatically detects packages when it logs a model, and it pins the package versions in the model's conda dependencies. This automatic package detection might not reflect your intentions or requirements. You can alternatively log models with a custom signature, environment or samples.
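For instance, if the automatically detected dependencies aren't what you want, one option is to pass explicit pip requirements when you log the model. The following is a minimal sketch; the model and version pins are illustrative.

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Pin the serving dependencies explicitly instead of relying on MLflow's
# automatic package detection. The version pins here are illustrative.
mlflow.sklearn.log_model(
    sk_model=model,
    artifact_path="model",
    pip_requirements=["scikit-learn==1.1.2", "numpy==1.23.5"],
)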

Models with signatures

MLflow models can include a signature that indicates the expected inputs and their types. When such models are deployed to online or batch endpoints, Azure Machine Learning ensures that the number and types of the data inputs comply with the signature. If the input data can't be parsed as expected, the model invocation fails.

You can inspect an MLflow model signature by opening the MLmodel file. For more information on how signatures work in MLflow, see Signatures in MLflow.
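You can also read the signature programmatically. A minimal sketch, assuming MLflow 2.x and a placeholder model URI:

import mlflow

# Retrieve model metadata without loading the model itself.
model_info = mlflow.models.get_model_info("models:/sklearn-diabetes-model/1")
print(model_info.signature)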

The following example MLmodel file highlights the signature.

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.10.11
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.1.2
mlflow_version: 2.7.1
model_uuid: 3f725f3264314c02808dd99d5e5b2781
run_id: 70f15bab-cf98-48f1-a2ea-9ad2108c28cd
signature:
  inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type": "double"},
    {"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"}, {"name":
    "s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name": "s3", "type":
    "double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type": "double"},
    {"name": "s6", "type": "double"}]'
  outputs: '[{"type": "double"}]'

Tip

Signatures in MLflow models are recommended because they provide a convenient way to detect data compatibility issues. For more information about how to log models with signatures, see Logging models with a custom signature, environment or samples.
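As a sketch of that approach, the following example infers a signature from training data and attaches it at logging time. The scikit-learn diabetes dataset and Ridge model are used only for illustration.

import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = Ridge().fit(X, y)

# Infer the expected inputs and outputs from sample data, then attach
# the signature so endpoints can validate incoming payloads.
signature = infer_signature(X, model.predict(X))
mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)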

Deployment in the MLflow built-in server vs. deployment in the Azure Machine Learning inferencing server

Model developers can use MLflow built-in deployment tools to test models locally. For instance, you can run a local instance of a model that's registered in the MLflow server registry by using the MLflow CLI commands mlflow models serve and mlflow models predict. For more information about MLflow built-in deployment tools, see Built-in deployment tools in the MLflow documentation.
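A quick local smoke test is also possible from Python, as shown in the following sketch. The model URI and input columns are placeholders.

import mlflow
import pandas as pd

# Load the model as a generic Python function and score a sample row.
model = mlflow.pyfunc.load_model("models:/sklearn-diabetes-model/1")
sample = pd.DataFrame(
    [[63.0, 1.0, 28.0, 120.0, 180.0, 105.0, 42.0, 4.0, 4.8, 92.0]],
    columns=["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"],
)
print(model.predict(sample))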

Azure Machine Learning also supports deploying models to both online and batch endpoints. These endpoints run different inferencing technologies that can have different features.

  • Azure Machine Learning online endpoints, similar to the MLflow built-in server, provide a scalable, synchronous, and lightweight way to run models for inference.

  • Azure Machine Learning batch endpoints can run asynchronous inference over long-running inferencing processes that can scale to large amounts of data. The MLflow server lacks this capability, although you can achieve a similar capability by using Spark jobs. To learn more about batch endpoints and MLflow models, see Use MLflow models in batch deployments.

Input formats

The following table shows the input types supported by the MLflow built-in server versus Azure Machine Learning online endpoints.

Input type | MLflow built-in server | Azure Machine Learning online endpoint
JSON-serialized pandas DataFrames in the split orientation | Supported | Supported
JSON-serialized pandas DataFrames in the records orientation | Deprecated | Not supported
CSV-serialized pandas DataFrames | Supported | Not supported. Use batch inferencing. For more information, see Deploy MLflow models to batch endpoints.
TensorFlow input as JSON-serialized lists (tensors) and dictionary of lists (named tensors) | Supported | Supported
TensorFlow input using the TensorFlow Serving API | Supported | Not supported

The following sections focus on MLflow models that are deployed to Azure Machine Learning online endpoints.

Input structure

Regardless of input type, Azure Machine Learning requires you to provide inputs in a JSON payload in the dictionary key input_data. This key isn't required when you use the command mlflow models serve to serve models, so payloads can't be used interchangeably for Azure Machine Learning online endpoints and the MLflow built-in server.
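For example, the following sketch builds an Azure Machine Learning payload from a pandas DataFrame; the column names and values are illustrative.

import json
import pandas as pd

# Illustrative columns; replace with your model's expected inputs.
df = pd.DataFrame({"age": [63.0], "sex": [1.0], "bmi": [28.0]})

# Azure Machine Learning online endpoints expect the serialized DataFrame
# (split orientation here) nested under the input_data key.
payload = {"input_data": json.loads(df.to_json(orient="split"))}
print(json.dumps(payload, indent=2))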

Important

The payload structure changed in MLflow 2.0.

The following payload examples show differences between a model deployed in the MLflow built-in server versus the Azure Machine Learning inferencing server.

JSON-serialized pandas DataFrame in the split orientation

{
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [
            [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
        ]
    }
}

Tensor input

{
    "input_data": [
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}

Named-tensor input

{
    "input_data": {
        "tokens": [
          [0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
        ],
        "mask": [
          [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
        ]
    }
}
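To send one of these payloads to a deployed model, one option is MLClient.online_endpoints.invoke from the Azure Machine Learning SDK for Python. The following is a sketch; the workspace details, endpoint name, and request file are hypothetical.

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE>",
)

# sample-request.json contains a payload like the examples above, with
# the inputs nested under the input_data key.
response = ml_client.online_endpoints.invoke(
    endpoint_name="sklearn-diabetes",
    request_file="sample-request.json",
)
print(response)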

Inference customization for MLflow models

Scoring scripts customize how to execute inferencing for custom models. But for MLflow model deployment, the decision about how to execute inferencing is made by the model builder rather than by the deployment engineer. Each model framework can automatically apply specific inference routines.

If you need to change how inference is executed for an MLflow model, you can do one of the following things:

  • Change how your model is being logged in the training routine.
  • Customize inference with a scoring script at deployment time.

Change how your model is logged during training

When you log a model by using either mlflow.autolog or mlflow.<flavor>.log_model, the flavor used for the model determines how to execute inference and what results to return. MLflow doesn't enforce any specific behavior for how the predict() function generates results.

In some cases, you might want to do some preprocessing or postprocessing before and after your model executes. Or, you might want to change what is returned; for example, probabilities instead of classes. One solution is to implement machine learning pipelines that move from inputs to outputs directly.

For example, sklearn.pipeline.Pipeline or pyspark.ml.Pipeline are popular ways to implement pipelines, and are sometimes recommended for performance reasons. You can also customize how your model does inferencing by logging custom models.
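As a sketch of the custom-model approach, the following example wraps a classifier in an mlflow.pyfunc.PythonModel so that inference returns probabilities instead of classes. The dataset and classifier are illustrative.

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classifier = LogisticRegression(max_iter=200).fit(X, y)

class ProbabilityWrapper(mlflow.pyfunc.PythonModel):
    """Wraps a classifier so predict() returns class probabilities."""

    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        return self.model.predict_proba(model_input)

# Log the wrapper instead of the raw classifier so a no-code deployment
# returns probabilities without a scoring script.
mlflow.pyfunc.log_model("model", python_model=ProbabilityWrapper(classifier))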

Customize inference with a scoring script

Although MLflow models don't require a scoring script, you can still provide one to customize inference execution for MLflow models if needed. For more information on how to customize inference, see Customize MLflow model deployments for online endpoints or Customize model deployment with scoring script for batch endpoints.

Important

If you choose to specify a scoring script for an MLflow model deployment, you also need to provide an environment for the deployment.
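For reference, a minimal scoring script for an MLflow model might look like the following sketch. The payload contract (a split-orientation DataFrame under input_data) and the "model" subfolder name are assumptions about how the model was registered.

import json
import os

import mlflow
import pandas as pd

def init():
    global model
    # AZUREML_MODEL_DIR points at the registered model's root folder;
    # the "model" subfolder name is an assumption.
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    model = mlflow.pyfunc.load_model(model_path)

def run(raw_data):
    # Expect a DataFrame serialized in the split orientation under input_data.
    payload = json.loads(raw_data)["input_data"]
    data = pd.DataFrame(payload["data"], columns=payload["columns"])
    return model.predict(data).tolist()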

Deployment tools

Azure Machine Learning offers the following tools to deploy MLflow models to online and batch endpoints:

  • MLflow SDK
  • Azure Machine Learning CLI v2
  • Azure Machine Learning SDK for Python
  • Azure Machine Learning studio

Each tool has different capabilities, particularly for which type of compute it can target. The following table shows the support for different MLflow deployment scenarios.

Scenario | MLflow SDK | Azure Machine Learning CLI/SDK or studio
Deploy to managed online endpoints (1) | Supported. See Progressive rollout of MLflow models to online endpoints. | Supported. See Deploy MLflow models to online endpoints.
Deploy to managed online endpoints with a scoring script | Not supported (3) | Supported. See Customize MLflow model deployments.
Deploy to batch endpoints | Not supported (3) | Supported. See Use MLflow models in batch deployments.
Deploy to batch endpoints with a scoring script | Not supported (3) | Supported. See Customize model deployment with scoring script.
Deploy to web services like Azure Container Instances or Azure Kubernetes Service (AKS) | Legacy support (2) | Not supported (2)
Deploy to web services like Container Instances or AKS with a scoring script | Not supported (3) | Legacy support (2)

(1) Deployment to online endpoints that are in workspaces with private link enabled requires you to package models before deployment (preview).

(2) Switch to managed online endpoints if possible.

(3) Open-source MLflow doesn't have the concept of a scoring script and doesn't support batch execution.

Choose a deployment tool

Use the MLflow SDK if:

  • You're familiar with MLflow and want to continue using the same methods, and
  • You're using a platform like Azure Databricks that supports MLflow natively.

Use the Azure Machine Learning CLI v2 or SDK for Python if:

  • You're familiar with them, or
  • You want to automate deployment with pipelines, or
  • You want to keep deployment configuration in a Git repository.

Use the Azure Machine Learning studio UI if you want to quickly deploy and test models trained with MLflow.