Guidelines for deploying MLflow models

APPLIES TO: Azure CLI ml extension v2 (current)

In this article, learn how to deploy your MLflow model to Azure Machine Learning for both real-time and batch inference. You also learn about the different tools that you can use to manage the deployment.

Deploying MLflow models vs custom models

When deploying MLflow models to Azure Machine Learning, you don't have to provide a scoring script or an environment for deployment as they're automatically generated for you. We typically refer to this functionality as no-code deployment.

For no-code deployment, Azure Machine Learning:

  • Ensures all the package dependencies indicated in the MLflow model are satisfied.
  • Provides an MLflow base image or curated environment that contains the following items:
    • Packages required for Azure Machine Learning to perform inference, including mlflow-skinny.
    • A scoring script to perform inference.

Tip

Workspaces without public network access: Before you can deploy MLflow models to online endpoints without egress connectivity, you have to package the models (preview). By using model packaging, you can avoid the need for an internet connection, which Azure Machine Learning would otherwise require to dynamically install necessary Python packages for the MLflow models.

Python packages and dependencies

Azure Machine Learning automatically generates environments to run inference on MLflow models. To build those environments, Azure Machine Learning reads the conda dependencies specified in the MLflow model and adds any packages required to run the inferencing server. These extra packages vary depending on your deployment type.

conda.yaml

channels:
- conda-forge
dependencies:
- python=3.7.11
- pip
- pip:
  - mlflow
  - scikit-learn==0.24.1
  - cloudpickle==2.0.0
  - psutil==5.8.0
name: mlflow-env

Warning

MLflow performs automatic package detection when it logs models and pins their versions in the model's conda dependencies. However, this automatic detection is a best-effort operation, and in some cases it might not reflect your intentions or requirements. In those cases, consider logging models with a custom conda dependencies definition.
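For example, the following sketch shows one way to log a scikit-learn model with an explicit list of pip requirements instead of relying on automatic detection. The model, dataset, and pinned versions are placeholders for illustration; match them to your own training environment.

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        # Pin exactly the packages and versions that the inference environment
        # should install; these versions are placeholders.
        pip_requirements=[
            "mlflow",
            "scikit-learn==1.3.2",
            "cloudpickle==2.2.1",
        ],
    )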

Implications of models with signatures

MLflow models can include a signature that indicates the expected inputs and their types. For models that contain a signature, Azure Machine Learning enforces compliance with it, both in terms of the number of inputs and their types. This means that your data input should comply with the types indicated in the model signature. If the data can't be parsed as expected, the invocation fails. This behavior applies to both online and batch endpoints.

MLmodel

artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.7.11
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
run_id: f1e06708-641d-4a49-8f36-e9dcd8d34346
signature:
  inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type": "double"},
    {"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"}, {"name":
    "s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name": "s3", "type":
    "double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type": "double"},
    {"name": "s6", "type": "double"}]'
  outputs: '[{"type": "double"}]'
utc_time_created: '2022-03-17 01:56:03.706848'

You can inspect your model's signature by opening the MLmodel file associated with your MLflow model. For more information on how signatures work in MLflow, see Signatures in MLflow.

Tip

Signatures in MLflow models are optional, but they're highly encouraged because they provide a convenient way to detect data compatibility issues early. For more information about how to log models with signatures, see Logging models with a custom signature, environment or samples.
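As an illustration, the following sketch logs a scikit-learn regressor together with a signature inferred from its training data by using infer_signature. The dataset and estimator are placeholders; the resulting signature is what Azure Machine Learning enforces at scoring time.

import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50).fit(X, y)

# The inferred signature captures the input column names and types, plus the
# output type, which Azure Machine Learning checks at inference time.
signature = infer_signature(X, model.predict(X))

with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)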

Differences between models deployed in Azure Machine Learning and MLflow built-in server

MLflow includes built-in deployment tools that model developers can use to test models locally. For instance, you can run a local instance of a model that's registered in the MLflow server registry by using mlflow models serve -m my_model, or you can use the MLflow CLI command mlflow models predict. Azure Machine Learning online and batch endpoints run different inferencing technologies, which might have different features. Read this section to understand their differences.

Batch vs online endpoints

Azure Machine Learning supports deploying models to both online and batch endpoints. Online endpoints are comparable to the MLflow built-in server and provide a scalable, synchronous, and lightweight way to run models for inference. Batch endpoints, on the other hand, provide a way to run asynchronous inference over long-running inferencing processes that can scale to large amounts of data. The MLflow server currently lacks this capability, although you can achieve something similar by using Spark jobs.

The rest of this section mostly applies to online endpoints. To learn more about using batch endpoints with MLflow models, see Use MLflow models in batch deployments.

Input formats

Input type | MLflow built-in server | Azure Machine Learning Online Endpoints
--- | --- | ---
JSON-serialized pandas DataFrames in the split orientation | ✓ | ✓
JSON-serialized pandas DataFrames in the records orientation | Deprecated |
CSV-serialized pandas DataFrames | ✓ | Use batch1
Tensor input format as JSON-serialized lists (tensors) and dictionary of lists (named tensors) | ✓ | ✓
Tensor input formatted as in TF Serving's API | ✓ |

Input structure

Regardless of the input type used, Azure Machine Learning requires you to provide inputs in a JSON payload, within the dictionary key input_data. This section shows different payload examples and the differences between the MLflow built-in server and the Azure Machine Learning inferencing server.

Warning

The input_data key isn't required when you serve models by using the command mlflow models serve. As a consequence, payloads can't be used interchangeably between the MLflow built-in server and Azure Machine Learning online endpoints.

Important

MLflow 2.0 advisory: Notice that the payload's structure has changed in MLflow 2.0.
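For comparison, the following sketch shows how a request to a local instance of the MLflow built-in server (MLflow 2.x) might look. It assumes the server was started with mlflow models serve on the default port and uses the dataframe_split key instead of input_data.

import requests

# Assumes a local server started with: mlflow models serve -m <model-path> -p 5000
url = "http://127.0.0.1:5000/invocations"

# MLflow 2.x expects keys such as "dataframe_split" rather than "input_data".
payload = {
    "dataframe_split": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal",
        ],
        "data": [[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]],
    }
}

response = requests.post(url, json=payload)
print(response.json())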

Payload example for a JSON-serialized pandas DataFrame in the split orientation

{
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [
            [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
        ]
    }
}

Payload example for a tensor input

{
    "input_data": [
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}

Payload example for a named-tensor input

{
    "input_data": {
        "tokens": [
          [0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
        ],
        "mask": [
          [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
        ]
    }
}
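
Once a model is deployed to an Azure Machine Learning online endpoint, a scoring request that wraps the payload in input_data might look like the following sketch. The scoring URI and key are placeholders; retrieve the real values from your endpoint's details.

import json
import requests

# Placeholders: get the scoring URI and key from your endpoint in Azure Machine Learning.
scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

payload = {
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal",
        ],
        "index": [1],
        "data": [[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]],
    }
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
print(response.json())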

For more information about MLflow built-in deployment tools, see the MLflow documentation.

How to customize inference when deploying MLflow models

You might be used to authoring scoring scripts to customize how inference is executed for your custom models. However, when you deploy MLflow models to Azure Machine Learning, the decision about how inference should be executed is made by the model builder (the person who built the model), rather than by the DevOps engineer (the person who is trying to deploy it). Each model framework might automatically apply specific inference routines.

If at any point you need to change how inference of an MLflow model is executed, you can do one of two things: change how the model is logged in the training routine, or customize inference with a scoring script at deployment time.

Change how your model is logged during training

When you log a model by using either mlflow.autolog or mlflow.<flavor>.log_model, the flavor used for the model decides how inference should be executed and what the model returns. MLflow doesn't enforce any specific behavior for how the predict() function generates results. However, in some scenarios, you might want to do some preprocessing or post-processing before and after your model is executed. In other scenarios, you might want to change what's returned (for example, probabilities instead of classes).

One solution to these scenarios is to implement machine learning pipelines that move directly from inputs to outputs. For instance, sklearn.pipeline.Pipeline and pyspark.ml.Pipeline are popular (and sometimes recommended for performance reasons) ways to do so, as shown in the sketch that follows. Another alternative is to customize how your model does inference by using a custom model flavor.
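As a sketch of the pipeline approach, the following example logs a scikit-learn Pipeline that bundles preprocessing with the estimator, so inference moves from raw inputs to predictions in a single predict() call. The preprocessing steps and estimator are placeholders.

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline is logged as one MLflow model, so callers send raw inputs and
# the pipeline handles preprocessing before prediction.
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression(max_iter=1000)),
    ]
).fit(X, y)

with mlflow.start_run():
    mlflow.sklearn.log_model(pipeline, artifact_path="model")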

Customize inference with a scoring script

Although MLflow models don't require a scoring script, you can still provide one to customize how inference is executed, if needed. To learn how to do so, see Customizing MLflow model deployments (Online Endpoints) and Customizing MLflow model deployments (Batch Endpoints).

Important

When you opt to specify a scoring script for an MLflow model deployment, you also need to provide an environment for the deployment.
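
The following sketch outlines what a custom scoring script (score.py) for an MLflow model deployed to an online endpoint might look like. The model subfolder name and the payload handling are assumptions; adjust them to match how your model was registered and how clients call the endpoint.

import json
import os

import mlflow
import pandas as pd


def init():
    # AZUREML_MODEL_DIR points to the folder where the registered model is mounted.
    # The "model" subfolder name is an assumption; adjust it to match your registration.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    model = mlflow.pyfunc.load_model(model_path)


def run(raw_data):
    # Assumes the client sends the input_data payload shown earlier in this article.
    data = json.loads(raw_data)["input_data"]
    input_df = pd.DataFrame(data["data"], columns=data["columns"])
    predictions = model.predict(input_df)
    # Post-processing example: convert to plain Python types so the response is JSON serializable.
    return predictions.tolist()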

Deployment tools

Azure Machine Learning offers many ways to deploy MLflow models to online and batch endpoints. You can deploy models using the following tools:

  • MLflow SDK
  • Azure Machine Learning CLI and Azure Machine Learning SDK for Python
  • Azure Machine Learning studio

Each workflow has different capabilities, particularly around which type of compute it can target. The following table shows the differences.

Scenario | MLflow SDK | Azure Machine Learning CLI/SDK | Azure Machine Learning studio
--- | --- | --- | ---
Deploy to managed online endpoints | See example1 | See example1 | See example1
Deploy to managed online endpoints (with a scoring script) | Not supported3 | See example | See example
Deploy to batch endpoints | Not supported3 | See example | See example
Deploy to batch endpoints (with a scoring script) | Not supported3 | See example | See example
Deploy to web services (ACI/AKS) | Legacy support2 | Not supported2 | Not supported2
Deploy to web services (ACI/AKS - with a scoring script) | Not supported3 | Legacy support2 | Legacy support2

Which deployment tool to use?

If you're familiar with MLflow or your platform supports MLflow natively (like Azure Databricks), and you wish to continue using the same set of methods, use the MLflow SDK.
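
For example, the following rough sketch shows how a deployment with the MLflow SDK might look, assuming the azureml-mlflow plugin is installed and the MLflow tracking URI already points to your workspace. The endpoint name, model URI, and configuration file are placeholders, and the accepted configuration keys can vary by plugin version.

import mlflow
from mlflow.deployments import get_deploy_client

# Assumes the MLflow tracking URI is set to your Azure Machine Learning workspace.
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

endpoint_name = "my-endpoint"        # placeholder
model_uri = "models:/my-model/1"     # placeholder: registered model name and version

deployment_client.create_endpoint(endpoint_name)
deployment_client.create_deployment(
    name="default",
    endpoint=endpoint_name,
    model_uri=model_uri,
    # A deployment configuration file is a common way to pass settings such as
    # instance type and count; the exact keys depend on the plugin version.
    config={"deploy-config-file": "deployment-config.json"},
)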

However, if you're more familiar with the Azure Machine Learning CLI v2, you want to automate deployments by using automation pipelines, or you want to keep deployment configuration in a git repository, we recommend that you use the Azure Machine Learning CLI v2.

If you want to quickly deploy and test models trained with MLflow, you can use the Azure Machine Learning studio UI deployment.

Next steps

To learn more, review these articles: