Guidelines for deploying MLflow models

APPLIES TO: Azure CLI ml extension v2 (current)

In this article, learn how to deploy your MLflow model to Azure Machine Learning for both real-time and batch inference. Also learn about the different tools you can use to manage the deployment.

Deploying MLflow models vs custom models

When deploying MLflow models to Azure Machine Learning, you don't have to provide a scoring script or an environment for deployment as they are automatically generated for you. We typically refer to this functionality as no-code deployment.

For no-code deployment, Azure Machine Learning:

  • Ensures all the package dependencies indicated in the MLflow model are satisfied.
  • Provides an MLflow base image or curated environment that contains the following items:
    • Packages required for Azure Machine Learning to perform inference, including mlflow-skinny.
    • A scoring script to perform inference.

Warning

Online endpoints dynamically install the Python packages listed in the MLflow model package during container runtime. For that reason, deploying MLflow models to online endpoints with no-code deployment in a private network without egress connectivity isn't supported at the moment. If that's your case, either enable egress connectivity or indicate the environment to use in the deployment as explained in Customizing MLflow model deployments (Online Endpoints). This limitation isn't present in Batch Endpoints.

Python packages and dependencies

Azure Machine Learning automatically generates environments to run inference of MLflow models. Those environments are built by reading the conda dependencies specified in the MLflow model. Azure Machine Learning also adds any packages required to run the inferencing server, which vary depending on the type of deployment.

conda.yaml

channels:
- conda-forge
dependencies:
- python=3.7.11
- pip
- pip:
  - mlflow
  - scikit-learn==0.24.1
  - cloudpickle==2.0.0
  - psutil==5.8.0
name: mlflow-env

Warning

MLflow performs automatic package detection when logging models, and pins the detected versions in the conda dependencies of the model. However, this detection is done to the best of MLflow's knowledge, and there may be cases where the result doesn't reflect your intentions or requirements. In those cases, consider logging models with a custom conda dependencies definition.
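
As a minimal sketch of that advice, the following logs a scikit-learn model with an explicit conda environment instead of the auto-detected one. It assumes the mlflow and scikit-learn packages are installed, reuses the pins from the conda.yaml shown earlier, and the helper name log_with_custom_env is hypothetical.

```python
# Explicit conda environment, mirroring the conda.yaml shown earlier.
CUSTOM_CONDA_ENV = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.7.11",
        "pip",
        {
            "pip": [
                "mlflow",
                "scikit-learn==0.24.1",
                "cloudpickle==2.0.0",
                "psutil==5.8.0",
            ]
        },
    ],
}


def log_with_custom_env(sk_model, artifact_path="model"):
    """Log a fitted scikit-learn model with the environment defined above.

    Hypothetical helper. MLflow is imported lazily so the environment
    definition can be inspected without MLflow installed.
    """
    import mlflow.sklearn

    # conda_env overrides the dependencies MLflow would otherwise infer.
    return mlflow.sklearn.log_model(
        sk_model, artifact_path, conda_env=CUSTOM_CONDA_ENV
    )
```

Call the helper from your training code, for example inside an active MLflow run, and adjust the pinned versions to the ones your model actually needs.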

Implications of models with signatures

MLflow models can include a signature that indicates the expected inputs and their types. For models that contain a signature, Azure Machine Learning enforces compliance with it, both in terms of the number of inputs and their types. This means that your input data must comply with the types indicated in the model signature. If the data can't be parsed as expected, the invocation fails. This applies to both online and batch endpoints.

MLmodel

artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.7.11
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
run_id: f1e06708-641d-4a49-8f36-e9dcd8d34346
signature:
  inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type": "double"},
    {"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"}, {"name":
    "s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name": "s3", "type":
    "double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type": "double"},
    {"name": "s6", "type": "double"}]'
  outputs: '[{"type": "double"}]'
utc_time_created: '2022-03-17 01:56:03.706848'

You can inspect the model signature of your model by opening the MLmodel file associated with your MLflow model. For more details about how signatures work in MLflow, see Signatures in MLflow.
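
To see what enforcing a signature implies for your data, the following sketch rebuilds the inputs entry of the signature shown above and checks one record against it. It only illustrates the idea with the Python standard library. It isn't the validation code that Azure Machine Learning or MLflow runs, and the function name check_record and the simplified type mapping are assumptions made for this example.

```python
import json

FEATURES = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]

# Same content as the "inputs" entry of the signature in the MLmodel file above.
SIGNATURE_INPUTS = json.dumps([{"name": n, "type": "double"} for n in FEATURES])

# Simplified mapping from signature type names to Python types.
# Assumption: integers are accepted where a double is expected.
PYTHON_TYPES = {"double": (int, float), "long": (int,), "string": (str,)}


def check_record(record, signature_inputs=SIGNATURE_INPUTS):
    """Return a list of problems found in `record`; an empty list means it complies."""
    expected = json.loads(signature_inputs)
    expected_names = [column["name"] for column in expected]
    problems = []
    for column in expected:
        name, type_name = column["name"], column["type"]
        if name not in record:
            problems.append(f"missing input '{name}'")
        elif not isinstance(record[name], PYTHON_TYPES[type_name]):
            problems.append(f"input '{name}' is not of type {type_name}")
    for name in record:
        if name not in expected_names:
            problems.append(f"unexpected input '{name}'")
    return problems
```

Running a check like this on sample data before calling an endpoint can surface missing columns or wrongly typed values without waiting for a failed invocation.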

Tip

Signatures in MLflow models are optional, but they are highly encouraged because they provide a convenient way to detect data compatibility issues early. For more information about how to log models with signatures, read Logging models with a custom signature, environment or samples.

Deployment tools

Azure Machine Learning offers many ways to deploy MLflow models into Online and Batch endpoints. You can deploy models using the following tools:

  • MLflow SDK
  • Azure ML CLI and Azure ML SDK for Python
  • Azure Machine Learning studio

Each workflow has different capabilities, particularly around which type of compute they can target. The following table shows them.

Scenario MLflow SDK Azure ML CLI/SDK Azure ML studio
Deploy to managed online endpoints See example1 See example1 See example1
Deploy to managed online endpoints (with a scoring script) See example
Deploy to batch endpoints See example See example
Deploy to batch endpoints (with a scoring script) See example
Deploy to web services (ACI/AKS) Legacy support2 2 2
Deploy to web services (ACI/AKS - with a scoring script) 2 2 Legacy support2

Note

  • 1 Deployment to online endpoints in private link-enabled workspaces isn't supported because public network access is required for package installation. We suggest deploying with a scoring script in those scenarios.
  • 2 We recommend switching to our managed online endpoints instead.

Which option to use?

If you are familiar with MLflow or your platform supports MLflow natively (like Azure Databricks) and you wish to continue using the same set of methods, use the MLflow SDK. On the other hand, if you are more familiar with the Azure ML CLI v2, you want to automate deployments using automation pipelines, or you want to keep deployment configuration in a git repository, we recommend that you use the Azure ML CLI v2. If you want to quickly deploy and test models trained with MLflow, you can use Azure Machine Learning studio UI deployment.

Differences between models deployed in Azure Machine Learning and MLflow built-in server

MLflow includes built-in deployment tools that model developers can use to test models locally. For instance, you can run a local instance of a model registered in the MLflow server registry with mlflow models serve -m my_model, or you can use the MLflow CLI command mlflow models predict. Azure Machine Learning online and batch endpoints run different inferencing technologies, which may have different features. Read this section to understand their differences.

Batch vs Online endpoints

Azure Machine Learning supports deploying models to both online and batch endpoints. Online endpoints are comparable to the MLflow built-in server: they provide a scalable, synchronous, and lightweight way to run models for inference. Batch endpoints, on the other hand, provide a way to run asynchronous inference over long-running inferencing processes that can scale to large amounts of data. This capability isn't present at the moment in the MLflow server, although a similar capability can be achieved using Spark jobs.

The rest of this section mostly applies to online endpoints, but you can learn more about batch endpoints and MLflow models at Use MLflow models in batch deployments.

Input formats

Input type MLflow built-in server Azure ML Online Endpoints
JSON-serialized pandas DataFrames in the split orientation
JSON-serialized pandas DataFrames in the records orientation Deprecated
CSV-serialized pandas DataFrames Use batch1
Tensor input format as JSON-serialized lists (tensors) and dictionary of lists (named tensors)
Tensor input formatted as in TF Serving’s API

Note

Input structure

Regardless of the input type used, Azure Machine Learning requires inputs to be provided in a JSON payload, within a dictionary key input_data. The following section shows different payload examples and the differences between MLflow built-in server and Azure Machine Learning inferencing server.

Warning

This key isn't required when serving models with the command mlflow models serve, so payloads can't be used interchangeably between the two servers.

Important

MLflow 2.0 advisory: Notice that the payload's structure has changed in MLflow 2.0.
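
As a rough illustration of why payloads aren't interchangeable, the two hypothetical helpers below wrap the same split-oriented data for each server. The input_data key comes from this article. The dataframe_split key is the one the MLflow 2.x built-in server expects for split-oriented DataFrames. Treat the helper names and the sample values as assumptions for this sketch.

```python
import json


def _split_frame(columns, rows):
    """Describe tabular data in the pandas 'split' orientation."""
    return {"columns": columns, "index": list(range(len(rows))), "data": rows}


def to_azureml_payload(columns, rows):
    """Wrap the data under the `input_data` key used by Azure Machine Learning."""
    return json.dumps({"input_data": _split_frame(columns, rows)})


def to_mlflow2_payload(columns, rows):
    """Wrap the same data under `dataframe_split` for `mlflow models serve` (2.x)."""
    return json.dumps({"dataframe_split": _split_frame(columns, rows)})


columns = ["age", "sex", "trestbps"]
rows = [[63, 1, 145]]
print(to_azureml_payload(columns, rows))
print(to_mlflow2_payload(columns, rows))
```

Only the top-level key differs between the two bodies, which is enough for each server to reject a payload built for the other.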

Payload example for a JSON-serialized pandas DataFrame in the split orientation

{
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [
            [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
        ]
    }
}

Payload example for a tensor input

{
    "input_data": [
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}

Payload example for a named-tensor input

{
    "input_data": {
        "tokens": [
          [0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
        ],
        "mask": [
          [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
        ]
    }
}
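
To send one of these payloads to an online endpoint, a request can be assembled with the Python standard library as sketched below. The scoring URI and key are placeholders rather than real values, and build_scoring_request is a hypothetical helper. The sketch assumes key-based authentication, where the key is passed as a bearer token.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, input_data):
    """Build a POST request whose JSON body nests the inputs under `input_data`."""
    body = json.dumps({"input_data": input_data}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Named-tensor input, as in the last payload example above.
request = build_scoring_request(
    "https://my-endpoint.example.com/score",  # placeholder scoring URI
    "<endpoint-key>",  # placeholder key
    {
        "tokens": [[0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]],
        "mask": [[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]],
    },
)

# Sending the request needs a deployed endpoint, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```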

For more information about MLflow built-in deployment tools, see the MLflow documentation.

How to customize inference when deploying MLflow models

You may be used to authoring scoring scripts to customize how inference is executed for your models. However, when deploying MLflow models to Azure Machine Learning, the decision about how inference should be executed is made by the model builder (the person who built the model) rather than by the DevOps engineer (the person who is trying to deploy it). Features like autolog in MLflow automatically log models for you to the best of the framework's knowledge. Those decisions may not be the ones you want in some scenarios.

For those cases, you can either change how your model is being logged in the training routine or customize inference with a scoring script.

Change how your model is logged during training

When you log a model using either mlflow.autolog or mlflow.<flavor>.log_model, the flavor used for the model decides how inference should be executed and what gets returned by the model. MLflow doesn't enforce any specific behavior in how the predict() function generates results. However, there are scenarios where you probably want to do some pre-processing or post-processing before and after your model is executed. In other scenarios, you may want to change what's returned, for example probabilities instead of classes.

One solution is to implement machine learning pipelines that move from inputs to outputs directly. For instance, sklearn.pipeline.Pipeline or pyspark.ml.Pipeline are popular (and sometimes recommended for performance reasons) ways to do so. Another alternative is to customize how your model does inference using a custom model flavor.
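
One way to approximate the second alternative is a custom pyfunc model, sketched below, that wraps a classifier so inference returns class probabilities instead of class labels. The class name ProbabilityModel is hypothetical, the wrapped classifier is assumed to expose a scikit-learn style predict_proba method, and the import fallback exists only so the sketch can run without MLflow installed.

```python
try:
    from mlflow.pyfunc import PythonModel
except ImportError:
    # Fallback so the sketch can be read and run without MLflow installed.
    PythonModel = object


class ProbabilityModel(PythonModel):
    """Wrap a classifier so predict() returns probabilities, not labels."""

    def __init__(self, classifier):
        self._classifier = classifier

    def predict(self, context, model_input, params=None):
        # The post-processing choice is fixed at logging time:
        # expose predict_proba rather than predict.
        return self._classifier.predict_proba(model_input)
```

An instance of the wrapper can then be logged by passing it as the python_model argument of mlflow.pyfunc.log_model, so the deployed model returns probabilities without needing a scoring script.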

Customize inference with a scoring script

Although MLflow models don't require a scoring script, you can still provide one if needed. You can use it to customize how inference is executed for MLflow models. To learn how to do it, refer to Customizing MLflow model deployments (Online Endpoints) and Customizing MLflow model deployments (Batch Endpoints).

Important

When you opt to provide a scoring script for an MLflow model deployment, you also need to provide an environment for it.

Next steps

To learn more, review these articles: