Guidelines for deploying MLflow models
APPLIES TO: Azure CLI ml extension v2 (current)
In this article, learn about deployment of MLflow models to Azure Machine Learning for both real-time and batch inference, and about different tools you can use to manage the deployments.
No-code deployment
When you deploy MLflow models to Azure Machine Learning, unlike with custom model deployment, you don't have to provide a scoring script or an environment. Azure Machine Learning automatically generates the scoring script and environment for you. This functionality is called no-code deployment.
For no-code deployment, Azure Machine Learning:
- Ensures that all the package dependencies indicated in the MLflow model are satisfied.
- Provides an MLflow base image or curated environment that contains the following items:
  - Packages required for Azure Machine Learning to perform inference, including `mlflow-skinny`.
  - A scoring script to perform inference.
Tip
Workspaces without public network access: Before you can deploy MLflow models to online endpoints without egress connectivity, you have to package the models (preview). By using model packaging, you can avoid the need for an internet connection, which Azure Machine Learning would otherwise require to dynamically install necessary Python packages for the MLflow models.
Packages and dependencies
Azure Machine Learning automatically generates environments to run inference on MLflow models. To build the environments, Azure Machine Learning reads the conda dependencies that are specified in the MLflow model and adds any packages that are required to run the inferencing server. These extra packages vary depending on deployment type.
The following example `conda.yaml` file shows conda dependencies specified in an MLflow model:

```yaml
channels:
- conda-forge
dependencies:
- python=3.10.11
- pip<=23.1.2
- pip:
  - mlflow==2.7.1
  - cloudpickle==1.6.0
  - dataclasses==0.6
  - lz4==4.0.0
  - numpy==1.23.5
  - packaging==23.0
  - psutil==5.9.0
  - pyyaml==6.0
  - scikit-learn==1.1.2
  - scipy==1.10.1
  - uuid==1.30
name: mlflow-env
```
Important
MLflow automatically detects packages when it logs a model, and it pins the package versions in the model's conda dependencies. This automatic package detection might not reflect your intentions or requirements. You can alternatively log models with a custom signature, environment, or samples.
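For example, the following sketch shows how you might pass an explicit conda environment when logging, instead of relying on automatic detection. It assumes a trained scikit-learn model in a variable named `model`; the environment contents are illustrative.

```python
import mlflow

# A hypothetical custom environment that pins only the packages you know you need.
custom_env = {
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10.11",
        "pip<=23.1.2",
        {"pip": ["mlflow==2.7.1", "scikit-learn==1.1.2"]},
    ],
    "name": "custom-mlflow-env",
}

# Log the model with the explicit environment instead of the auto-detected one.
# "model" is assumed to be a trained scikit-learn estimator.
mlflow.sklearn.log_model(model, artifact_path="model", conda_env=custom_env)
```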
Models with signatures
MLflow models can include a signature that indicates the expected inputs and their types. When such models are deployed to online or batch endpoints, Azure Machine Learning ensures that the number and types of the data inputs comply with the signature. If the input data can't be parsed as expected, the model invocation fails.
You can inspect an MLflow model signature by opening the `MLmodel` file. For more information on how signatures work in MLflow, see Signatures in MLflow.
The following example `MLmodel` file highlights the signature:
```yaml
artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.10.11
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.1.2
mlflow_version: 2.7.1
model_uuid: 3f725f3264314c02808dd99d5e5b2781
run_id: 70f15bab-cf98-48f1-a2ea-9ad2108c28cd
signature:
  inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type": "double"},
    {"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"}, {"name":
    "s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name": "s3", "type":
    "double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type": "double"},
    {"name": "s6", "type": "double"}]'
  outputs: '[{"type": "double"}]'
```
Tip
Signatures in MLflow models are recommended because they provide a convenient way to detect data compatibility issues. For more information about how to log models with signatures, see Logging models with a custom signature, environment or samples.
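As an illustration, the following minimal sketch infers a signature from training data and logs it with the model. `X_train`, the `model` variable, and the registered model name `my-model` are assumptions, not values from this article.

```python
import mlflow
from mlflow.models import infer_signature

# Infer the signature from training inputs and example predictions.
# X_train and model are assumed to exist from your training code.
signature = infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)

# The signature can also be inspected programmatically after registration;
# "my-model" and version 1 are placeholders.
model_info = mlflow.models.get_model_info("models:/my-model/1")
print(model_info.signature)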
Deployment in the MLflow built-in server vs. the Azure Machine Learning inferencing server
Model developers can use MLflow built-in deployment tools to test models locally. For instance, you can run a local instance of a model that's registered in the MLflow server registry by using `mlflow models serve` or the MLflow CLI `mlflow models predict`. For more information about MLflow built-in deployment tools, see Built-in deployment tools in the MLflow documentation.
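For a quick local check without any server, you can also load the model through the generic pyfunc flavor and call it directly. A minimal sketch; the model name, version, and column names are placeholders:

```python
import mlflow
import pandas as pd

# Load the registered model locally through the generic pyfunc flavor.
# "my-model" and version 1 are placeholders for your registered model.
local_model = mlflow.pyfunc.load_model("models:/my-model/1")

# Score a small sample exactly as a serving layer would.
sample = pd.DataFrame(
    [[63.0, 1.0, 145.0, 233.0]], columns=["age", "sex", "trestbps", "chol"]
)
print(local_model.predict(sample))
```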
Azure Machine Learning also supports deploying models to both online and batch endpoints. These endpoints run different inferencing technologies that can have different features.
Azure Machine Learning online endpoints, similar to the MLflow built-in server, provide a scalable, synchronous, and lightweight way to run models for inference.
Azure Machine Learning batch endpoints can run asynchronous inference over long-running inferencing processes that can scale to large amounts of data. The MLflow server lacks this capability, although you can achieve a similar capability by using Spark jobs. To learn more about batch endpoints and MLflow models, see Use MLflow models in batch deployments.
Input formats
The following table shows the input types supported by the MLflow built-in server versus Azure Machine Learning online endpoints.
| Input type | MLflow built-in server | Azure Machine Learning online endpoint |
|---|---|---|
| JSON-serialized pandas DataFrames in the split orientation | ✓ | ✓ |
| JSON-serialized pandas DataFrames in the records orientation | Deprecated | |
| CSV-serialized pandas DataFrames | ✓ | Use batch inferencing. For more information, see Deploy MLflow models to batch endpoints. |
| TensorFlow input as JSON-serialized lists (tensors) and dictionary of lists (named tensors) | ✓ | ✓ |
| TensorFlow input using the TensorFlow Serving API | ✓ | |
The following sections focus on MLflow models that are deployed to Azure Machine Learning online endpoints.
Input structure
Regardless of input type, Azure Machine Learning requires you to provide inputs in a JSON payload in the dictionary key `input_data`. This key isn't required when you use the command `mlflow models serve` to serve models, so payloads can't be used interchangeably for Azure Machine Learning online endpoints and the MLflow built-in server.
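For example, the same two-column DataFrame would be wrapped differently for each server. A minimal sketch with hypothetical column names, assuming the MLflow 2.x scoring protocol for the built-in server:

```python
# The MLflow built-in server (MLflow 2.x scoring protocol) expects "dataframe_split":
mlflow_server_payload = {
    "dataframe_split": {"columns": ["age", "sex"], "data": [[63.0, 1.0]]}
}

# The Azure Machine Learning inferencing server expects "input_data":
azureml_payload = {
    "input_data": {"columns": ["age", "sex"], "data": [[63.0, 1.0]]}
}
```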
Important
The payload structure changed in MLflow 2.0.
The following payload examples show the structure that the Azure Machine Learning inferencing server expects.
JSON-serialized pandas DataFrame in the split orientation

```json
{
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [
            [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
        ]
    }
}
```
Tensor input

```json
{
    "input_data": [
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
        [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}
```
Named-tensor input

```json
{
    "input_data": {
        "tokens": [
            [0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
        ],
        "mask": [
            [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
        ]
    }
}
```
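To send such a payload, you can build it from a DataFrame and post it to the endpoint's scoring URI. A minimal sketch; the URI, key, and column names are placeholders:

```python
import json

import pandas as pd
import requests

# Build the split-orientation payload from a DataFrame and wrap it in "input_data".
df = pd.DataFrame(
    [[63.0, 1.0, 145.0, 233.0]], columns=["age", "sex", "trestbps", "chol"]
)
payload = {"input_data": df.to_dict(orient="split")}

# Placeholders: use your endpoint's scoring URI and key.
scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <endpoint-key>",
}

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
print(response.json())
```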
Inference customization for MLflow models
Scoring scripts customize how to execute inferencing for custom models. But for MLflow model deployment, the decision about how to execute inferencing is made by the model builder rather than by the deployment engineer. Each model framework can automatically apply specific inference routines.
If you need to change how inference is executed for an MLflow model, you can do one of the following things:
- Change how your model is being logged in the training routine.
- Customize inference with a scoring script at deployment time.
Change how your model is logged during training
When you log a model by using either `mlflow.autolog` or `mlflow.<flavor>.log_model`, the flavor used for the model determines how to execute inference and what results to return. MLflow doesn't enforce any specific behavior for how the `predict()` function generates results.
In some cases, you might want to do some preprocessing or postprocessing before and after your model executes. Or, you might want to change what is returned; for example, probabilities instead of classes. One solution is to implement machine learning pipelines that move from inputs to outputs directly.
For example, `sklearn.pipeline.Pipeline` or `pyspark.ml.Pipeline` are popular ways to implement pipelines, and are sometimes recommended for performance reasons. You can also customize how your model does inferencing by logging custom models.
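As an illustration, a minimal sketch of logging a scikit-learn pipeline so that preprocessing travels with the model; `X_train` and `y_train` are assumed to exist from your training code:

```python
import mlflow
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundle preprocessing and the estimator so both run at inference time.
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)
pipeline.fit(X_train, y_train)  # X_train and y_train come from your training code

# The whole pipeline is logged as a single MLflow model, so the scaler is
# applied automatically whenever the deployed model scores new data.
mlflow.sklearn.log_model(pipeline, artifact_path="model")
```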
Customize inference with a scoring script
Although MLflow models don't require a scoring script, you can still provide one to customize inference execution for MLflow models if needed. For more information on how to customize inference, see Customize MLflow model deployments for online endpoints or Customize model deployment with scoring script for batch endpoints.
Important
If you choose to specify a scoring script for an MLflow model deployment, you also need to provide an environment for the deployment.
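For orientation, here's a minimal sketch of what such a scoring script might look like for an online endpoint, following the standard `init()`/`run()` contract. The artifact path "model" and the use of the split-orientation payload are assumptions:

```python
import json
import os

import mlflow
import pandas as pd


def init():
    # AZUREML_MODEL_DIR points to the folder where Azure Machine Learning
    # places the model files; "model" is the artifact path used at logging time.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    model = mlflow.pyfunc.load_model(model_path)


def run(raw_data):
    # Parse the split-orientation payload and score it with the loaded model.
    data = json.loads(raw_data)["input_data"]
    frame = pd.DataFrame(data["data"], index=data["index"], columns=data["columns"])
    return model.predict(frame).tolist()
```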
Deployment tools
Azure Machine Learning offers the following tools to deploy MLflow models to online and batch endpoints:
- MLflow SDK
- Azure Machine Learning CLI v2
- Azure Machine Learning SDK for Python
- Azure Machine Learning studio
Each tool has different capabilities, particularly for which type of compute it can target. The following table shows the support for different MLflow deployment scenarios.
| Scenario | MLflow SDK | Azure Machine Learning CLI/SDK or studio |
|---|---|---|
| Deploy to managed online endpoints¹ | Supported. See Progressive rollout of MLflow models to online endpoints | Supported. See Deploy MLflow models to online endpoints |
| Deploy to managed online endpoints with a scoring script | Not supported³ | Supported. See Customize MLflow model deployments |
| Deploy to batch endpoints | Not supported³ | Supported. See Use MLflow models in batch deployments |
| Deploy to batch endpoints with a scoring script | Not supported³ | Supported. See Customize model deployment with scoring script |
| Deploy to web services like Azure Container Instances or Azure Kubernetes Service (AKS) | Legacy support² | Not supported² |
| Deploy to web services like Container Instances or AKS with a scoring script | Not supported³ | Legacy support² |
¹ Deployment to online endpoints that are in workspaces with private link enabled requires you to package models before deployment (preview).

² Switch to managed online endpoints if possible.

³ Open-source MLflow doesn't have the concept of a scoring script and doesn't support batch execution.
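For example, a minimal sketch of deploying a registered MLflow model to a managed online endpoint with the Azure Machine Learning SDK for Python (v2); the subscription, workspace, endpoint, and model names are placeholders:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

# Placeholders: fill in your own subscription, resource group, and workspace.
ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace>"
)

# Create the endpoint, then a deployment. Because the model is an MLflow model,
# no scoring script or environment is specified (no-code deployment).
endpoint = ManagedOnlineEndpoint(name="my-mlflow-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-mlflow-endpoint",
    model="azureml:my-mlflow-model:1",  # placeholder registered model
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```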
Choose a deployment tool
Use the MLflow SDK if:
- You're familiar with MLflow and want to continue using the same methods, and
- You're using a platform like Azure Databricks that supports MLflow natively.
Use the Azure Machine Learning CLI v2 or SDK for Python if:
- You're familiar with them, or
- You want to automate deployment with pipelines, or
- You want to keep deployment configuration in a Git repository.
Use the Azure Machine Learning studio UI if you want to quickly deploy and test models trained with MLflow.