Use MLflow models in batch deployments

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, learn how to deploy your MLflow model to Azure Machine Learning for batch inference using batch endpoints. Azure Machine Learning supports no-code deployment of models created and logged with MLflow, which means that you don't have to provide a scoring script or an environment.

For no-code deployment, Azure Machine Learning:

  • Provides an MLflow base image/curated environment that contains the required dependencies to run an Azure Machine Learning batch job.
  • Creates a batch job pipeline with a scoring script for you that can be used to process data using parallelization.


For more information about the supported file types in batch endpoints with MLflow, see Considerations when deploying to batch inference.

About this example

This example shows how you can deploy an MLflow model to a batch endpoint to perform batch predictions. It uses an MLflow model based on the UCI Heart Disease Data Set. The database contains 76 attributes, but we are using a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. The prediction is an integer value, either 0 (no presence) or 1 (presence).

The model has been trained using an XGBoost classifier, and all the required preprocessing has been packaged as a scikit-learn pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.

The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to cli/endpoints/batch if you are using the Azure CLI, or to sdk/endpoints/batch if you are using the SDK for Python.

git clone --depth 1
cd azureml-examples/cli/endpoints/batch

Follow along in Jupyter Notebooks

You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook: mlflow-for-batch-tabular.ipynb.


Before following the steps in this article, make sure you have the following prerequisites:


Follow these steps to deploy an MLflow model to a batch endpoint for running batch inference over new data:

  1. First, let's connect to the Azure Machine Learning workspace where we are going to work.

    az account set --subscription <subscription>
    az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
  2. Batch endpoints can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish the model to the registry in the workspace. You can skip this step if the model you are trying to deploy is already registered.

    az ml model create --name $MODEL_NAME --type "mlflow_model" --path "heart-classifier-mlflow/model"
  3. Before moving forward, we need to make sure the batch deployments we are about to create can run on some infrastructure (compute). Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace, which means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an Azure Machine Learning compute cluster called cpu-cluster. Let's verify that the compute exists in the workspace, or create it otherwise.

    Create a compute definition YAML like the following one:


    name: cpu-cluster
    type: amlcompute
    size: STANDARD_DS3_v2
    min_instances: 0
    max_instances: 2
    idle_time_before_scale_down: 120

    Create the compute using the following command:

    az ml compute create -f cpu-cluster.yml
  4. Now it is time to create the batch endpoint and deployment. Let's start with the endpoint first. Endpoints only require a name and a description to be created:

    To create a new endpoint, create a YAML configuration like the following:

    name: heart-classifier-batch
    description: A heart condition classifier for batch inference
    auth_mode: aad_token

    Then, create the endpoint with the following command:

    az ml batch-endpoint create -n $ENDPOINT_NAME -f endpoint.yml
  5. Now, let's create the deployment. MLflow models don't require you to indicate an environment or a scoring script when creating the deployment, because both are created for you. However, you can specify them if you want to customize how the deployment does inference.

    To create a new deployment under the created endpoint, create a YAML configuration like the following:

    endpoint_name: heart-classifier-batch
    name: classifier-xgboost-mlflow
    description: A heart condition classifier based on XGBoost
    model: azureml:heart-classifier@latest
    compute: azureml:cpu-cluster
    resources:
      instance_count: 2
    max_concurrency_per_instance: 2
    mini_batch_size: 2
    output_action: append_row
    output_file_name: predictions.csv
    retry_settings:
      max_retries: 3
      timeout: 300
    error_threshold: -1
    logging_level: info

    Then, create the deployment with the following command:

    az ml batch-deployment create -n $DEPLOYMENT_NAME -f deployment.yml


    Batch deployments only support deploying MLflow models with a pyfunc flavor. To use a different flavor, see Customizing MLflow models deployments with a scoring script.

  6. Although you can invoke a specific deployment inside an endpoint, you will usually want to invoke the endpoint itself and let it decide which deployment to use. That deployment is called the "default" deployment. This lets you change the default deployment, and hence the model serving the endpoint, without changing the contract with the user invoking the endpoint. Use the following instruction to update the default deployment:

    az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
  7. At this point, our batch endpoint is ready to be used.

Testing out the deployment

To test our endpoint, we are going to use a sample of unlabeled data located in this repository that can be used with the model. Batch endpoints can only process data that is located in the cloud and is accessible from the Azure Machine Learning workspace. In this example, we are going to upload it to an Azure Machine Learning data store. In particular, we are going to create a data asset that can be used to invoke the endpoint for scoring. However, notice that batch endpoints accept data placed in multiple types of locations.

  1. Let's create the data asset first. This data asset consists of a folder with multiple CSV files that we want to process in parallel using batch endpoints. You can skip this step if your data is already registered as a data asset or you want to use a different input type.

    a. Create a data asset definition in YAML:


    name: heart-dataset-unlabeled
    description: An unlabeled dataset for heart classification.
    type: uri_folder
    path: heart-classifier-mlflow/data

    b. Create the data asset:

    az ml data create -f heart-dataset-unlabeled.yml
  2. Now that the data is uploaded and ready to be used, let's invoke the endpoint:

    JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input azureml:heart-dataset-unlabeled@latest | jq -r '.name')


    The jq utility may not be installed on every system. See the jq documentation for installation instructions.


    Notice how we are not indicating the deployment name in the invoke operation. That's because the endpoint automatically routes the job to the default deployment. Since our endpoint only has one deployment, that one is the default. You can target a specific deployment by indicating the argument/parameter deployment_name.

  3. A batch job is started as soon as the command returns. You can monitor the status of the job until it finishes:

    az ml job show --name $JOB_NAME

Analyzing the outputs

Output predictions are generated in the predictions.csv file as indicated in the deployment configuration. The job generates a named output called score where this file is placed. Only one file is generated per batch job.

The file is structured as follows:

  • There is one row for each data point that was sent to the model. For tabular data, this means that one row is generated for each row in the input files, and hence the number of rows in the generated file (predictions.csv) equals the sum of all the rows in all the processed files. For other data types, there is one row for each processed file.
  • Two columns are indicated:
    • The file name where the data was read from. In tabular data, use this field to know which prediction belongs to which input data. For any given file, predictions are returned in the same order they appear in the input file, so you can rely on the row number to match the corresponding prediction.
    • The prediction associated with the input data. This value is returned "as-is", exactly as it was provided by the model's predict() function.
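
As a rough illustration of the row-matching rule above, the following sketch attaches a per-file row number to each prediction. It assumes that every line of predictions.csv is a Python-style list literal such as ['heart-unlabeled-0.csv', 0], which is what the literal_eval-based loader in this article suggests. The helper name index_predictions and the sample text are hypothetical.

```python
from ast import literal_eval
from collections import defaultdict


def index_predictions(text):
    """Return (file, row_number, prediction) tuples from the raw file text.

    Assumption: every non-empty line is a Python list literal of the form
    ['<file name>', <prediction>].
    """
    next_row = defaultdict(int)  # next row number to assign, per input file
    indexed = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        file_name, prediction = literal_eval(line)
        indexed.append((file_name, next_row[file_name], prediction))
        next_row[file_name] += 1
    return indexed


# Hypothetical sample content, not real job output.
sample = (
    "['heart-unlabeled-0.csv', 0]\n"
    "['heart-unlabeled-0.csv', 1]\n"
    "['heart-unlabeled-3.csv', 0]\n"
)
rows = index_predictions(sample)
```

Each tuple can then be joined back to the matching row of the input file by file name and row number.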

You can download the results of the job by using the job name. To download the predictions, use the following command:

az ml job download --name $JOB_NAME --output-name score --download-path ./

Once the file is downloaded, you can open it using your favorite tool. The following example loads the predictions into a pandas DataFrame.

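import pandas as pd
from ast import literal_eval

# Join the lines with commas and parse the result as Python literals.
with open('named-outputs/score/predictions.csv', 'r') as f:
    data = f.read()

score = pd.DataFrame(literal_eval(data.replace('\n', ',')), columns=['file', 'prediction'])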


The file predictions.csv may not be a regular CSV file, in which case it can't be read correctly using the pandas.read_csv() method.

The output looks as follows:

file | prediction
heart-unlabeled-0.csv | 0
heart-unlabeled-0.csv | 1
... | 1
heart-unlabeled-3.csv | 0


Notice that in this example the input data was tabular data in CSV format and there were 4 different input files (heart-unlabeled-0.csv, heart-unlabeled-1.csv, heart-unlabeled-2.csv and heart-unlabeled-3.csv).

Considerations when deploying to batch inference

Azure Machine Learning supports no-code deployment for batch inference in managed endpoints. This is a convenient way to deploy models that need to process large amounts of data in batches.

How work is distributed on workers

Work is distributed at the file level, for both structured and unstructured data. As a consequence, only file datasets or URI folders are supported for this feature. Each worker processes batches of Mini batch size files at a time (mini_batch_size in the deployment configuration). Further parallelism can be achieved by increasing Max concurrency per instance (max_concurrency_per_instance).
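
The arithmetic behind this distribution can be sketched as follows. This is only an illustration, not the service's scheduler: the function plan_mini_batches is hypothetical, and it assumes that the number of scoring processes that can run at once is instance_count multiplied by max_concurrency_per_instance.

```python
import math


def plan_mini_batches(files, mini_batch_size, instance_count, max_concurrency_per_instance):
    """Group files into mini-batches and estimate the available parallelism.

    Illustrative arithmetic only: the real scheduling is done by the service.
    """
    batches = [files[i:i + mini_batch_size] for i in range(0, len(files), mini_batch_size)]
    workers = instance_count * max_concurrency_per_instance
    # Rounds needed if every worker takes one mini-batch at a time.
    rounds = math.ceil(len(batches) / workers) if batches else 0
    return batches, workers, rounds


files = [f"heart-unlabeled-{i}.csv" for i in range(4)]
batches, workers, rounds = plan_mini_batches(
    files, mini_batch_size=2, instance_count=2, max_concurrency_per_instance=2
)
```

With the deployment settings used in this article and four input files, the sketch yields two mini-batches of two files each, which is fewer than the four workers assumed to be available.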


Nested folder structures are not explored during inference. If you are partitioning your data using folders, make sure to flatten the structure beforehand.
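
One possible way to flatten a partitioned folder before uploading it is sketched below. The helper flatten_folder and the "__" naming scheme are assumptions made for this example; adapt them to your own naming rules.

```python
import shutil
import tempfile
from pathlib import Path


def flatten_folder(source, target):
    """Copy every file under `source` into `target` with no subfolders.

    Each copy is named after its relative path, with path separators replaced
    by "__", so files with the same name in different folders do not collide.
    """
    source, target = Path(source), Path(target)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            flat_name = "__".join(path.relative_to(source).parts)
            shutil.copy2(path, target / flat_name)
            copied.append(flat_name)
    return copied


# Small self-contained demonstration using temporary folders.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "nested", Path(tmp) / "flat"
    (src / "2023" / "01").mkdir(parents=True)
    (src / "2023" / "01" / "a.csv").write_text("x\n1\n")
    (src / "b.csv").write_text("x\n2\n")
    flat_names = flatten_folder(src, dst)
    flat_is_flat = all(p.is_file() for p in dst.iterdir())
```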


Batch deployments call the predict function of the MLflow model once per file. For CSV files containing many rows, this may impose memory pressure on the underlying compute. When sizing your compute, take into account not only the memory consumption of the data being read but also the memory footprint of the model itself. This is especially true for models that process text, like transformer-based models, where memory consumption is not linear with the size of the input. If you encounter several out-of-memory exceptions, consider splitting the data into smaller files with fewer rows, or implement batching at the row level inside the model/scoring script.
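
Splitting a large CSV file into smaller ones could look like the following sketch, which uses only the Python standard library. The function split_csv, the output naming scheme, and the sample columns are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def split_csv(source, target_dir, rows_per_file):
    """Split one CSV into files of at most `rows_per_file` data rows.

    The header row is repeated in every output file.
    """
    source, target_dir = Path(source), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []

    def flush(header, rows):
        part = target_dir / f"{source.stem}-part{len(written)}.csv"
        with part.open("w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(part)

    with source.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                flush(header, chunk)
                chunk = []
        if chunk:
            flush(header, chunk)
    return written


# Demonstration with a hypothetical five-row file.
with tempfile.TemporaryDirectory() as tmp:
    big = Path(tmp) / "heart-unlabeled.csv"
    big.write_text("age,chol\n" + "".join(f"{40 + i},{200 + i}\n" for i in range(5)))
    parts = split_csv(big, Path(tmp) / "parts", rows_per_file=2)
    part_names = [p.name for p in parts]
    rows_per_part = [len(p.read_text().splitlines()) - 1 for p in parts]
```

Because work is distributed per file, smaller files also give the deployment more units to spread across workers.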

File type support

The following data types are supported for batch inference when deploying MLflow models without an environment and a scoring script:

File extension | Type returned as model's input | Signature requirement
.csv | pd.DataFrame | ColSpec. If not provided, column typing is not enforced.
.png, .jpg, .jpeg, .tiff, .bmp, .gif | np.ndarray | TensorSpec. Input is reshaped to match the tensor's shape if available. If no signature is available, tensors of type np.uint8 are inferred. For additional guidance read Considerations for MLflow models processing images.


Be advised that any unsupported file present in the input data will make the job fail. You will see an error entry as follows: "ERROR:azureml:Error processing input file: '/mnt/batch/tasks/.../a-given-file.parquet'. File type 'parquet' is not supported.".
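
To catch such files before submitting a job, you could run a local check like the sketch below. The extension set is copied from the table above; the helper find_unsupported_files is hypothetical, and comparing extensions case-insensitively is an assumption of this sketch rather than documented service behavior.

```python
import tempfile
from pathlib import Path

# Extensions listed above for no-code MLflow batch deployments.
SUPPORTED_EXTENSIONS = {".csv", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}


def find_unsupported_files(folder):
    """Return the names of files directly inside `folder` with an unsupported extension."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        # Assumption: extensions are compared case-insensitively.
        if p.is_file() and p.suffix.lower() not in SUPPORTED_EXTENSIONS
    )


# Demonstration with a temporary folder containing hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.parquet", "notes.txt"):
        (Path(tmp) / name).write_text("")
    unsupported = find_unsupported_files(tmp)
```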


If you want to process a different file type, or execute inference in a different way than batch endpoints do by default, you can always create the deployment with a scoring script, as explained in Using MLflow models with a scoring script.

Signature enforcement for MLflow models

Input data types are enforced by batch deployment jobs while reading the data, using the available MLflow model signature. This means that your input data should comply with the types indicated in the model signature. If the data can't be parsed as expected, the job will fail with an error message similar to the following one: "ERROR:azureml:Error processing input file: '/mnt/batch/tasks/.../a-given-file.csv'. Exception: invalid literal for int() with base 10: 'value'".
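
A local pre-check along these lines can surface such parsing problems before a job is submitted. The sketch below is not how the service enforces signatures. It assumes the column-based signature is available as a JSON list of name/type entries, similar to the inputs field of an MLmodel file, and it only covers a few numeric types. The function, the signature, and the data are hypothetical.

```python
import csv
import io
import json

# Converters for a few MLflow column types; any other type is treated as a string here.
CONVERTERS = {"integer": int, "long": int, "float": float, "double": float}


def check_csv_against_signature(csv_text, signature_inputs_json):
    """Return (row_number, column, value) entries that fail to parse."""
    columns = json.loads(signature_inputs_json)
    problems = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        for col in columns:
            convert = CONVERTERS.get(col["type"], str)
            value = row.get(col["name"])
            try:
                if value is None:
                    raise ValueError("missing column")
                convert(value)
            except ValueError:
                problems.append((row_number, col["name"], value))
    return problems


# Hypothetical signature and data, for illustration only.
signature_inputs = '[{"name": "age", "type": "long"}, {"name": "chol", "type": "double"}]'
data = "age,chol\n63,233.0\nvalue,250\n"
problems = check_csv_against_signature(data, signature_inputs)
```

Here the second data row is reported because the text 'value' can't be converted to an integer, which mirrors the error message quoted above.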


Signatures in MLflow models are optional, but they are highly encouraged because they provide a convenient way to detect data compatibility issues early. For more information about how to log models with signatures, read Logging models with a custom signature, environment or samples.

You can inspect the model signature of your model by opening the MLmodel file associated with your MLflow model. For more details about how signatures work in MLflow see Signatures in MLflow.

Flavor support

Batch deployments only support deploying MLflow models with a pyfunc flavor. If you need to deploy a different flavor, see Using MLflow models with a scoring script.

Customizing MLflow models deployments with a scoring script

MLflow models can be deployed to batch endpoints without indicating a scoring script in the deployment definition. However, you can opt in to indicate this file (usually referred to as the batch driver) to customize how inference is executed.

You will typically select this workflow when:

  • You need to process a file type not supported by batch deployments of MLflow models.
  • You need to customize the way the model is run, for instance, to use a specific flavor to load it with mlflow.<flavor>.load_model().
  • You need to do pre/post-processing in your scoring routine when it is not done by the model itself.
  • The output of the model can't be nicely represented in tabular data. For instance, it is a tensor representing an image.
  • Your model can't process each file at once because of memory constraints and it needs to read it in chunks.


If you choose to indicate a scoring script for an MLflow model deployment, you will also have to specify the environment where the deployment will run.


Customizing the scoring script for MLflow deployments is only available from the Azure CLI or SDK for Python. If you are creating a deployment using Azure Machine Learning studio UI, please switch to the CLI or the SDK.


Use the following steps to deploy an MLflow model with a custom scoring script.

  1. Identify the folder where your MLflow model is placed.

    a. Go to Azure Machine Learning portal.

    b. Go to the section Models.

    c. Select the model you are trying to deploy and click on the tab Artifacts.

    d. Take note of the folder that is displayed. This folder was indicated when the model was registered.

  2. Create a scoring script. Notice how the folder name model you identified before has been included in the init() function.

    import os

    import mlflow
    import pandas as pd


    def init():
        global model

        # AZUREML_MODEL_DIR is an environment variable created during deployment.
        # It is the path to the model folder.
        model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
        model = mlflow.pyfunc.load_model(model_path)


    def run(mini_batch):
        # mini_batch is a list of file paths; one row is returned per prediction.
        results = pd.DataFrame(columns=['file', 'predictions'])

        for file_path in mini_batch:
            data = pd.read_csv(file_path)
            pred = model.predict(data)

            # Keep track of the file each prediction came from.
            df = pd.DataFrame(pred, columns=['predictions'])
            df['file'] = os.path.basename(file_path)
            results = pd.concat([results, df])

        return results
  3. Let's create an environment where the scoring script can be executed. Since our model is an MLflow model, the conda requirements are also specified in the model package (for more details about MLflow models and the files they include, see The MLmodel format). We are going to build the environment using the conda dependencies from that file. However, we also need to include the package azureml-core, which is required for batch deployments.


    If your model is already registered in the model registry, you can download/copy the conda.yml file associated with your model by going to Azure Machine Learning studio > Models > Select your model from the list > Artifacts. Open the root folder in the navigation and select the conda.yml file listed. Click on Download or copy its content.


    This example uses a conda environment specified at /heart-classifier-mlflow/environment/conda.yaml. This file was created by combining the original MLflow conda dependencies file and adding the package azureml-core. You can't use the conda.yml file from the model directly.

    No extra step is required for the Azure Machine Learning CLI. The environment definition will be included in the deployment file.

  4. Let's create the deployment now:

    To create a new deployment under the created endpoint, create a YAML configuration like the following:

    endpoint_name: heart-classifier-batch
    name: classifier-xgboost-custom
    description: A heart condition classifier based on XGBoost
    model: azureml:heart-classifier@latest
    environment:
      image: <base-image>  # placeholder: replace with your base Docker image
      conda_file: ./heart-classifier-mlflow/environment/conda.yaml
    code_configuration:
      code: ./heart-classifier-custom/code/
      scoring_script: <scoring-script>.py  # placeholder: file name of the scoring script from step 2
    compute: azureml:cpu-cluster
    resources:
      instance_count: 2
    max_concurrency_per_instance: 2
    mini_batch_size: 2
    output_action: append_row
    output_file_name: predictions.csv
    retry_settings:
      max_retries: 3
      timeout: 300
    error_threshold: -1
    logging_level: info

    Then, create the deployment with the following command:

    az ml batch-deployment create -f deployment.yml
  5. At this point, our batch endpoint is ready to be used.
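
Step 3 above describes combining the model's conda dependencies with the azureml-core package. A minimal sketch of that merge, operating on a conda environment that has already been parsed into a Python dictionary, might look like this. The helper add_pip_package and the sample environment are hypothetical, and loading or saving the YAML file itself (for example with PyYAML) is left out.

```python
import copy


def add_pip_package(conda_env, package):
    """Return a copy of a conda environment dict with `package` in its pip section."""
    env = copy.deepcopy(conda_env)
    deps = env.setdefault("dependencies", [])
    pip_section = next((d for d in deps if isinstance(d, dict) and "pip" in d), None)
    if pip_section is None:
        # pip itself should be a conda dependency before pip packages are listed.
        if "pip" not in deps:
            deps.append("pip")
        pip_section = {"pip": []}
        deps.append(pip_section)
    if package not in pip_section["pip"]:
        pip_section["pip"].append(package)
    return env


# Hypothetical content of an MLflow-generated conda file, already parsed into a dict.
model_env = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.8", "pip", {"pip": ["mlflow", "xgboost"]}],
}
deployment_env = add_pip_package(model_env, "azureml-core")
```

The original dictionary is left untouched, so the model's own conda file stays as it was logged.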

Next steps