Deploy MLflow models to online endpoints

APPLIES TO: Azure CLI ml extension v2 (current)

In this article, learn how to deploy your MLflow model to an online endpoint for real-time inference. When you deploy your MLflow model to an online endpoint, you don't need to provide a scoring script or an environment. This capability is usually referred to as no-code deployment.

For no-code deployment, Azure Machine Learning:

  • Dynamically installs the Python packages listed in the conda.yaml file. This means the dependencies are installed during container runtime.
  • Provides an MLflow base image/curated environment on top of which those dependencies are installed.

Warning

Workspaces without public network access: Azure Machine Learning performs dynamic installation of packages when deploying MLflow models with no-code deployment. As a consequence, deploying MLflow models to online endpoints with no-code deployment in a private network without egress connectivity isn't supported at the moment. If that's your scenario, either enable egress connectivity or indicate the environment to use in the deployment, as explained in Customizing MLflow model deployments.

About this example

This example shows how you can deploy an MLflow model to an online endpoint to perform predictions. This example uses an MLflow model based on the Diabetes dataset. This dataset contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) obtained from n = 442 diabetes patients, as well as the response of interest: a quantitative measure of disease progression one year after baseline (regression).

The model has been trained using a scikit-learn regressor, and all the required preprocessing has been packaged as a pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.
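The training itself is out of scope for this article, but a model like this one could be produced along the following lines (a minimal sketch, not the exact training code from the repository):

import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Preprocessing and the regressor are packaged together in a single pipeline,
# so the logged model goes from raw data to predictions.
model = make_pipeline(StandardScaler(), Ridge())
model.fit(X, y)

# Logging with the sklearn flavor writes the MLmodel file, the conda.yaml, and
# the serialized pipeline into the "model" folder of the run's artifacts.
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model")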

The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to cli/endpoints/online if you're using the Azure CLI, or sdk/endpoints/online if you're using our SDK for Python.

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli/endpoints/online

Follow along in Jupyter Notebooks

You can follow along with this sample in the following notebook. In the cloned repository, open mlflow_sdk_online_endpoints_progresive.ipynb.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
  • Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.
  • You must have an MLflow model registered in your workspace. Specifically, this example registers a model trained on the Diabetes dataset.

Additionally, you'll need to install the Azure CLI and the ml extension to the Azure CLI.

Connect to your workspace

First, let's connect to the Azure Machine Learning workspace that we're going to work in.

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
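If you're following along with the Python SDK instead, connecting looks roughly like this (a sketch; it assumes the azure-ai-ml and azure-identity packages are installed and the placeholders are filled in with your values):

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder values; substitute your own subscription, resource group, and workspace.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)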

Registering the model

Online endpoints can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish it to the registry in the workspace. You can skip this step if the model you're trying to deploy is already registered.

MODEL_NAME='sklearn-diabetes'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "sklearn-diabetes/model"
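If you prefer the Python SDK v2, a roughly equivalent registration looks like the following sketch (it assumes the ml_client object from the previous step):

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Register the local MLflow model folder as a workspace model asset.
registered_model = ml_client.models.create_or_update(
    Model(
        name="sklearn-diabetes",
        path="sklearn-diabetes/model",  # local folder containing the MLmodel file
        type=AssetTypes.MLFLOW_MODEL,
    )
)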

Alternatively, if your model was logged inside of a run, you can register it directly.

Tip

To register the model, you need to know the location where it's stored. If you're using the autolog feature of MLflow, the path depends on the type and framework of the model being used. We recommend checking the job's outputs to identify the name of this folder: look for the folder that contains a file named MLmodel. If you're logging your models manually using log_model, then the path is the argument you pass to that method. For example, if you log the model using mlflow.sklearn.log_model(my_model, "classifier"), then the path where the model is stored is classifier.
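One way to locate that folder programmatically is to list the run's artifacts and look for the directory that contains an MLmodel file. A sketch using the MLflow client (it assumes your tracking URI already points at the workspace, and <run-id> is a placeholder):

from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = "<run-id>"  # placeholder for the job/run ID

# The directory whose listing contains an "MLmodel" file is the model path.
for artifact in client.list_artifacts(run_id):
    if artifact.is_dir:
        children = [f.path for f in client.list_artifacts(run_id, artifact.path)]
        if any(path.endswith("MLmodel") for path in children):
            print(f"Model folder: {artifact.path}")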

Use the Azure ML CLI v2 to create a model from a training job output. In the following example, a model named $MODEL_NAME is registered using the artifacts of a job with ID $RUN_ID. The path where the model is stored is $MODEL_PATH.

az ml model create --name $MODEL_NAME --path azureml://jobs/$RUN_ID/outputs/artifacts/$MODEL_PATH

Note

The path $MODEL_PATH is the location where the model has been stored in the run.
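If your workspace is configured as the MLflow tracking server, you can also register the model directly from the run with the MLflow API, as a sketch (the run ID and model path are placeholders matching $RUN_ID and $MODEL_PATH above):

import mlflow

run_id = "<run-id>"      # placeholder for $RUN_ID
model_path = "model"     # placeholder for $MODEL_PATH

# runs:/<run-id>/<model-path> resolves to the model artifacts inside the run.
mlflow.register_model(f"runs:/{run_id}/{model_path}", "sklearn-diabetes")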

Deploy an MLflow model to an online endpoint

  1. First, we need to configure the endpoint where the model will be deployed. The following example configures the name and authentication mode of the endpoint:

    create-endpoint.yaml

    $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
    name: my-endpoint
    auth_mode: key
    
  2. Let's create the endpoint:

    az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/mlflow/create-endpoint.yaml
    
  3. Now, it is time to configure the deployment. A deployment is a set of resources required for hosting the model that does the actual inferencing.

    sklearn-deployment.yaml

    $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
    name: sklearn-deployment
    endpoint_name: my-endpoint
    model:
      name: mir-sample-sklearn-mlflow-model
      version: 1
      path: sklearn-diabetes/model
      type: mlflow_model
    instance_type: Standard_DS3_v2
    instance_count: 1
    

    Note

    scoring_script and environment auto-generation are only supported for the pyfunc model flavor. To use a different flavor, see Customizing MLflow model deployments.

  4. Let's create the deployment:

    az ml online-deployment create --name sklearn-deployment --endpoint $ENDPOINT_NAME -f endpoints/online/mlflow/sklearn-deployment.yaml --all-traffic
    
  5. Assign all the traffic to the deployment. So far, the endpoint has one deployment, but none of its traffic is assigned to it.

    This step isn't required in the Azure CLI, since we used the --all-traffic flag during creation. If you need to change traffic, you can use the command az ml online-endpoint update --traffic, as explained in Progressively update traffic.

  6. Update the endpoint configuration:

    This step isn't required in the Azure CLI either, for the same reason. For reference, a rough Python SDK equivalent of these steps is sketched after this list.
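Here's the Python SDK v2 sketch mentioned above (it assumes the ml_client object from earlier and that the model is already registered as sklearn-diabetes):

from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# 1-2. Configure and create the endpoint (mirrors create-endpoint.yaml).
endpoint = ManagedOnlineEndpoint(name="my-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# 3-4. Configure and create the deployment (mirrors sklearn-deployment.yaml).
deployment = ManagedOnlineDeployment(
    name="sklearn-deployment",
    endpoint_name="my-endpoint",
    model="azureml:sklearn-diabetes@latest",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# 5. Route all traffic to the new deployment (the equivalent of --all-traffic).
endpoint.traffic = {"sklearn-deployment": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()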

Invoke the endpoint

Once your deployment completes, it's ready to serve requests. One of the easiest ways to test the deployment is by using the built-in invocation capability in the deployment client you're using.

sample-request-sklearn.json

{"input_data": {
    "columns": [
      "age",
      "sex",
      "bmi",
      "bp",
      "s1",
      "s2",
      "s3",
      "s4",
      "s5",
      "s6"
    ],
    "data": [
      [ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
      [ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
    ],
    "index": [0,1]
  }}

Note

Notice how the key input_data has been used in this example instead of inputs, as used in MLflow serving. This is because Azure Machine Learning requires a different input format to be able to automatically generate the Swagger contracts for the endpoints. See Differences between models deployed in Azure Machine Learning and MLflow built-in server for details about the expected input format.

To submit a request to the endpoint, run the following command:

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/mlflow/sample-request-sklearn.json

The response will be similar to the following text:

[ 
  11633.100167144921,
  8522.117402884991
]
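From Python, a comparable invocation with the SDK v2 might look like this sketch (it assumes the ml_client object from earlier and the endpoint name used above):

# Invoke the endpoint with the same sample request file.
response = ml_client.online_endpoints.invoke(
    endpoint_name="my-endpoint",
    request_file="sample-request-sklearn.json",
)
print(response)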

Important

For MLflow no-code-deployment, testing via local endpoints is currently not supported.

Customizing MLflow model deployments

MLflow models can be deployed to online endpoints without indicating a scoring script in the deployment definition. However, you can opt to provide one to customize how inference is executed.

You'll typically select this workflow when:

  • You need to customize the way the model is run, for instance, to use a specific flavor to load it with mlflow.<flavor>.load_model() (see the sketch after this list).
  • You need to do pre/post processing in your scoring routine when it's not done by the model itself.
  • The output of the model can't be nicely represented in tabular data. For instance, it's a tensor representing an image.
  • Your endpoint is under a private link-enabled workspace.
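To illustrate the first scenario, here's the difference between the generic pyfunc interface and a specific flavor, as a sketch (it uses the sample model's local path from this article):

import mlflow

# pyfunc exposes only a generic predict() interface...
generic_model = mlflow.pyfunc.load_model("sklearn-diabetes/model")

# ...while a specific flavor returns the native object, here the actual
# scikit-learn pipeline, so you can call APIs pyfunc doesn't surface,
# such as sklearn_model.named_steps or sklearn_model.score(X, y).
sklearn_model = mlflow.sklearn.load_model("sklearn-diabetes/model")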

Important

If you choose to indicate a scoring script for an MLflow model deployment, you'll also have to specify the environment where the deployment runs.

Warning

Customizing the scoring script for MLflow deployments is only available from the Azure CLI or the SDK for Python. If you're creating a deployment using Azure ML studio, switch to the CLI or the SDK.

Steps

Use the following steps to deploy an MLflow model with a custom scoring script.

  1. Identify the folder where your MLflow model is placed.

    a. Go to the Azure Machine Learning portal.

    b. Go to the Models section.

    c. Select the model you're trying to deploy and open its Artifacts tab.

    d. Take note of the folder that is displayed. This folder was indicated when the model was registered.

    Screenshot showing the folder where the model artifacts are placed.

  2. Create a scoring script. Notice how the folder name model that you identified earlier is included in the init() function. You can smoke-test this script locally before deploying; see the sketch after these steps.

    score.py

    import json
    import os
    from io import StringIO

    import mlflow
    from mlflow.pyfunc.scoring_server import infer_and_parse_json_input, predictions_to_json

    def init():
        global model
        global input_schema
        # The path 'model' corresponds to the folder where the MLflow artifacts were
        # stored when registering the model using the MLflow format.
        model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model')
        model = mlflow.pyfunc.load_model(model_path)
        input_schema = model.metadata.get_input_schema()

    def run(raw_data):
        json_data = json.loads(raw_data)
        if "input_data" not in json_data.keys():
            raise Exception("Request must contain a top level key named 'input_data'")

        # Re-serialize only the payload under 'input_data', which is what MLflow's
        # parser expects, and validate it against the model's input schema.
        serving_input = json.dumps(json_data["input_data"])
        data = infer_and_parse_json_input(serving_input, input_schema)
        predictions = model.predict(data)

        # Serialize the predictions back to JSON using MLflow's own helper.
        result = StringIO()
        predictions_to_json(predictions, result)
        return result.getvalue()

    Tip

    The previous scoring script is provided as an example of how to perform inference on an MLflow model. You can adapt it to your needs or change any of its parts to reflect your scenario.

    Warning

    MLflow 2.0 advisory: The provided scoring script works with both MLflow 1.X and MLflow 2.X. However, be advised that the expected input/output formats may vary between those versions. Check the environment definition used to ensure you're using the expected MLflow version. Notice that MLflow 2.0 is only supported in Python 3.8+.

  3. Let's create an environment where the scoring script can be executed. Since our model is an MLflow model, the conda requirements are also specified in the model package (for more details about MLflow models and the files included in them, see The MLmodel format). We'll build the environment using the conda dependencies from that file. However, we also need to include the package azureml-inference-server-http, which is required for online deployments in Azure Machine Learning.

    The conda definition file looks as follows:

    conda.yml

    channels:
    - conda-forge
    dependencies:
    - python=3.7.11
    - pip
    - pip:
      - mlflow
      - scikit-learn==0.24.1
      - cloudpickle==2.0.0
      - psutil==5.8.0
      - pandas==1.3.5
      - azureml-inference-server-http
    name: mlflow-env
    

    Note

    Note how the package azureml-inference-server-http has been added to the original conda dependencies file.

    We'll use this conda dependencies file to create the environment. In the Azure CLI, the environment is created inline in the deployment configuration.

  4. Let's create the deployment now:

    Create a deployment configuration file:

    deployment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
    name: sklearn-diabetes-custom
    endpoint_name: my-endpoint
    model: azureml:sklearn-diabetes@latest
    environment: 
      image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
      conda_file: mlflow/sklearn-diabetes/environment/conda.yml
    code_configuration:
      source: mlflow/sklearn-diabetes/src
      scoring_script: score.py
    instance_type: Standard_F2s_v2
    instance_count: 1
    

    Create the deployment:

    az ml online-deployment create -f deployment.yml
    
  5. Once your deployment completes, it's ready to serve requests. One of the easiest ways to test the deployment is by using a sample request file along with the invoke method.

    sample-request-sklearn-custom.json

    {"input_data": {
        "columns": [
          "age",
          "sex",
          "bmi",
          "bp",
          "s1",
          "s2",
          "s3",
          "s4",
          "s5",
          "s6"
        ],
        "data": [
          [ 1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0 ],
          [ 10.0,2.0,9.0,8.0,7.0,6.0,5.0,4.0,3.0,2.0]
        ],
        "index": [0,1]
      }}
    

    To submit a request to the endpoint, run the following command:

    az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/mlflow/sample-request-sklearn-custom.json
    

    The response will be similar to the following text:

    {
      "predictions": [ 
        11633.100167144921,
        8522.117402884991
      ]
    }
    

    Warning

    MLflow 2.0 advisory: In MLflow 1.X, the key predictions will be missing.
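As mentioned in step 2, you can smoke-test the scoring script locally in plain Python before deploying, as a sketch (it assumes score.py, the sklearn-diabetes folder with the model inside, and the sample request file all sit in the working directory):

import os

# Mimic the environment variable the online deployment provides; it must point
# at the parent directory that contains the 'model' folder.
os.environ["AZUREML_MODEL_DIR"] = "sklearn-diabetes"

import score  # the scoring script shown above

score.init()
with open("sample-request-sklearn.json") as request_file:
    print(score.run(request_file.read()))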

Clean up resources

Once you're done with the endpoint, you can delete the associated resources:

az ml online-endpoint delete --name $ENDPOINT_NAME --yes
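The Python SDK v2 equivalent, as a sketch (it assumes the ml_client object from earlier):

# Deleting the endpoint also deletes its deployments.
ml_client.online_endpoints.begin_delete(name="my-endpoint").result()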

Next steps

To learn more, review these articles: