Deploy models for scoring in batch endpoints

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Batch endpoints let you deploy models that run inference over large volumes of data. These endpoints simplify hosting models for batch scoring, so you can focus on machine learning instead of infrastructure.

Use batch endpoints to deploy models when:

You use expensive models that take a long time to run inference.
You perform inference over large amounts of data distributed in multiple files.
You don't need low latency.
You want to take advantage of parallelization.

This article shows how to use a batch endpoint to deploy a machine learning model that solves the classic MNIST (Modified National Institute of Standards and Technology) digit recognition problem. The deployed model performs batch inferencing over large amounts of data, such as image files. You begin by creating a batch deployment of a model built using Torch. This deployment becomes the default in the endpoint. Later, you create a second deployment of a model built with TensorFlow (Keras), test it, and set it as the endpoint's default deployment.

Prerequisites

Before you follow the steps in this article, make sure you have the following prerequisites:

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
An Azure Machine Learning workspace. If you don't have one, use the steps in the How to manage workspaces article to create one.
To perform the following tasks, ensure that you have these permissions in the workspace:
- To create/manage batch endpoints and deployments: Use owner role, contributor role, or a custom role allowing Microsoft.MachineLearningServices/workspaces/batchEndpoints/*.
- To create ARM deployments in the workspace resource group: Use owner role, contributor role, or a custom role allowing Microsoft.Resources/deployments/write in the resource group where the workspace is deployed.
You need to install the following software to work with Azure Machine Learning:
APPLIES TO: Azure CLI ml extension v2 (current)

The Azure CLI and the ml extension for Azure Machine Learning.
```
az extension add -n ml
```
APPLIES TO: Python SDK azure-ai-ml v2 (current)

Install the Azure Machine Learning SDK for Python.
```
pip install azure-ai-ml
```
There are no further requirements if you plan to use Azure Machine Learning studio.

Clone the examples repository

The example in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, first clone the repo and then change directories to the folder:

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli/endpoints/batch/deploy-models/mnist-classifier

!git clone https://github.com/Azure/azureml-examples --depth 1
%cd azureml-examples/sdk/python/endpoints/batch/deploy-models/mnist-classifier

To follow along with this example in a Jupyter Notebook, in the cloned repository, open the notebook: mnist-batch.ipynb.

Prepare your system

Connect to your workspace

First, connect to the Azure Machine Learning workspace where you work.

If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid entering the values for your subscription, workspace, resource group, and location multiple times, run this code:

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

The workspace is the top-level resource for Azure Machine Learning, providing a centralized location to manage all the artifacts you create when using Azure Machine Learning. In this section, you connect to the workspace in which you perform deployment tasks.

Import the required libraries:

from azure.ai.ml import MLClient, Input, load_component
from azure.ai.ml.entities import BatchEndpoint, ModelBatchDeployment, ModelBatchDeploymentSettings, PipelineComponentBatchDeployment, Model, AmlCompute, Data, BatchRetrySettings, CodeConfiguration, Environment
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

Note

Classes ModelBatchDeployment and PipelineComponentBatchDeployment were introduced in version 1.7.0 of the SDK.

Configure workspace details and get a handle to the workspace:

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

Create compute

Batch endpoints run on compute clusters and support both Azure Machine Learning compute clusters (AmlCompute) and Kubernetes clusters. Clusters are a shared resource, so one cluster can host one or many batch deployments (along with other workloads, if desired).

Create a compute named batch-cluster, as shown in the following code. Adjust as needed and reference your compute using azureml:<your-compute-name>.

az ml compute create -n batch-cluster --type amlcompute --min-instances 0 --max-instances 5

compute_name = "batch-cluster"
if not any(c.name == compute_name for c in ml_client.compute.list()):
    compute_cluster = AmlCompute(
        name=compute_name,
        description="CPU cluster compute",
        min_instances=0,
        max_instances=2,
    )
    ml_client.compute.begin_create_or_update(compute_cluster).result()

Note

You're not charged for the compute at this point because the cluster remains at 0 nodes until a batch endpoint is invoked and a batch scoring job is submitted. For more information about compute costs, see Manage and optimize cost for AmlCompute.

Create a batch endpoint

A batch endpoint is an HTTPS endpoint that clients call to trigger a batch scoring job. A batch scoring job scores multiple inputs. A batch deployment is a set of compute resources hosting the model that performs batch scoring (or batch inferencing). One batch endpoint can have multiple batch deployments. For more information about batch endpoints, see What are batch endpoints?.

Tip

One of the batch deployments serves as the default deployment for the endpoint. When the endpoint is invoked, the default deployment performs batch scoring. For more information about batch endpoints and deployments, see batch endpoints and batch deployment.
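
To make the relationship between an endpoint, its deployments, and the default deployment concrete, the following sketch models it with a small data class. It's only an illustration: `ToyBatchEndpoint` and the deployment name `mnist-keras-dpl` are hypothetical and aren't part of the Azure Machine Learning SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyBatchEndpoint:
    """Toy stand-in for a batch endpoint (hypothetical, not an SDK class)."""

    name: str
    # Maps deployment name -> model name served by that deployment.
    deployments: Dict[str, str] = field(default_factory=dict)
    default_deployment: Optional[str] = None

    def add_deployment(self, deployment_name: str, model_name: str,
                       set_default: bool = False) -> None:
        self.deployments[deployment_name] = model_name
        if set_default:
            self.default_deployment = deployment_name

    def resolve(self, deployment_name: Optional[str] = None) -> str:
        """Return the deployment that would handle an invocation."""
        target = deployment_name or self.default_deployment
        if target is None or target not in self.deployments:
            raise KeyError(f"No deployment {target!r} in endpoint {self.name!r}")
        return target


endpoint_model = ToyBatchEndpoint("mnist-batch")
endpoint_model.add_deployment("mnist-torch-dpl", "mnist-classifier-torch", set_default=True)
endpoint_model.add_deployment("mnist-keras-dpl", "mnist-classifier-keras")

print(endpoint_model.resolve())                   # mnist-torch-dpl
print(endpoint_model.resolve("mnist-keras-dpl"))  # mnist-keras-dpl
```

Invoking without a deployment name goes to the default deployment, while naming a deployment explicitly targets it without changing the default.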

Name the endpoint. The endpoint's name must be unique within an Azure region because the name is included in the endpoint's URI. For example, there can be only one batch endpoint with the name mybatchendpoint in westus2.
Place the endpoint's name in a variable to reference it easily later.
```
ENDPOINT_NAME="mnist-batch"
```
Place the endpoint's name in a variable so you can easily reference it later.
```
endpoint_name = "mnist-batch"
```
You provide the endpoint's name later, at the point when you create the deployment.

Configure the batch endpoint

The following YAML file defines a batch endpoint. Use this file with the CLI command for batch endpoint creation.

endpoint.yml

$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: mnist-batch
description: A batch endpoint for scoring images from the MNIST dataset.
tags:
  type: deep-learning

The following table describes the key properties of the endpoint. For the complete batch endpoint YAML schema, see CLI (v2) batch endpoint YAML schema.

Key	Description
`name`	The name of the batch endpoint. Needs to be unique at the Azure region level.
`description`	The description of the batch endpoint. This property is optional.
`tags`	The tags to include in the endpoint. This property is optional.

endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A batch endpoint for scoring images from the MNIST dataset.",
    tags={"type": "deep-learning"},
)

The following table describes the key properties of the endpoint. For more information on batch endpoint definition, see BatchEndpoint Class.

Key	Description
`name`	The name of the batch endpoint. Needs to be unique at the Azure region level.
`description`	The description of the batch endpoint. This property is optional.
`tags`	The tags to include in the endpoint. This property is optional.

Create the endpoint:
Run this code to create a batch endpoint.
```
az ml batch-endpoint create --file endpoint.yml --name $ENDPOINT_NAME
```
```
ml_client.begin_create_or_update(endpoint).result()
```
You create the endpoint later, at the point when you create the deployment.

Create a batch deployment

A model deployment is a set of resources required for hosting the model that does the actual inferencing. To create a batch model deployment, you need the following items:

A registered model in the workspace
The code to score the model
An environment with the model's dependencies installed
The precreated compute and resource settings

Begin by registering the model to deploy: a Torch model for the popular digit recognition problem (MNIST). Batch deployments can only deploy models that are registered in the workspace. You can skip this step if the model you want to deploy is already registered.

Tip

Models are associated with the deployment, rather than with the endpoint. This means that a single endpoint can serve different models (or model versions), as long as each model (or model version) is deployed in a different deployment.
```
MODEL_NAME='mnist-classifier-torch'
az ml model create --name $MODEL_NAME --type "custom_model" --path "deployment-torch/model"
```
```
model_name = "mnist-classifier-torch"
model_local_path = "deployment-torch/model/"

model = ml_client.models.create_or_update(
    Model(
        name=model_name,
        path=model_local_path,
        type=AssetTypes.CUSTOM_MODEL,
        tags={"task": "classification", "framework": "torch"},
    )
)
```
1. Navigate to the Models tab on the side menu.
2. Select Register > From local files.
3. In the wizard, leave the option Model type as Unspecified type.
4. Select Browse > Browse folder > Select the folder deployment-torch/model > Next.
5. Configure the name of the model: mnist-classifier-torch. You can leave the rest of the fields as they are.
6. Select Register.

Now it's time to create a scoring script. Batch deployments require a scoring script that indicates how a given model should be executed and how input data must be processed. Batch endpoints support scripts created in Python. In this case, you deploy a model that reads image files representing digits and outputs the corresponding digit. The scoring script is as follows:

Note

For MLflow models, Azure Machine Learning automatically generates the scoring script, so you're not required to provide one. If your model is an MLflow model, you can skip this step. For more information about how batch endpoints work with MLflow models, see the article Using MLflow models in batch deployments.

Warning

If you're deploying an Automated machine learning (AutoML) model under a batch endpoint, note that the scoring script that AutoML provides only works for online endpoints and isn't designed for batch execution. For information on how to create a scoring script for your batch deployment, see Author scoring scripts for batch deployments.

deployment-torch/code/batch_driver.py

import os
import pandas as pd
import torch
import torchvision
import glob
from os.path import basename
from mnist_classifier import MnistClassifier
from typing import List


def init():
    global model
    global device

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    # It is the path to the model folder
    model_path = os.environ["AZUREML_MODEL_DIR"]
    model_file = glob.glob(f"{model_path}/*/*.pt")[-1]

    # Pick the device first so the weights load onto it directly
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    model = MnistClassifier()
    model.load_state_dict(torch.load(model_file, map_location=device))
    model.to(device)
    model.eval()


def run(mini_batch: List[str]) -> pd.DataFrame:
    print(f"Executing run method over batch of {len(mini_batch)} files.")

    results = []
    with torch.no_grad():
        for image_path in mini_batch:
            image_data = torchvision.io.read_image(image_path).float()
            # Add a leading batch dimension: (C, H, W) -> (1, C, H, W)
            batch_data = image_data.expand(1, -1, -1, -1).to(device)

            # Perform inference
            predict_logits = model(batch_data)

            # Compute probabilities and the most likely class
            predictions = torch.nn.functional.softmax(predict_logits, dim=-1)
            predicted_prob, predicted_class = torch.max(predictions, dim=-1)

            results.append(
                {
                    "file": basename(image_path),
                    # Move tensors to the CPU before converting to NumPy
                    "class": predicted_class.cpu().numpy()[0],
                    "probability": predicted_prob.cpu().numpy()[0],
                }
            )

    return pd.DataFrame(results)
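
Before you deploy, it can help to see how the init() and run() functions are exercised. The following sketch mimics the calling pattern in a single local process: init() is called once, then run() is called once per mini-batch. The `drive` helper and the stub functions are hypothetical and only illustrate the contract; they don't reproduce how Azure Machine Learning schedules work across nodes. The stub returns a list instead of a pandas DataFrame to stay dependency-free.

```python
from typing import Callable, List, Sequence


def drive(init: Callable[[], None],
          run: Callable[[List[str]], Sequence],
          files: Sequence[str],
          mini_batch_size: int) -> list:
    """Call init() once, then run() once per mini-batch, and collect the rows."""
    init()
    rows = []
    for start in range(0, len(files), mini_batch_size):
        mini_batch = list(files[start:start + mini_batch_size])
        rows.extend(run(mini_batch))
    return rows


# Stub scoring script: "predicts" the digit from the first character of the
# file name instead of loading a real model.
_state = {}


def stub_init() -> None:
    _state["loaded"] = True  # stands in for loading model weights


def stub_run(mini_batch: List[str]) -> list:
    assert _state.get("loaded"), "init() must run before run()"
    return [{"file": path, "class": int(path[0])} for path in mini_batch]


results = drive(stub_init, stub_run, ["3_a.png", "7_b.png", "1_c.png"], mini_batch_size=2)
print([row["class"] for row in results])  # [3, 7, 1]
```

To try a real scoring script locally, you could pass its init and run functions to a harness like this one, provided that the AZUREML_MODEL_DIR environment variable points to a local copy of the model folder.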

Create an environment where your batch deployment runs. The environment should include the packages azureml-core and azureml-dataset-runtime[fuse], which are required by batch endpoints, plus any dependency your code requires for running. In this case, the dependencies have been captured in a conda.yaml file:

deployment-torch/environment/conda.yaml
```
name: mnist-env
channels:
  - conda-forge
dependencies:
  - python=3.8.5
  - pip<22.0
  - pip:
    - torch==1.13.0
    - torchvision==0.14.0
    - pytorch-lightning
    - pandas
    - azureml-core
    - azureml-dataset-runtime[fuse]
```
Important

The packages azureml-core and azureml-dataset-runtime[fuse] are required by batch deployments and should be included in the environment dependencies.
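
As a quick sanity check before you build the environment, you might verify that the pip dependencies include the two required packages. The following sketch takes the list of pip requirement strings (as they appear under `pip:` in conda.yaml) and reports the required packages that are missing. The helper name `missing_batch_packages` is hypothetical, and the check compares package names only, so it doesn't verify the `[fuse]` extra or any version constraints.

```python
import re
from typing import Iterable, List

REQUIRED_PACKAGES = ("azureml-core", "azureml-dataset-runtime")


def missing_batch_packages(pip_dependencies: Iterable[str]) -> List[str]:
    """Return the required package names that are absent from a pip dependency list."""
    present = set()
    for requirement in pip_dependencies:
        # Strip extras ("[fuse]") and version specifiers ("==1.0") to get the bare name.
        name = re.split(r"[\[<>=!~ ;]", requirement.strip(), maxsplit=1)[0].lower()
        present.add(name)
    return [pkg for pkg in REQUIRED_PACKAGES if pkg not in present]


pip_section = [
    "torch==1.13.0",
    "torchvision==0.14.0",
    "pytorch-lightning",
    "pandas",
    "azureml-core",
    "azureml-dataset-runtime[fuse]",
]
print(missing_batch_packages(pip_section))        # []
print(missing_batch_packages(["torch==1.13.0"]))  # ['azureml-core', 'azureml-dataset-runtime']
```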

Specify the environment as follows:
The environment definition is included in the deployment definition itself as an anonymous environment. You can see it in the following lines of the deployment definition:
```
environment:
  name: batch-torch-py38
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
```
Get a reference to the environment:
```
env = Environment(
    name="batch-torch-py38",
    conda_file="deployment-torch/environment/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
```
In the Azure Machine Learning studio, follow these steps:
1. Go to the Environments tab in the side menu.
2. Select Custom environments > Create.
3. Enter the name of the environment, in this case torch-batch-env.
4. For Select environment source, select Use existing docker image with optional conda file.
5. For Container registry image path, enter mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04.
6. Select Next to go to the "Customize" section.
7. Copy the content of the file deployment-torch/environment/conda.yaml from the GitHub repo into the portal.
8. Select Next until you reach the "Review" page.
9. Select Create and wait for the environment to be ready.

Warning

Curated environments aren't supported in batch deployments. You need to specify your own environment. You can always use the base image of a curated environment as yours to simplify the process.

Create a deployment definition

deployment-torch/deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
name: mnist-torch-dpl
description: A deployment using Torch to solve the MNIST classification dataset.
endpoint_name: mnist-batch
type: model
model:
  name: mnist-classifier-torch
  path: model
code_configuration:
  code: code
  scoring_script: batch_driver.py
environment:
  name: batch-torch-py38
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
compute: azureml:batch-cluster
resources:
  instance_count: 1
settings:
  max_concurrency_per_instance: 2
  mini_batch_size: 10
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 30
  error_threshold: -1
  logging_level: info

The following table describes the key properties of the batch deployment. For the full batch deployment YAML schema, see CLI (v2) batch deployment YAML schema.

Key	Description
`name`	The name of the deployment.
`endpoint_name`	The name of the endpoint to create the deployment under.
`model`	The model to be used for batch scoring. The example defines a model inline using `path`. This definition allows model files to be automatically uploaded and registered with an autogenerated name and version. See the Model schema for more options. As a best practice for production scenarios, you should create the model separately and reference it here. To reference an existing model, use the `azureml:<model-name>:<model-version>` syntax.
`code_configuration.code`	The local directory that contains all the Python source code to score the model.
`code_configuration.scoring_script`	The Python file in the `code_configuration.code` directory. This file must have an `init()` function and a `run()` function. Use the `init()` function for any costly or common preparation (for example, to load the model in memory). `init()` is called only once at the start of the process. Use `run(mini_batch)` to score each entry; the value of `mini_batch` is a list of file paths. The `run()` function should return a pandas DataFrame or an array. Each returned element indicates one successful run of an input element in the `mini_batch`. For more information on how to author a scoring script, see Understanding the scoring script.
`environment`	The environment to score the model. The example defines an environment inline using `conda_file` and `image`. The `conda_file` dependencies are installed on top of the `image`. The environment is automatically registered with an autogenerated name and version. See the Environment schema for more options. As a best practice for production scenarios, you should create the environment separately and reference it here. To reference an existing environment, use the `azureml:<environment-name>:<environment-version>` syntax.
`compute`	The compute to run batch scoring. The example uses the `batch-cluster` created at the beginning and references it using the `azureml:<compute-name>` syntax.
`resources.instance_count`	The number of instances to be used for each batch scoring job.
`settings.max_concurrency_per_instance`	The maximum number of parallel `scoring_script` runs per instance.
`settings.mini_batch_size`	The number of files the `scoring_script` can process in one `run()` call.
`settings.output_action`	How the output should be organized in the output file. `append_row` merges all `run()` returned output results into one single file named `output_file_name`. `summary_only` won't merge the output results and will only calculate `error_threshold`.
`settings.output_file_name`	The name of the batch scoring output file for `append_row` `output_action`.
`settings.retry_settings.max_retries`	The maximum number of retries for a failed `scoring_script` `run()`.
`settings.retry_settings.timeout`	The timeout in seconds for a `scoring_script` `run()` for scoring a mini batch.
`settings.error_threshold`	The number of input file scoring failures that should be ignored. If the error count for the entire input goes above this value, the batch scoring job is terminated. The example uses `-1`, which indicates that any number of failures is allowed without terminating the batch scoring job.
`settings.logging_level`	Log verbosity. Values in increasing verbosity are: WARNING, INFO, and DEBUG.
`settings.environment_variables`	Dictionary of environment variable name-value pairs to set for each batch scoring job.
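
The `error_threshold` rule described in the table can be written as a small predicate. This is only a sketch of the documented behavior, assuming the job is terminated once the failure count goes above the threshold and that `-1` disables the check. The function name `should_terminate` is hypothetical.

```python
def should_terminate(error_count: int, error_threshold: int) -> bool:
    """Return True when the failure count exceeds the allowed threshold.

    A threshold of -1 means that failures never terminate the job.
    """
    if error_threshold == -1:
        return False
    return error_count > error_threshold


print(should_terminate(error_count=25, error_threshold=-1))  # False
print(should_terminate(error_count=3, error_threshold=3))    # False
print(should_terminate(error_count=4, error_threshold=3))    # True
```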

deployment = ModelBatchDeployment(
    name="mnist-torch-dpl",
    description="A deployment using Torch to solve the MNIST classification dataset.",
    endpoint_name=endpoint_name,
    model=model,
    code_configuration=CodeConfiguration(
        code="deployment-torch/code/", scoring_script="batch_driver.py"
    ),
    environment=env,
    compute=compute_name,
    settings=ModelBatchDeploymentSettings(
        max_concurrency_per_instance=2,
        mini_batch_size=10,
        instance_count=2,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=30),
        logging_level="info",
    ),
)

The BatchDeployment Class allows you to configure the following key properties of a batch deployment:

Key	Description
`name`	Name of the deployment.
`endpoint_name`	Name of the endpoint to create the deployment under.
`model`	The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
`environment`	The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification (optional for MLflow models).
`code_configuration`	The configuration about how to run inference for the model (optional for MLflow models).
`code_configuration.code`	Path to the source code directory for scoring the model.
`code_configuration.scoring_script`	Relative path to the scoring file in the source code directory.
`compute`	Name of the compute target on which to execute the batch scoring jobs.
`instance_count`	The number of nodes to use for each batch scoring job.
`settings`	The model deployment inference configuration.
`settings.max_concurrency_per_instance`	The maximum number of parallel `scoring_script` runs per instance.
`settings.mini_batch_size`	The number of files the `code_configuration.scoring_script` can process in one `run()` call.
`settings.retry_settings`	Retry settings for scoring each mini batch.
`settings.retry_settings.max_retries`	The maximum number of retries for a failed or timed-out mini batch (default is 3).
`settings.retry_settings.timeout`	The timeout in seconds for scoring a mini batch (default is 30).
`settings.output_action`	How the output should be organized in the output file. Allowed values are `append_row` or `summary_only`. Default is `append_row`.
`settings.logging_level`	The log verbosity level. Allowed values are `warning`, `info`, `debug`. Default is `info`.
`settings.environment_variables`	Dictionary of environment variable name-value pairs to set for each batch scoring job.

1. Go to the Endpoints tab in the side menu.
2. Select the tab Batch endpoints > Create.
3. Give the endpoint a name, in this case mnist-batch. You can configure the rest of the fields or leave them blank.
4. Select Next to go to the "Model" section.
5. Select the model mnist-classifier-torch.
6. Select Next to go to the "Deployment" page.
7. Give the deployment a name.
8. For Output action, make sure Append row is selected.
9. For Output file name, ensure the batch scoring output file is the one you need. Default is predictions.csv.
10. For Mini batch size, adjust the number of files included in each mini-batch. This setting controls the amount of data your scoring script receives per batch.
11. For Scoring timeout (seconds), ensure you give your deployment enough time to score a given batch of files. If you increase the number of files, you usually have to increase the timeout value too. More expensive models (like those based on deep learning) might require high values in this field.
12. For Max concurrency per instance, configure the number of executors you want for each compute instance in the deployment. A higher number allows a higher degree of parallelization, but it also increases the memory pressure on the compute instance. Tune this value together with Mini batch size.
13. Once done, select Next to go to the "Code + environment" page.
14. For "Select a scoring script for inferencing", browse to find and select the scoring script file deployment-torch/code/batch_driver.py.
15. In the "Select environment" section, select the environment you created previously, torch-batch-env.
16. Select Next to go to the "Compute" page.
17. Select the compute cluster you created in a previous step.
  
  Warning
  
  Azure Kubernetes clusters are supported in batch deployments, but only when created using the Azure Machine Learning CLI or Python SDK.
18. For Instance count, enter the number of compute instances you want for the deployment. In this case, use 2.
19. Select Next.
Create the deployment:
Run the following code to create a batch deployment under the batch endpoint, and set it as the default deployment.
```
az ml batch-deployment create --file deployment-torch/deployment.yml --endpoint-name $ENDPOINT_NAME --set-default
```
Tip

The --set-default parameter sets the newly created deployment as the default deployment of the endpoint. It's a convenient way to create a new default deployment of the endpoint, especially for the first deployment creation. As a best practice for production scenarios, you might want to create a new deployment without setting it as default. Verify that the deployment works as you expect, and then update the default deployment later. For more information on implementing this process, see the Deploy a new model section.
Using the MLClient created earlier, create the deployment in the workspace. The begin_create_or_update method starts the deployment creation and returns a poller. Calling result() on the poller blocks until the deployment creation finishes.
```
ml_client.begin_create_or_update(deployment).result()
```
Once the deployment is completed, set the new deployment as the default deployment in the endpoint:
```
endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()
```
In the wizard, select Create to start the deployment process.
Check batch endpoint and deployment details.
Use show to check the endpoint and deployment details. To check a batch deployment, run the following code:
```
DEPLOYMENT_NAME="mnist-torch-dpl"
az ml batch-deployment show --name $DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME
```
To check a batch deployment, run the following code:
```
ml_client.batch_deployments.get(name=deployment.name, endpoint_name=endpoint.name)
```
After creating the batch endpoint, the endpoint's details page opens. You can also find this page by following these steps:
1. Navigate to the Endpoints tab on the side menu.
2. Select the Batch endpoints tab.
3. Select the batch endpoint you want to view.
4. The endpoint's Details page shows the details of the endpoint along with all the deployments available in the endpoint.

Run batch endpoints and access results

Understanding the data flow

Before running your batch endpoint, understand how data flows through the system:

Inputs: Data to process (score). This includes:

Files stored in Azure Storage (blob storage, data lake)
Folders with multiple files
Registered datasets in Azure Machine Learning

Processing: The deployed model processes the input data in batches (mini-batches) and generates predictions.

Outputs: Results from the model, stored as files in Azure Storage. By default, outputs are saved to the workspace's default blob storage, but you can specify a different location.

Invoke a batch endpoint

Invoking a batch endpoint triggers a batch scoring job. The job name is returned in the invoke response and tracks the batch scoring progress. Specify the input data path so the endpoint can locate the data to score. The following example shows how to start a new job over a sample of the MNIST dataset stored in an Azure Storage Account.

You can run and invoke a batch endpoint using Azure CLI, Azure Machine Learning SDK, or REST endpoints. For more information about these options, see Create jobs and input data for batch endpoints.

Note

How does parallelization work?

Batch deployments distribute work at the file level. For example, a folder with 100 files and mini-batches of 10 files generates 10 batches of 10 files each. This happens regardless of file size. If files are too large for processing in mini-batches, split them into smaller files to increase parallelism or reduce the number of files per mini-batch. Batch deployments currently don't account for skews in file size distribution.
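
The file-level split described in this note can be sketched in a few lines. This is an illustration of the arithmetic only, not the service's actual scheduler, and the helper name `make_mini_batches` and the file names are hypothetical.

```python
from typing import List, Sequence


def make_mini_batches(files: Sequence[str], mini_batch_size: int) -> List[List[str]]:
    """Group files into consecutive mini-batches, ignoring file size."""
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be at least 1")
    return [
        list(files[start:start + mini_batch_size])
        for start in range(0, len(files), mini_batch_size)
    ]


files = [f"digit_{i:03d}.png" for i in range(100)]
batches = make_mini_batches(files, mini_batch_size=10)
print(len(batches), len(batches[0]))  # 10 10
```

Because the split counts files rather than bytes, one mini-batch of very large files can take much longer than another of small files.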

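# Public sample of the MNIST dataset (the same URL used in the studio steps)
SAMPLE_INPUT_URI="https://azuremlexampledata.blob.core.windows.net/data/mnist/sample"
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input $SAMPLE_INPUT_URI --input-type uri_folder --query name -o tsv)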

Tip

What's the difference between the inputs and input parameter when you invoke an endpoint?

In general, you can use a dictionary inputs = {} parameter with the invoke method to provide an arbitrary number of required inputs to a batch endpoint that contains a model deployment or a pipeline deployment.

For a model deployment, you can use the input parameter as a shorter way to specify the input data location for the deployment. This approach works because a model deployment always takes only one data input.

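from pathlib import Path

# temp_data_path isn't defined in this article. It's assumed to be a local
# folder whose data subfolder holds the sample images to score.
job = ml_client.batch_endpoints.invoke(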
    endpoint_name=endpoint_name,
    deployment_name=deployment.name,
    input=Input(
        path=Path(temp_data_path) / "data",
        type=AssetTypes.URI_FOLDER,
    ),
)

1. Navigate to the Endpoints tab on the side menu.
2. Select the tab Batch endpoints.
3. Select the batch endpoint you just created.
4. Select Create job.
5. For Deployment, select the deployment to execute.
6. Select Next to go to the "Select data source" page.
7. For the "Data source type", select Datastore.
8. For the "Datastore", select workspaceblobstore from the dropdown menu.
9. For "Path", enter the full URL https://azuremlexampledata.blob.core.windows.net/data/mnist/sample.
  
  Tip
  
  This path works only because the given path has public access enabled. In general, you need to register the data source as a Datastore. See Accessing data from batch endpoints jobs for details.
10. Select Next.
11. Select Create to start the job.

Batch endpoints support reading files or folders that are located in different locations. To learn more about the supported types and how to specify them, see Accessing data from batch endpoints jobs.

Monitor batch job execution progress

Batch scoring jobs take time to process all inputs.

The following code checks the job status and outputs a link to the Azure Machine Learning studio for further details.

az ml job show -n $JOB_NAME --web

The following code checks the job status and outputs a link to the Azure Machine Learning studio for further details.

ml_client.jobs.get(job.name)
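
If you want to wait for the job in a script rather than check it once, a simple polling loop is one option. The following sketch is a minimal example under stated assumptions: the `wait_for_job` helper is hypothetical, the terminal status names are assumed to be "Completed", "Failed", and "Canceled", and the status is read through a callable so the loop can be tried without a workspace.

```python
import time
from typing import Callable

# Assumed terminal status names; adjust them if your jobs report others.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status: Callable[[], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 120) -> str:
    """Poll get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job did not finish after {max_polls} polls")


# Demo with a fake status source, so no workspace is needed.
fake_statuses = iter(["Queued", "Running", "Completed"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))  # Completed
```

With the client from this article, the callable could be `lambda: ml_client.jobs.get(job.name).status`. Alternatively, `ml_client.jobs.stream(job.name)` streams the job's logs until the job finishes.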

Check batch scoring results

The job outputs are stored in cloud storage, either in the workspace's default blob storage, or the storage you specified. To learn how to change the defaults, see Configure the output location. The following steps allow you to view the scoring results in Azure Storage Explorer when the job is completed:

Run the following code to open the batch scoring job in Azure Machine Learning studio. The job studio link is also included in the response of invoke, as the value of interactionEndpoints.Studio.endpoint.
```
az ml job show -n $JOB_NAME --web
```
In the graph of the job, select the batchscoring step.
Select the Outputs + logs tab and then select Show data outputs.
From Data outputs, select the icon to open Storage Explorer.

The scoring results in Storage Explorer are similar to the following sample page:

Configure the output location

By default, batch scoring results are stored in the workspace's default blob store in a folder named after the job (a system-generated GUID). Configure the output location when invoking the batch endpoint.

Use --output-path to configure any folder in an Azure Machine Learning registered datastore. The syntax for --output-path is the same as for --input when you're specifying a folder, that is, azureml://datastores/<datastore-name>/paths/<path-on-datastore>/. Use --set output_file_name=<your-file-name> to configure a new output file name.

OUTPUT_FILE_NAME=predictions_`echo $RANDOM`.csv
OUTPUT_PATH="azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME"

JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input $SAMPLE_INPUT_URI --output-path $OUTPUT_PATH --set output_file_name=$OUTPUT_FILE_NAME --query name -o tsv)

The params_override parameter allows you to modify deployment settings for a specific job without changing the deployment configuration permanently. This is useful for adjusting settings like output location, mini-batch size, or instance count for individual jobs.

The params_override parameter values correspond to deployment configuration settings that can be temporarily modified for individual jobs. These parameters come from your deployment's YAML schema settings, datastore configurations (like output paths), and runtime variables you define in your code.

Use params_override to configure any folder in an Azure Machine Learning registered data store. Only registered data stores are supported as output paths. In this example you use the default data store:

batch_ds = ml_client.datastores.get_default()

Once you've identified the data store you want to use, configure the output as follows:

import random

# Randomize the file name so that repeated jobs don't write to the same output file
filename = f"predictions-{random.randint(0,99999)}.csv"

job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint_name,
    input=Input(
        path=Path(temp_data_path) / "data",
        type=AssetTypes.URI_FOLDER,
    ),
    params_override=[
        {"output_dataset.datastore_id": f"azureml:{batch_ds.id}"},
        {"output_dataset.path": f"/{endpoint_name}/"},
        {"output_file_name": filename},
    ],
)

Tip

Use params_override when you need different settings for different jobs without modifying your deployment. This is especially useful for handling varying data sizes or experimenting with performance settings.

Warning

You must use a unique output location. If the output file exists, the batch scoring job fails.

Important

Unlike inputs, outputs can be stored only in Azure Machine Learning data stores that run on blob storage accounts.
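
Because a job fails when the output file already exists, it can help to generate the file name and the override list in one place. The following is a minimal sketch under stated assumptions. The helper names unique_output_file_name and output_overrides are hypothetical, and a UUID is used in place of the random integer from the earlier example to make collisions less likely. The override keys are the ones used in this article.

```python
import uuid
from typing import Dict, List


def unique_output_file_name(prefix: str = "predictions") -> str:
    """Return a CSV file name that is very unlikely to collide with an earlier job's output."""
    return f"{prefix}-{uuid.uuid4().hex}.csv"


def output_overrides(datastore_id: str, folder: str, file_name: str) -> List[Dict[str, str]]:
    """Build params_override entries that redirect the job output (keys taken from this article)."""
    return [
        {"output_dataset.datastore_id": f"azureml:{datastore_id}"},
        # Normalize the folder so the path always has one leading and one trailing slash.
        {"output_dataset.path": f"/{folder.strip('/')}/"},
        {"output_file_name": file_name},
    ]


# Hypothetical usage with the objects from this article:
# overrides = output_overrides(batch_ds.id, endpoint_name, unique_output_file_name())
# job = ml_client.batch_endpoints.invoke(..., params_override=overrides)
```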

Overwrite deployment configuration for each job

When you invoke a batch endpoint, you can overwrite some settings to make the best use of compute resources and improve performance. This feature is useful when you need different settings for different jobs without permanently modifying the deployment.

Which settings can be overridden?

You can configure the following settings on a per-job basis:

Setting	When to use	Example scenario
Instance count	When you have varying data volumes	Use more instances for larger datasets (10 instances for 1 million files vs. 2 instances for 100,000 files).
Mini-batch size	When you need to balance throughput and memory usage	Use smaller batches (10-50 files) for large images and larger batches (100-500 files) for small text files.
Max retries	When data quality varies	Use higher retries (5-10) for noisy data and lower retries (1-3) for clean data.
Timeout	When processing time varies by data type	Use a longer timeout (300 seconds) for complex models and a shorter timeout (30 seconds) for simple models.
Error threshold	When you need different failure tolerance levels	Use a strict threshold (0) for critical jobs and a lenient threshold (-1, which ignores all failures) for experimental jobs.

How to override settings

JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample --mini-batch-size 20 --instance-count 5 --query name -o tsv)

job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint_name,
    input=Input(path="./data"),
    params_override=[{"mini_batch_size": "20"}, {"compute.instance_count": "5"}],
)
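
If your jobs vary a lot in size, you can derive the overrides from the number of input files instead of hard-coding them. The following sketch is illustrative only. The function name overrides_for_file_count and the thresholds are assumptions, loosely based on the example scenarios in the preceding table, and you should tune them for your own model and compute. The override keys are the ones used in the preceding example.

```python
from typing import Dict, List


def overrides_for_file_count(file_count: int) -> List[Dict[str, str]]:
    """Pick per-job settings from the input size (thresholds are hypothetical)."""
    if file_count >= 1_000_000:
        instances, mini_batch = 10, 100
    elif file_count >= 100_000:
        instances, mini_batch = 2, 50
    else:
        instances, mini_batch = 1, 10
    # Values are passed as strings, matching the params_override example in this article.
    return [
        {"mini_batch_size": str(mini_batch)},
        {"compute.instance_count": str(instances)},
    ]


# Hypothetical usage:
# job = ml_client.batch_endpoints.invoke(
#     endpoint_name=endpoint_name,
#     input=Input(path="./data"),
#     params_override=overrides_for_file_count(250_000),
# )
```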

Add deployments to an endpoint

Once you have a batch endpoint with a deployment, you can continue to refine your model and add new deployments. Batch endpoints will continue serving the default deployment while you develop and deploy new models under the same endpoint. Deployments don't affect one another.

In this example, you add a second deployment that uses a model built with Keras and TensorFlow to solve the same MNIST problem.

Add a second deployment

Create an environment for your batch deployment. Include any dependencies your code needs to run. Add the library azureml-core, as it's required for batch deployments. The following environment definition includes the required libraries to run a model with TensorFlow.
- Azure CLI
- Python
- Studio
The environment definition is included in the deployment definition itself as an anonymous environment.
```
environment:
  name: batch-tensorflow-py38
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
```
Get a reference to the environment:
```
env = Environment(
    name="batch-tensorflow-py38",
    conda_file="deployment-keras/environment/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
```
1. Navigate to the Environments tab on the side menu.
2. Select the tab Custom environments > Create.
3. Enter the name of the environment, in this case keras-batch-env.
4. For Select environment source, select Use existing docker image with optional conda file.
5. For Container registry image path, enter mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04.
6. Select Next to go to the "Customize" section.
7. Copy the content of the file deployment-keras/environment/conda.yaml from the GitHub repo into the portal.
8. Select Next until you get to the "Review" page.
9. Select Create and wait until the environment is ready for use.
The conda file used looks as follows:

deployment-keras/environment/conda.yaml
```
name: tensorflow-env
channels:
  - conda-forge
dependencies:
  - python=3.8.5
  - pip
  - pip:
    - pandas
    - tensorflow
    - pillow
    - azureml-core
    - azureml-dataset-runtime[fuse]
```

Create a scoring script for the model:

deployment-keras/code/batch_driver.py

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from typing import List
from os.path import basename
from PIL import Image
from tensorflow.keras.models import load_model


def init():
    global model

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)


def run(mini_batch: List[str]) -> pd.DataFrame:
    print(f"Executing run method over batch of {len(mini_batch)} files.")

    results = []
    for image_path in mini_batch:
        data = Image.open(image_path)
        data = np.array(data)
        data_batch = tf.expand_dims(data, axis=0)

        # perform inference
        pred = model.predict(data_batch)

        # Compute probabilities, classes and labels
        pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1)).numpy()
        pred_class = tf.math.argmax(pred, axis=-1).numpy()

        results.append(
            {
                "file": basename(image_path),
                "class": pred_class[0],
                "probability": pred_prob,
            }
        )

    return pd.DataFrame(results)
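
To see how the init and run functions are exercised, it can help to simulate the calling pattern locally before you deploy. The following is a simplified sketch, not the actual batch runtime. It calls init once, splits the file list into mini-batches, and calls run once per mini-batch. The names mini_batches, score_locally, stub_init, and stub_run are hypothetical, and the stub scorer returns a list of rows rather than a DataFrame so that the example has no dependencies.

```python
from typing import Callable, Dict, Iterator, List


def mini_batches(files: List[str], mini_batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most mini_batch_size files."""
    for start in range(0, len(files), mini_batch_size):
        yield files[start : start + mini_batch_size]


def score_locally(
    files: List[str],
    mini_batch_size: int,
    init: Callable[[], None],
    run: Callable[[List[str]], List[Dict]],
) -> List[Dict]:
    """Call init() once, then run() once per mini-batch, and collect all rows."""
    init()
    rows: List[Dict] = []
    for batch in mini_batches(files, mini_batch_size):
        rows.extend(run(batch))
    return rows


def stub_init() -> None:
    """Stand-in for loading the model."""


def stub_run(mini_batch: List[str]) -> List[Dict]:
    """Stand-in scorer that emits one row per input file, like the real script."""
    return [{"file": path, "class": 0} for path in mini_batch]


files = [f"img_{i}.png" for i in range(25)]
rows = score_locally(files, mini_batch_size=10, init=stub_init, run=stub_run)
```

With 25 files and a mini-batch size of 10, run is called three times, with 10, 10, and 5 files.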

Create a deployment definition:
- Azure CLI
- Python
- Studio
deployment-keras/deployment.yml
```
$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
name: mnist-keras-dpl
description: A deployment using Keras with TensorFlow to solve the MNIST classification dataset.
endpoint_name: mnist-batch
type: model
model: 
  name: mnist-classifier-keras
  path: model
code_configuration:
  code: code
  scoring_script: batch_driver.py
environment:
  name: batch-tensorflow-py38
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
compute: azureml:batch-cluster
resources:
  instance_count: 1
settings:
  max_concurrency_per_instance: 2
  mini_batch_size: 10
  output_action: append_row
  output_file_name: predictions.csv
```
```
deployment_keras = ModelBatchDeployment(
    name="mnist-keras-dpl",
    description="A deployment using Keras to solve the MNIST classification dataset.",
    endpoint_name=endpoint_name,
    model=model,
    code_configuration=CodeConfiguration(
        code="deployment-keras/code/", scoring_script="batch_driver.py"
    ),
    environment=env,
    compute=compute_name,
    settings=ModelBatchDeploymentSettings(
        instance_count=2,
        max_concurrency_per_instance=2,
        mini_batch_size=10,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=30),
        logging_level="info",
    ),
)
```
1. Navigate to the Endpoints tab on the side menu.
2. Select the tab Batch endpoints.
3. Select the existing batch endpoint where you want to add the deployment.
4. Select Add deployment.
5. Select Next to go to the "Model" page.
6. From the model list, select the model mnist and select Next.
7. On the deployment configuration page, give the deployment a name.
8. Clear the option Make this new deployment the default for batch jobs.
9. For Output action, ensure Append row is selected.
10. For Output file name, ensure the batch scoring output file is the one you need. Default is predictions.csv.
11. For Mini batch size, adjust the number of files included in each mini-batch. This controls the amount of data your scoring script receives for each batch.
12. For Scoring timeout (seconds), ensure you're giving enough time for your deployment to score a given batch of files. If you increase the number of files, you usually have to increase the timeout value too. More expensive models (like those based on deep learning) might require high values in this field.
13. For Max concurrency per instance, configure the number of executors you want to have for each compute instance you get in the deployment. A higher number allows a higher degree of parallelization, but it also increases the memory pressure on the compute instance. Tune this value together with Mini batch size.
14. Select Next to proceed to the "Code + environment" page.
15. For Select a scoring script for inferencing, browse to select the scoring script file deployment-keras/code/batch_driver.py.
16. For Select environment, select the environment you created in a previous step.
17. Select Next.
18. On the Compute page, select the compute cluster you created in a previous step.
19. For Instance count, enter the number of compute instances you want for the deployment. In this case, use 2.
20. Select Next.
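
Instance count, max concurrency per instance, and mini-batch size interact, so a rough estimate can help you choose values before you create the deployment. The following back-of-the-envelope sketch assumes that every executor processes one mini-batch at a time and that all mini-batches take about the same time. The function name estimate_parallelism is hypothetical, and real jobs won't divide this evenly.

```python
import math
from typing import Dict


def estimate_parallelism(
    file_count: int,
    mini_batch_size: int,
    instance_count: int,
    max_concurrency_per_instance: int,
) -> Dict[str, int]:
    """Estimate how a job is split across executors (idealized model)."""
    mini_batch_count = math.ceil(file_count / mini_batch_size)
    executors = instance_count * max_concurrency_per_instance
    # Number of sequential rounds needed if every executor handles one mini-batch per round.
    rounds = math.ceil(mini_batch_count / executors)
    return {"mini_batches": mini_batch_count, "executors": executors, "rounds": rounds}


# Settings from the Python deployment definition in this article, with a hypothetical 1,000 files.
estimate = estimate_parallelism(
    file_count=1000, mini_batch_size=10, instance_count=2, max_concurrency_per_instance=2
)
```

Multiplying the number of rounds by the scoring time of a typical mini-batch gives a rough lower bound for the job duration, which you can compare against the scoring timeout.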
Create the deployment:
- Azure CLI
- Python
- Studio
Run the following code to create a batch deployment under the batch endpoint. The new deployment isn't set as the default deployment.
```
az ml batch-deployment create --file deployment-keras/deployment.yml --endpoint-name $ENDPOINT_NAME
```
Tip

The --set-default parameter is intentionally omitted in this case. As a best practice for production scenarios, create a new deployment without setting it as default. Then verify it, and update the default deployment later.
Using the MLClient created earlier, create the deployment in the workspace. The begin_create_or_update call starts the deployment creation and returns a poller. Calling result() on the poller waits until the deployment creation finishes.
```
ml_client.begin_create_or_update(deployment_keras).result()
```
In the wizard, select Create to start the deployment process.

Test a non-default batch deployment

To test the new non-default deployment, you need to know the name of the deployment you want to run.

DEPLOYMENT_NAME="mnist-keras-dpl"
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --deployment-name $DEPLOYMENT_NAME --input $SAMPLE_INPUT_URI --input-type uri_folder --query name -o tsv)

Notice --deployment-name is used to specify the deployment to execute. This parameter allows you to invoke a non-default deployment without updating the default deployment of the batch endpoint.

job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=deployment_keras.name,
    input=Input(
        path=Path(temp_data_path) / "data",
        type=AssetTypes.URI_FOLDER,
    ),
)

Notice deployment_name is used to specify the deployment to execute. This parameter allows you to invoke a non-default deployment without updating the default deployment of the batch endpoint.

Update the default batch deployment

Although you can invoke a specific deployment inside an endpoint, you typically invoke the endpoint itself and let it route the job to its default deployment. You can change the default deployment (and consequently the model that serves the endpoint) without changing your contract with the user who invokes the endpoint. Use the following code to update the default deployment:

az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME

endpoint = ml_client.batch_endpoints.get(endpoint_name)
endpoint.defaults.deployment_name = deployment_keras.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

Delete the batch endpoint and deployment

If you don't need the old batch deployment, delete it by running the following code. The --yes flag confirms the deletion.

az ml batch-deployment delete --name mnist-torch-dpl --endpoint-name $ENDPOINT_NAME --yes

Run the following code to delete the batch endpoint and its underlying deployments. Batch scoring jobs aren't deleted.

az ml batch-endpoint delete --name $ENDPOINT_NAME --yes

If you won't be using the old batch deployment, delete it by running the following code.

ml_client.batch_deployments.begin_delete(
    endpoint_name=endpoint_name, name=deployment.name
).result()

Run the following code to delete the batch endpoint and all its underlying deployments. Batch scoring jobs won't be deleted.

ml_client.batch_endpoints.begin_delete(name=endpoint_name)

Last updated on 2025-08-04
