Safe rollout for online endpoints
APPLIES TO: Azure CLI ml extension v2 (current) | Python SDK azure-ai-ml v2 (current)
In this article, you'll learn how to deploy a new version of a machine learning model in production without causing any disruption. You'll use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. This strategy will allow you to roll out your new version of the web service to a small subset of users or requests before rolling it out completely.
This article assumes you're using online endpoints, that is, endpoints that are used for online (real-time) inferencing. There are two types of online endpoints: managed online endpoints and Kubernetes online endpoints. For more information on endpoints and the differences between managed online endpoints and Kubernetes online endpoints, see What are Azure Machine Learning endpoints?.
Note
The main example in this article uses managed online endpoints for deployment. To use Kubernetes online endpoints instead, see the notes inline throughout the managed online endpoints discussion.
In this article, you'll learn to:
- Define an online endpoint and a deployment called "blue" to serve version 1 of a model
- Scale the blue deployment so that it can handle more requests
- Deploy version 2 of the model (called the "green" deployment) to the endpoint, but send the deployment no live traffic
- Test the green deployment in isolation
- Mirror a percentage of live traffic to the green deployment to validate it (preview)
- Send a small percentage of live traffic to the green deployment
- Send over all live traffic to the green deployment
- Delete the now-unused v1 blue deployment
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2).
Important
The CLI examples in this article assume that you're using the Bash (or compatible) shell; for example, on a Linux system or in Windows Subsystem for Linux.
An Azure Machine Learning workspace. If you don't have one, use the steps in the Install, set up, and use the CLI (v2) to create one.
Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values for your subscription, workspace, and resource group multiple times, run this code:
az account set --subscription <subscription id>
az configure --defaults workspace=<azureml workspace name> group=<resource group>
(Optional) To deploy locally, you must install Docker Engine on your local computer. We highly recommend this option because it makes issues easier to debug.
Prepare your system
Clone the examples repository
To follow along with this article, first clone the examples repository (azureml-examples). Then, go to the repository's cli/ directory:
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples
cd cli
Tip
Use --depth 1 to clone only the latest commit to the repository, which reduces the time to complete the operation.
The commands in this tutorial are in the file deploy-safe-rollout-online-endpoints.sh in the cli directory, and the YAML configuration files are in the endpoints/online/managed/sample/ subdirectory.
Note
The YAML configuration files for Kubernetes online endpoints are in the endpoints/online/kubernetes/ subdirectory.
Define the endpoint and deployment
Online endpoints are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.
Create online endpoint
To create an online endpoint:
Set your endpoint name. For Unix, run this command (replace YOUR_ENDPOINT_NAME with a unique name):
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
Important
Endpoint names must be unique within an Azure region. For example, in the Azure westus2 region, there can be only one endpoint with the name my-endpoint.
To create the endpoint in the cloud, run the following code:
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
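For reference, the endpoint.yml file defines the endpoint's name and authentication mode. A minimal sketch of its shape follows; the actual file in the samples repository may differ slightly:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
```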
Create the 'blue' deployment
A deployment is a set of resources required for hosting the model that does the actual inferencing. To create a deployment named blue for your endpoint, run the following command:
az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
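For reference, the blue-deployment.yml file specifies the model, scoring code, environment, and compute for the deployment. A sketch of its general shape follows; the exact paths, image, and instance type in the samples repository may differ:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS3_v2
instance_count: 1
```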
Confirm your existing deployment
You can view the status of your existing endpoint and deployment by running:
az ml online-endpoint show --name $ENDPOINT_NAME
az ml online-deployment show --name blue --endpoint-name $ENDPOINT_NAME
You should see the endpoint identified by $ENDPOINT_NAME and a deployment called blue.
Test the endpoint with sample data
The endpoint can be invoked by using the invoke command. We'll send a sample request using a JSON file:
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
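The sample request file is a small JSON payload matching the model's expected input. For a model that takes ten numeric features, it might look like the following; the actual file in the repository may differ:

```json
{"data": [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]}
```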
Scale your existing deployment to handle more traffic
In the deployment described in Deploy and score a machine learning model with an online endpoint, you set instance_count to 1 in the deployment YAML file. You can scale out by using the update command:
az ml online-deployment update --name blue --endpoint-name $ENDPOINT_NAME --set instance_count=2
Note
This command uses --set to override the deployment configuration. Alternatively, you can update the YAML file and pass it to the update command by using the --file parameter.
Deploy a new model, but send it no traffic yet
Create a new deployment named green:
az ml online-deployment create --name green --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/green-deployment.yml
Because we haven't explicitly allocated any traffic to green, it has zero traffic allocated to it. You can verify that by using the following command:
az ml online-endpoint show -n $ENDPOINT_NAME --query traffic
Test the new deployment
Though green has 0% of traffic allocated, you can invoke it directly by specifying the --deployment-name parameter:
az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name green --request-file endpoints/online/model-2/sample-request.json
If you want to use a REST client to invoke the deployment directly without going through traffic rules, set the HTTP header azureml-model-deployment: <deployment-name>. The following snippet uses curl to invoke the deployment directly and should work in Unix/WSL environments:
# get the scoring uri
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
# get the endpoint's primary access key
ENDPOINT_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
# use curl to invoke the endpoint
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --header "azureml-model-deployment: green" --data @endpoints/online/model-2/sample-request.json
Test the deployment with mirrored traffic (preview)
Important
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Once you've tested your green deployment, you can mirror (or copy) a percentage of the live traffic to it. Traffic mirroring (also called shadowing) doesn't change the results returned to clients; requests still flow 100% to the blue deployment. The mirrored percentage of the traffic is copied and submitted to the green deployment so that you can gather metrics and logging without impacting your clients. Mirroring is useful when you want to validate a new deployment without impacting clients; for example, to check that latency is within acceptable bounds and that there are no HTTP errors. Testing the new deployment with traffic mirroring is also known as shadow testing, and the deployment receiving the mirrored traffic (in this case, the green deployment) can also be called the shadow deployment.
Warning
Mirroring traffic uses your endpoint bandwidth quota (default 5 Mbps). Your endpoint bandwidth will be throttled if you exceed the allocated quota. For information on monitoring bandwidth throttling, see Monitor managed online endpoints.
Important
Mirrored traffic is supported for the CLI (v2) (version 2.4.0 or above) and Python SDK (v2) (version 1.0.0 or above). If you update the endpoint using an older version of CLI/SDK or Studio UI, the setting for mirrored traffic will be removed.
The following command mirrors 10% of the traffic to the green deployment:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"
You can test mirror traffic by invoking the endpoint several times:
for i in {1..20} ; do
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
done
Mirroring has the following limitations:
- You can only mirror traffic to one deployment.
- Mirror traffic isn't currently supported for Kubernetes online endpoints.
- The maximum mirrored traffic you can configure is 50%. This limit is to reduce the impact on your endpoint bandwidth quota.
Also note the following behavior:
- A deployment can be set to receive either live traffic or mirrored traffic, not both.
- You can send traffic directly to the mirror deployment by specifying the deployment set for mirror traffic.
- You can send traffic directly to a live deployment by specifying the deployment set for live traffic, but in this case the traffic won't be mirrored to the mirror deployment. Mirrored traffic is routed only from traffic sent to the endpoint without a deployment specified.
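To make these routing rules concrete, here's an illustrative Python sketch of the behavior. This is a simplified model for building intuition, not Azure's actual routing implementation:

```python
import random

def route_request(live_traffic, mirror_traffic, rng):
    """Simulate endpoint routing for one request sent without a
    deployment header: pick a live deployment by its traffic weight,
    then independently decide whether to copy the request to the
    mirror deployment."""
    # choose a live deployment proportionally to its traffic percentage
    r = rng.uniform(0, 100)
    cumulative = 0.0
    live = None
    for name, pct in live_traffic.items():
        cumulative += pct
        if r <= cumulative:
            live = name
            break
    # mirroring copies the request; it never changes the live response
    mirrored = None
    for name, pct in mirror_traffic.items():
        if rng.uniform(0, 100) < pct:
            mirrored = name
    return live, mirrored

rng = random.Random(0)
results = [route_request({"blue": 100}, {"green": 10}, rng) for _ in range(10000)]
# every request is answered by blue; roughly 10% are also copied to green
all_blue = all(live == "blue" for live, _ in results)
mirror_rate = sum(1 for _, m in results if m == "green") / len(results)
```

With blue=100 and green=10 mirrored, every client response still comes from blue, while about one request in ten is duplicated to green for observation only.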
Tip
You can use the --deployment-name option for CLI v2, or the deployment_name option for SDK v2, to specify the deployment a request is routed to.
You can confirm that the specified percentage of the traffic was sent to the green deployment by checking the logs from that deployment:
az ml online-deployment get-logs --name green --endpoint-name $ENDPOINT_NAME
After testing, you can set the mirror traffic to zero to disable mirroring:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=0"
Test the new deployment with a small percentage of live traffic
Once you've tested your green deployment, allocate a small percentage of live traffic to it:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=90 green=10"
Now, your green deployment receives 10% of requests.
Send all traffic to your new deployment
Once you're fully satisfied with your green deployment, switch all traffic to it:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=0 green=100"
Remove the old deployment
az ml online-deployment delete --name blue --endpoint-name $ENDPOINT_NAME --yes --no-wait
Delete the endpoint and deployment
If you aren't going to use the endpoint and deployment, you should delete them:
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
Next steps
- Explore online endpoint samples
- Deploy models with REST
- Create and use online endpoints in the studio
- Access Azure resources with an online endpoint and managed identity
- Monitor managed online endpoints
- Manage and increase quotas for resources with Azure Machine Learning
- View costs for an Azure Machine Learning managed online endpoint
- Managed online endpoints SKU list
- Troubleshooting online endpoints deployment and scoring
- Online endpoint YAML reference