Safe rollout for online endpoints
APPLIES TO: Azure CLI ml extension v2 (current) | Python SDK azure-ai-ml v2 (current)
In this article, you'll learn how to deploy a new version of a machine learning model in production without causing any disruption. You'll use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. This strategy will allow you to roll out your new version of the web service to a small subset of users or requests before rolling it out completely.
This article assumes you're using online endpoints, that is, endpoints that are used for online (real-time) inferencing. There are two types of online endpoints: managed online endpoints and Kubernetes online endpoints. For more information on endpoints and the differences between managed online endpoints and Kubernetes online endpoints, see What are Azure Machine Learning endpoints?.
Note
The main example in this article uses managed online endpoints for deployment. To use Kubernetes online endpoints instead, see the notes inline throughout the managed online endpoints discussion.
In this article, you'll learn to:
- Define an online endpoint and a deployment called "blue" to serve version 1 of a model
- Scale the blue deployment so that it can handle more requests
- Deploy version 2 of the model (called the "green" deployment) to the endpoint, but send the deployment no live traffic
- Test the green deployment in isolation
- Mirror a percentage of live traffic to the green deployment to validate it (preview)
- Send a small percentage of live traffic to the green deployment
- Send over all live traffic to the green deployment
- Delete the now-unused v1 blue deployment
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2).
Important
The CLI examples in this article assume that you're using the Bash (or compatible) shell; for example, on a Linux system or in Windows Subsystem for Linux.
An Azure Machine Learning workspace. If you don't have one, use the steps in the Install, set up, and use the CLI (v2) to create one.
Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values for your subscription, workspace, and resource group multiple times, run this code:
az account set --subscription <subscription id>
az configure --defaults workspace=<azureml workspace name> group=<resource group>
(Optional) To deploy locally, you must install Docker Engine on your local computer. We highly recommend this option because it makes issues easier to debug.
Prepare your system
Clone the examples repository
To follow along with this article, first clone the examples repository (azureml-examples). Then, go to the repository's cli/ directory:
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples
cd cli
Tip
Use --depth 1 to clone only the latest commit to the repository, which reduces the time to complete the operation.
The commands in this tutorial are in the file deploy-safe-rollout-online-endpoints.sh in the cli directory, and the YAML configuration files are in the endpoints/online/managed/sample/ subdirectory.
Note
The YAML configuration files for Kubernetes online endpoints are in the endpoints/online/kubernetes/ subdirectory.
Define the endpoint and deployment
Online endpoints are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.
Create online endpoint
To create an online endpoint:
Set your endpoint name. For Unix, run this command (replace YOUR_ENDPOINT_NAME with a unique name):
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
Important
Endpoint names must be unique within an Azure region. For example, in the Azure westus2 region, there can be only one endpoint with the name my-endpoint.
To create the endpoint in the cloud, run the following code:
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
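For reference, the endpoint.yml file defines the endpoint's name and authentication mode. A minimal sketch of its shape follows; the actual file in the samples repository may differ slightly:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
```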
Create the 'blue' deployment
A deployment is a set of resources required for hosting the model that does the actual inferencing. To create a deployment named blue for your endpoint, run the following command:
az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
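For reference, the blue-deployment.yml file specifies the model, scoring code, environment, and compute for the deployment. A sketch of its general shape follows; the exact paths, image, and instance type in the samples repository may differ:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS3_v2
instance_count: 1
```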
Confirm your existing deployment
You can view the status of your existing endpoint and deployment by running:
az ml online-endpoint show --name $ENDPOINT_NAME
az ml online-deployment show --name blue --endpoint-name $ENDPOINT_NAME
You should see the endpoint identified by $ENDPOINT_NAME and a deployment called blue.
Test the endpoint with sample data
The endpoint can be invoked by using the invoke command. We'll send a sample request using a JSON file:
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
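The sample request file is a small JSON payload matching the model's expected input. For a model that takes ten numeric features, it might look like the following; the actual file in the repository may differ:

```json
{"data": [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]}
```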
Scale your existing deployment to handle more traffic
In the deployment described in Deploy and score a machine learning model with an online endpoint, you set instance_count to 1 in the deployment YAML file. You can scale out by using the update command:
az ml online-deployment update --name blue --endpoint-name $ENDPOINT_NAME --set instance_count=2
Note
This command uses --set to override the deployment configuration. Alternatively, you can update the YAML file and pass it to the update command by using the --file parameter.
Deploy a new model, but send it no traffic yet
Create a new deployment named green:
az ml online-deployment create --name green --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/green-deployment.yml
Because we haven't explicitly allocated any traffic to green, it has zero traffic allocated to it. You can verify that by using the following command:
az ml online-endpoint show -n $ENDPOINT_NAME --query traffic
Test the new deployment
Though green has 0% of traffic allocated, you can invoke it directly by specifying the --deployment-name parameter:
az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name green --request-file endpoints/online/model-2/sample-request.json
If you want to use a REST client to invoke the deployment directly without going through traffic rules, set the HTTP header azureml-model-deployment: <deployment-name>. The following snippet uses curl to invoke the deployment directly and should work in Unix/WSL environments:
# get the scoring uri
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
# get the endpoint's primary access key
ENDPOINT_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
# use curl to invoke the endpoint
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --header "azureml-model-deployment: green" --data @endpoints/online/model-2/sample-request.json
Test the deployment with mirrored traffic (preview)
Important
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Once you've tested your green deployment, you can mirror (or copy) a percentage of the live traffic to it. Traffic mirroring (also called shadowing) doesn't change the results returned to clients; requests still flow 100% to the blue deployment. The mirrored percentage of the traffic is copied and submitted to the green deployment so that you can gather metrics and logging without impacting your clients. Mirroring is useful when you want to validate a new deployment without impacting clients; for example, to check that latency is within acceptable bounds and that there are no HTTP errors. Testing the new deployment with traffic mirroring is also known as shadow testing, and the deployment receiving the mirrored traffic (in this case, the green deployment) can also be called the shadow deployment.
Warning
Mirroring traffic uses your endpoint bandwidth quota (default 5 Mbps). Your endpoint bandwidth will be throttled if you exceed the allocated quota. For information on monitoring bandwidth throttling, see Monitor managed online endpoints.
Important
Mirrored traffic is supported for the CLI (v2) (version 2.4.0 or above) and Python SDK (v2) (version 1.0.0 or above). If you update the endpoint using an older version of CLI/SDK or Studio UI, the setting for mirrored traffic will be removed.
The following command mirrors 10% of the traffic to the green deployment:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"
You can test mirror traffic by invoking the endpoint several times:
for i in {1..20} ; do
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
done
Mirroring has the following limitations:
- You can only mirror traffic to one deployment.
- Mirror traffic isn't currently supported for Kubernetes online endpoints.
- The maximum mirrored traffic you can configure is 50%. This limit is to reduce the impact on your endpoint bandwidth quota.
Also note the following behavior:
- A deployment can be set to receive either live traffic or mirrored traffic, not both.
- You can send traffic directly to the mirror deployment by specifying the deployment set for mirror traffic.
- You can send traffic directly to a live deployment by specifying the deployment set for live traffic, but in this case the traffic won't be mirrored to the mirror deployment. Mirrored traffic is routed only from traffic sent to the endpoint without a deployment specified.
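To make these routing rules concrete, here's an illustrative Python sketch of the behavior. This is a simplified model for building intuition, not Azure's actual routing implementation:

```python
import random

def route_request(live_traffic, mirror_traffic, rng):
    """Simulate endpoint routing for one request sent without a
    deployment header: pick a live deployment by its traffic weight,
    then independently decide whether to copy the request to the
    mirror deployment."""
    # choose a live deployment proportionally to its traffic percentage
    r = rng.uniform(0, 100)
    cumulative = 0.0
    live = None
    for name, pct in live_traffic.items():
        cumulative += pct
        if r <= cumulative:
            live = name
            break
    # mirroring copies the request; it never changes the live response
    mirrored = None
    for name, pct in mirror_traffic.items():
        if rng.uniform(0, 100) < pct:
            mirrored = name
    return live, mirrored

rng = random.Random(0)
results = [route_request({"blue": 100}, {"green": 10}, rng) for _ in range(10000)]
# every request is answered by blue; roughly 10% are also copied to green
all_blue = all(live == "blue" for live, _ in results)
mirror_rate = sum(1 for _, m in results if m == "green") / len(results)
```

With blue=100 and green=10 mirrored, every client response still comes from blue, while about one request in ten is duplicated to green for observation only.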
Tip
You can use the --deployment-name option for CLI v2, or the deployment_name option for SDK v2, to specify the deployment a request is routed to.
You can confirm that the specified percentage of the traffic was sent to the green deployment by checking the logs from that deployment:
az ml online-deployment get-logs --name green --endpoint-name $ENDPOINT_NAME
After testing, you can set the mirror traffic to zero to disable mirroring:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=0"
Test the new deployment with a small percentage of live traffic
Once you've tested your green deployment, allocate a small percentage of live traffic to it:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=90 green=10"
Now, your green deployment receives 10% of requests.
Send all traffic to your new deployment
Once you're fully satisfied with your green deployment, switch all traffic to it:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=0 green=100"
Remove the old deployment
az ml online-deployment delete --name blue --endpoint-name $ENDPOINT_NAME --yes --no-wait
Delete the endpoint and deployment
If you aren't going to use the endpoint and deployment, you should delete them:
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
Next steps
- Explore online endpoint samples
- Deploy models with REST
- Create and use online endpoints in the studio
- Access Azure resources with an online endpoint and managed identity
- Monitor managed online endpoints
- Manage and increase quotas for resources with Azure Machine Learning
- View costs for an Azure Machine Learning managed online endpoint
- Managed online endpoints SKU list
- Troubleshooting online endpoints deployment and scoring
- Online endpoint YAML reference