Use network isolation with managed online endpoints
APPLIES TO:
Azure CLI ml extension v2 (current)
Python SDK azure-ai-ml v2 (current)
When deploying a machine learning model to a managed online endpoint, you can secure communication with the online endpoint by using private endpoints.
You can secure the inbound scoring requests from clients to an online endpoint. You can also secure the outbound communications between a deployment and the Azure resources it uses. Security for inbound and outbound communication is configured separately. For more information on endpoints and deployments, see What are endpoints and deployments.
The following diagram shows how communications flow through private endpoints to the managed online endpoint. Incoming scoring requests from clients are received through the workspace private endpoint from your virtual network. Outbound communication with services is handled through private endpoints to those service instances from the deployment:
Prerequisites
To use Azure Machine Learning, you must have an Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
You must install and configure the Azure CLI and ml extension or the AzureML Python SDK v2. For more information, see the setup articles for the CLI ml extension and the Python SDK v2.
You must have an Azure resource group in which you (or the service principal you use) have Contributor access. You'll have such a resource group if you configured your ml extension as described in the setup article.
You must have an Azure Machine Learning workspace, and the workspace must use a private endpoint. If you don't have one, the steps in this article create an example workspace, VNet, and VM. For more information, see Configure a private endpoint for Azure Machine Learning workspace.
The workspace configuration can either allow or disallow public network access. If you plan on using managed online endpoint deployments that use public outbound, then you must also configure the workspace to allow public access.
Outbound communication from managed online endpoint deployment is to the workspace API. When the endpoint is configured to use public outbound, then the workspace must be able to accept that public communication (allow public access).
When the workspace is configured with a private endpoint, the Azure Container Registry for the workspace must be configured for Premium tier. For more information, see Azure Container Registry service tiers.
The Azure Container Registry and Azure Storage Account must be in the same Azure Resource Group as the workspace.
If you want to use a user-assigned managed identity to create and manage online endpoints and online deployments, the identity should have the proper permissions. For details about the required permissions, see Set up service authentication. For example, you need to assign the proper RBAC permission for Azure Key Vault on the identity.
Important
The end-to-end example in this article comes from the files in the azureml-examples GitHub repository. To clone the samples repository and switch to the repository's cli/ directory, use the following commands:
git clone https://github.com/Azure/azureml-examples
cd azureml-examples/cli
Limitations
The v1_legacy_mode flag must be disabled (false) on your Azure Machine Learning workspace. If this flag is enabled, you won't be able to create a managed online endpoint. For more information, see Network isolation with v2 API.
If your Azure Machine Learning workspace has a private endpoint that was created before May 24, 2022, you must recreate the workspace's private endpoint before configuring your online endpoints to use a private endpoint. For more information on creating a private endpoint for your workspace, see How to configure a private endpoint for Azure Machine Learning workspace.
Secure outbound communication creates three private endpoints per deployment: one to Azure Blob storage, one to Azure Container Registry, and one to your workspace.
When you use network isolation with a deployment, Azure Log Analytics is partially supported. All metrics and the AMLOnlineEndpointTrafficLog table are supported via Azure Log Analytics. The AMLOnlineEndpointConsoleLog and AMLOnlineEndpointEventLog tables are currently not supported. As a workaround, you can use the az ml online-deployment get-logs CLI command, the OnlineDeploymentOperations.get_logs() Python SDK method, or the Deployment log tab in Azure Machine Learning studio instead. For more information, see Monitoring online endpoints.
You can configure public access to a managed online endpoint (inbound and outbound). You can also configure public access to an Azure Machine Learning workspace.
Outbound communication from a managed online endpoint deployment is to the workspace API. When the endpoint is configured to use public outbound, then the workspace must be able to accept that public communication (allow public access).
Note
Requests to create, update, or retrieve the authentication keys are sent to the Azure Resource Manager over the public network.
Inbound (scoring)
To secure scoring requests to the online endpoint by limiting them to your virtual network, set the endpoint's public_network_access flag to disabled:
az ml online-endpoint create -f endpoint.yml --set public_network_access=disabled
When public_network_access is disabled, inbound scoring requests are received through the private endpoint of the Azure Machine Learning workspace, and the endpoint can't be reached from public networks.
Note
You can update (enable or disable) the public_network_access flag of an online endpoint after creating it.
Outbound (resource access)
To restrict communication between a deployment and external resources, including the Azure resources it uses, set the deployment's egress_public_network_access flag to disabled. Use this flag to make sure the model, code, and images your deployment needs are downloaded over a private endpoint. Disabling the flag alone isn't enough: your workspace must also have a private link that allows access to Azure resources through a private endpoint. See the Prerequisites for more details.
Warning
You can't update (enable or disable) the egress_public_network_access flag after creating the deployment. Attempting to change the flag while updating the deployment fails with an error.
Note
For online deployments with the egress_public_network_access flag set to disabled, access from the deployments to Microsoft Container Registry (MCR) is restricted. If you want to use container images from MCR (for example, with a curated environment or an MLflow no-code deployment), we recommend pushing the images to the Azure Container Registry (ACR) attached to the workspace. Images in this ACR are accessible to secured deployments through the private endpoints that are created automatically on your behalf when you set the egress_public_network_access flag to disabled. For a quick example, see this custom container example.
az ml online-deployment create -f deployment.yml --set egress_public_network_access=disabled
The deployment communicates with the following resources over private endpoints:
- The Azure Machine Learning workspace
- The Azure Storage blob that is the default storage for the workspace
- The Azure Container Registry for the workspace
When you set the egress_public_network_access flag to disabled, a new private endpoint is created per deployment, per service. For example, if you set the flag to disabled for three deployments to an online endpoint, nine private endpoints are created. Each deployment has three private endpoints to communicate with the workspace, blob storage, and container registry.
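To capacity-plan private endpoints, the per-deployment arithmetic can be sketched in Python. This is a hypothetical helper, not part of the Azure SDK. The function name and its input (a mapping from deployment name to its egress_public_network_access value) are assumptions made for illustration.

```python
# Hypothetical helper, not an Azure SDK API: counts the private endpoints that
# secure outbound creates for the deployments under one online endpoint.

# Per this article, each secured deployment gets one private endpoint per service.
SECURED_SERVICES = ("workspace", "blob storage", "container registry")


def count_outbound_private_endpoints(egress_by_deployment):
    """egress_by_deployment maps deployment name -> "enabled" or "disabled"."""
    secured = [
        name
        for name, egress in egress_by_deployment.items()
        if egress.lower() == "disabled"
    ]
    # Only deployments with egress disabled get private endpoints.
    return len(secured) * len(SECURED_SERVICES)


# Three deployments with egress disabled: 3 x 3 = 9 private endpoints.
print(count_outbound_private_endpoints({"blue": "disabled", "green": "disabled", "red": "disabled"}))  # 9
```

Deployments left at "enabled" don't add private endpoints, so a mixed endpoint only pays for its secured deployments.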
Scenarios
The following table lists the supported configurations when configuring inbound and outbound communications for an online endpoint:
Configuration | Inbound (Endpoint property) | Outbound (Deployment property) | Supported? |
---|---|---|---|
Secure inbound with secure outbound | public_network_access is disabled | egress_public_network_access is disabled | Yes |
Secure inbound with public outbound | public_network_access is disabled. The workspace must also allow public access. | egress_public_network_access is enabled | Yes |
Public inbound with secure outbound | public_network_access is enabled | egress_public_network_access is disabled | Yes |
Public inbound with public outbound | public_network_access is enabled. The workspace must also allow public access. | egress_public_network_access is enabled | Yes |
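You can turn the table into a pre-flight check before running the CLI. The following is a minimal sketch that assumes you already know the three settings. The function name and return shape are hypothetical and not part of any Azure SDK. The only cross-setting rule it encodes comes from this article: public outbound requires the workspace to allow public access.

```python
# Hypothetical pre-flight check (not an Azure SDK API) that encodes the
# supported-configuration table for managed online endpoints.

VALID_VALUES = ("enabled", "disabled")


def check_network_config(endpoint_public_access, deployment_egress_access, workspace_allows_public):
    """Return a list of problems; an empty list means the combination is supported."""
    problems = []
    for flag, value in (
        ("public_network_access", endpoint_public_access),
        ("egress_public_network_access", deployment_egress_access),
    ):
        if value.lower() not in VALID_VALUES:
            problems.append(f"{flag} must be 'enabled' or 'disabled', got {value!r}")
    # Public outbound reaches the workspace API over the public network,
    # so the workspace itself has to allow public access.
    if deployment_egress_access.lower() == "enabled" and not workspace_allows_public:
        problems.append("egress_public_network_access is enabled, so the workspace must allow public access")
    return problems


print(check_network_config("disabled", "disabled", workspace_allows_public=False))  # []
print(check_network_config("disabled", "enabled", workspace_allows_public=False))
```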
End-to-end example
Use the information in this section to create an example configuration that uses private endpoints to secure online endpoints.
Tip
In this example, an Azure Virtual Machine is created inside the VNet. You connect to the VM using SSH, and run the deployment from the VM. This configuration is used to simplify the steps in this example, and doesn't represent a typical secure configuration. For example, in a production environment you would most likely use a VPN client or Azure ExpressRoute to directly connect clients to the virtual network.
Create workspace and secured resources
The steps in this section use an Azure Resource Manager template to create the following Azure resources:
- Azure Virtual Network
- Azure Machine Learning workspace
- Azure Container Registry
- Azure Key Vault
- Azure Storage account (blob & file storage)
Public access is disabled for all the services except the Azure Machine Learning workspace: although the workspace is secured behind a VNet, it's configured to allow public network access. For more information, see CLI 2.0 secure communications. A scoring subnet is created, along with outbound rules that allow communication with the following Azure services:
- Azure Active Directory
- Azure Resource Manager
- Azure Front Door
- Microsoft Container Registries
The following diagram shows the different components created in this architecture:
The following diagram shows the overall architecture of this example:
To create a resource group, use the following command. Replace <my-resource-group> and <my-location> with the desired values:
# create resource group
az group create --name <my-resource-group> --location <my-location>
To clone the example files for the deployment and switch to the repository's cli directory, use the following commands:
# Clone the example files
git clone https://github.com/Azure/azureml-examples
cd azureml-examples/cli
To create the resources, use the following Azure CLI commands. Replace <UNIQUE_SUFFIX> with a unique suffix for the resources that are created. The suffix is stored in the SUFFIX variable, which later commands also use:
export SUFFIX="<UNIQUE_SUFFIX>"
az deployment group create --template-file endpoints/online/managed/vnet/setup_ws/main.bicep --parameters suffix=$SUFFIX --resource-group <my-resource-group>
Create the virtual machine jump box
To create an Azure Virtual Machine that can be used to connect to the VNet, use the following command. Replace <your-new-password> with the password you want to use when connecting to this VM:
# create vm
az vm create --name test-vm --vnet-name vnet-$SUFFIX --subnet snet-scoring --image UbuntuLTS --admin-username azureuser --admin-password <your-new-password> --resource-group <my-resource-group>
Important
The VM created by these commands has a public endpoint that you can connect to over the public network.
The response from this command is similar to the following JSON document:
{
"fqdns": "",
"id": "/subscriptions/<GUID>/resourceGroups/<my-resource-group>/providers/Microsoft.Compute/virtualMachines/test-vm",
"location": "westus",
"macAddress": "00-0D-3A-ED-D8-E8",
"powerState": "VM running",
"privateIpAddress": "192.168.0.12",
"publicIpAddress": "20.114.122.77",
"resourceGroup": "<my-resource-group>",
"zones": ""
}
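To script the connection step, you can pull the public IP address out of that response. The following is a small sketch that uses only the Python standard library. The helper name is hypothetical, and the sample JSON is trimmed to fields shown in the response above. The username defaults to azureuser, matching the --admin-username used in this example.

```python
import json

# Trimmed copy of the az vm create response shown above.
VM_CREATE_OUTPUT = """{
  "privateIpAddress": "192.168.0.12",
  "publicIpAddress": "20.114.122.77",
  "powerState": "VM running"
}"""


def ssh_target(az_vm_create_json, admin_username="azureuser"):
    """Build the user@host argument for ssh from the az vm create response."""
    vm = json.loads(az_vm_create_json)
    public_ip = vm.get("publicIpAddress")
    if not public_ip:
        raise ValueError("response has no publicIpAddress; the VM may not have a public IP")
    return f"{admin_username}@{public_ip}"


print(ssh_target(VM_CREATE_OUTPUT))  # azureuser@20.114.122.77
```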
Use the following command to connect to the VM using SSH. Replace publicIpAddress with the value of the public IP address in the response from the previous command:
ssh azureuser@publicIpAddress
When prompted, enter the password you used when creating the VM.
Configure the VM
Use the following commands from the SSH session to install the CLI and Docker:
# setup docker
sudo apt-get update -y && sudo apt install docker.io -y && sudo snap install docker && docker --version && sudo usermod -aG docker $USER
# setup az cli and ml extension
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash && az extension add --upgrade -n ml -y
To create the environment variables used by this example, run the following commands:
- Replace <YOUR_SUBSCRIPTION_ID> with your Azure subscription ID.
- Replace <YOUR_RESOURCE_GROUP> with the resource group that contains your workspace.
- Replace <SUFFIX_USED_IN_SETUP> with the suffix you provided earlier.
- Replace <LOCATION> with the location of your Azure workspace.
- Replace <YOUR_ENDPOINT_NAME> with the name to use for the endpoint.
export SUBSCRIPTION="<YOUR_SUBSCRIPTION_ID>"
export RESOURCE_GROUP="<YOUR_RESOURCE_GROUP>"
export LOCATION="<LOCATION>"
# SUFFIX that was used when creating the workspace resources. Alternatively the resource names can be looked up from the resource group after the vnet setup script has completed.
export SUFFIX="<SUFFIX_USED_IN_SETUP>"
export WORKSPACE=mlw-$SUFFIX
export ACR_NAME=cr$SUFFIX
# provide a unique name for the endpoint
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
# name of the image that will be built for this sample and pushed into acr - no need to change this
export IMAGE_NAME="img"
# Yaml files that will be used to create endpoint and deployment. These are relative to azureml-examples/cli/ directory. Do not change these
export ENDPOINT_FILE_PATH="endpoints/online/managed/vnet/sample/endpoint.yml"
export DEPLOYMENT_FILE_PATH="endpoints/online/managed/vnet/sample/blue-deployment-vnet.yml"
export SAMPLE_REQUEST_PATH="endpoints/online/managed/vnet/sample/sample-request.json"
export ENV_DIR_PATH="endpoints/online/managed/vnet/sample/environment"
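The workspace, registry, VNet, and image names in these exports are all derived from SUFFIX. The following sketch reproduces that derivation so you can check a suffix before you run anything. The helper is hypothetical and not part of the sample repository. The validity check relies on the Azure Container Registry naming rule: 5 to 50 alphanumeric characters. Because ACR_NAME is cr$SUFFIX, a suffix containing hyphens produces an invalid registry name.

```python
import re

# Azure Container Registry names must be 5-50 alphanumeric characters.
ACR_NAME_PATTERN = re.compile(r"^[A-Za-z0-9]{5,50}$")


def derived_names(suffix, image_name="img", tag="v1"):
    """Return the names that the exports derive from SUFFIX (hypothetical helper)."""
    acr_name = f"cr{suffix}"  # ACR_NAME=cr$SUFFIX
    if not ACR_NAME_PATTERN.match(acr_name):
        raise ValueError(f"{acr_name!r} isn't a valid registry name; use an alphanumeric SUFFIX")
    return {
        "WORKSPACE": f"mlw-{suffix}",  # WORKSPACE=mlw-$SUFFIX
        "ACR_NAME": acr_name,
        "VNET": f"vnet-{suffix}",  # --vnet-name vnet-$SUFFIX in the az vm create step
        # Image reference used by docker build/push and environment.image later on.
        "IMAGE": f"{acr_name}.azurecr.io/repo/{image_name}:{tag}",
    }


print(derived_names("abc123")["IMAGE"])  # crabc123.azurecr.io/repo/img:v1
```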
To sign in to the Azure CLI in the VM environment, use the following command:
az login
To configure the defaults for the CLI, use the following commands:
# configure cli defaults
az account set --subscription $SUBSCRIPTION
az configure --defaults group=$RESOURCE_GROUP workspace=$WORKSPACE location=$LOCATION
To clone the example files for the deployment, use the following command:
sudo mkdir -p /home/samples; sudo git clone -b main --depth 1 https://github.com/Azure/azureml-examples.git /home/samples/azureml-examples
To build a custom docker image to use with the deployment, use the following commands:
# Navigate to the samples
cd /home/samples/azureml-examples/cli/$ENV_DIR_PATH
# login to acr. Optionally, to avoid using sudo, complete the docker post install steps: https://docs.docker.com/engine/install/linux-postinstall/
sudo az acr login -n "$ACR_NAME"
# Build the docker image with the sample docker file
sudo docker build -t "$ACR_NAME.azurecr.io/repo/$IMAGE_NAME":v1 .
# push the image to the ACR
sudo docker push "$ACR_NAME.azurecr.io/repo/$IMAGE_NAME":v1
# check if the image exists in acr
az acr repository show -n "$ACR_NAME" --repository "repo/$IMAGE_NAME"
Tip
In this example, we build the Docker image before pushing it to Azure Container Registry. Alternatively, you can build the image in your vnet by using an Azure Machine Learning compute cluster and environments. For more information, see Secure Azure Machine Learning workspace.
Create a secured managed online endpoint
To create a managed online endpoint that is secured using a private endpoint for inbound and outbound communication, use the following commands:
Tip
You can test or debug the Docker image locally by using the --local flag when creating the deployment. For more information, see the Deploy and debug locally article.
# navigate to the cli directory in the azureml-examples repo
cd /home/samples/azureml-examples/cli/
# create endpoint
az ml online-endpoint create --name $ENDPOINT_NAME -f $ENDPOINT_FILE_PATH --set public_network_access="disabled"
# create deployment in managed vnet
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f $DEPLOYMENT_FILE_PATH --all-traffic --set environment.image="$ACR_NAME.azurecr.io/repo/$IMAGE_NAME:v1" egress_public_network_access="disabled"
To make a scoring request with the endpoint, use the following commands:
# Try scoring using the CLI
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file $SAMPLE_REQUEST_PATH
# Try scoring using curl
ENDPOINT_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --data @$SAMPLE_REQUEST_PATH
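If your client is a Python application rather than a shell, the following is a minimal standard-library sketch of the same POST that the curl command sends. It's an illustration, not the Azure ML SDK. The function names are hypothetical, and the scoring URI, key, and payload in the usage note are placeholders. The real values come from the az ml online-endpoint show and get-credentials commands, and the real request body is in sample-request.json. The request only succeeds from a client that can reach the workspace private endpoint.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, endpoint_key, payload):
    """Build (but don't send) a POST equivalent to the curl call above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {endpoint_key}",
            "Content-Type": "application/json",
        },
    )


def score(scoring_uri, endpoint_key, payload, timeout=60.0):
    """Send the scoring request and return the response body as text."""
    request = build_scoring_request(scoring_uri, endpoint_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, with hypothetical values: score("https://my-endpoint.westus.inference.ml.azure.com/score", "<ENDPOINT_KEY>", {"data": [[1, 2, 3]]}). Keeping request construction separate from sending makes the headers easy to inspect when you debug authentication failures.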
Cleanup
To delete the endpoint, use the following command:
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
To delete the VM (named test-vm in this example), use the following command:
az vm delete -n test-vm --resource-group <my-resource-group> -y --no-wait
To delete all the resources created in this article, use the following command. Replace <resource-group-name> with the name of the resource group used in this example:
az group delete --resource-group <resource-group-name>
Troubleshooting
Online endpoint creation fails with a V1LegacyMode == true message
The Azure Machine Learning workspace can be configured for v1_legacy_mode, which disables v2 APIs. Managed online endpoints are a feature of the v2 API platform, and won't work if v1_legacy_mode is enabled for the workspace.
Important
Check with your network security team before disabling v1_legacy_mode. It may have been enabled by your network security team for a reason.
For information on how to disable v1_legacy_mode, see Network isolation with v2.
Online endpoint creation with key-based authentication fails
Use the following command to list the network rules of the Azure Key Vault for your workspace. Replace <keyvault-name> with the name of your key vault:
az keyvault network-rule list -n <keyvault-name>
The response for this command is similar to the following JSON document:
{
"bypass": "AzureServices",
"defaultAction": "Deny",
"ipRules": [],
"virtualNetworkRules": []
}
If the value of bypass isn't AzureServices, use the guidance in Configure key vault network settings to set it to AzureServices.
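When you check several workspaces, you can automate the bypass check. The following sketch parses the JSON printed by az keyvault network-rule list (for example, after redirecting it to a file) and tests only the bypass field, as described above. The helper name is hypothetical, and the sample document is the one shown in this article.

```python
import json

# Sample output of `az keyvault network-rule list`, copied from this article.
SAMPLE_RULES = """{
  "bypass": "AzureServices",
  "defaultAction": "Deny",
  "ipRules": [],
  "virtualNetworkRules": []
}"""


def keyvault_bypass_ok(network_rules_json):
    """True when trusted Azure services may bypass the key vault firewall."""
    rules = json.loads(network_rules_json)
    return rules.get("bypass") == "AzureServices"


print(keyvault_bypass_ok(SAMPLE_RULES))  # True
```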
Online deployments fail with an image download error
Check whether the egress_public_network_access flag is disabled for the deployment. If this flag is enabled and the visibility of the container registry is private, this failure is expected.
Use the following command to check the status of the private endpoint connection. Replace <registry-name> with the name of the Azure Container Registry for your workspace:
az acr private-endpoint-connection list -r <registry-name> --query "[?privateLinkServiceConnectionState.description=='Egress for Microsoft.MachineLearningServices/workspaces/onlineEndpoints'].{Name:name, status:privateLinkServiceConnectionState.status}"
In the response document, verify that the status field is set to Approved. If it isn't approved, use the following command to approve it. Replace <private-endpoint-name> with the name returned from the previous command:
az network private-endpoint-connection approve -n <private-endpoint-name>
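The query above returns a list of Name/status objects, so you can find connections that still need approval programmatically. The following is a minimal sketch; the connection names in the sample are made up. An empty result suggests that no online endpoint egress connection exists on the registry. In that case, egress_public_network_access may not be disabled for the deployment.

```python
import json

# Hypothetical output of the `az acr private-endpoint-connection list ... --query` command;
# the connection names are made up for illustration.
SAMPLE_CONNECTIONS = """[
  {"Name": "example-connection-1", "status": "Approved"},
  {"Name": "example-connection-2", "status": "Pending"}
]"""


def unapproved_connections(query_output_json):
    """Return the names of egress connections whose status isn't Approved."""
    connections = json.loads(query_output_json)
    if not connections:
        # No matching connection: egress_public_network_access may not be disabled.
        raise LookupError("no online endpoint egress connection found on this registry")
    return [c["Name"] for c in connections if c.get("status") != "Approved"]


print(unapproved_connections(SAMPLE_CONNECTIONS))  # ['example-connection-2']
```

Each returned name is a candidate for the approve command.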
Scoring endpoint can't be resolved
Verify that the client issuing the scoring request is in a virtual network that can access the Azure Machine Learning workspace.
Use the nslookup command on the endpoint hostname to retrieve the IP address information:
nslookup endpointname.westcentralus.inference.ml.azure.com
The response contains an address. This address should be in the range provided by the virtual network.
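The range check is easy to get wrong by eye when a VNet has several address prefixes. The following standard-library sketch does it with the ipaddress module. The 192.168.0.0/16 prefix is an assumption based on the VM address (192.168.0.12) in this example. Use the actual prefixes of your virtual network. check_endpoint_resolution uses the system resolver, which, unlike nslookup, can also consult the hosts file.

```python
import ipaddress
import socket


def ip_in_vnet(resolved_ip, vnet_prefixes):
    """True if the resolved address falls inside any of the VNet's address prefixes."""
    address = ipaddress.ip_address(resolved_ip)
    return any(address in ipaddress.ip_network(prefix) for prefix in vnet_prefixes)


def check_endpoint_resolution(hostname, vnet_prefixes):
    """Resolve hostname with the system resolver and check it against the VNet range."""
    resolved_ip = socket.gethostbyname(hostname)
    return resolved_ip, ip_in_vnet(resolved_ip, vnet_prefixes)


# 192.168.0.0/16 is an assumed VNet prefix for this example.
print(ip_in_vnet("192.168.0.12", ["192.168.0.0/16"]))  # True
print(ip_in_vnet("20.114.122.77", ["192.168.0.0/16"]))  # False
```

A public address such as 20.114.122.77 means the client resolved the endpoint through public DNS rather than the private DNS zone.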
Note
For Kubernetes online endpoints, the endpoint hostname should be the CNAME (domain name) that has been specified in your Kubernetes cluster. If it's an HTTP endpoint, the IP address is contained in the endpoint URI, which you can get directly in the studio UI. More ways to get the IP address of the endpoint can be found in Secure Kubernetes online endpoint.
If the host name isn't resolved by the nslookup command:
For managed online endpoints:
Check if an A record exists in the private DNS zone for the virtual network.
To check the records, use the following command:
az network private-dns record-set list -z privatelink.api.azureml.ms -o tsv --query [].name
The results should contain an entry that is similar to *.<GUID>.inference.<region>.
If no inference value is returned, delete the private endpoint for the workspace and then recreate it. For more information, see How to configure a private endpoint.
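Scanning a long record list by eye is error-prone, so the following sketch filters the tsv output of the record-set command for inference entries. It assumes two things about the record name: the GUID uses the standard 8-4-4-4-12 hexadecimal format, and the region is a single lowercase alphanumeric token. The sample output is hypothetical.

```python
import re

# Assumes records look like "*.<GUID>.inference.<region>" with a standard 8-4-4-4-12 GUID.
INFERENCE_RECORD = re.compile(
    r"^\*\.[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\.inference\.[a-z0-9]+$",
    re.IGNORECASE,
)


def inference_records(tsv_output):
    """Return the record names that look like workspace inference records."""
    names = [line.strip() for line in tsv_output.splitlines() if line.strip()]
    return [name for name in names if INFERENCE_RECORD.match(name)]


# Hypothetical output; the GUID and region are made up.
SAMPLE = """0d4f5e3a-1111-2222-3333-444455556666.workspace.westus
*.0d4f5e3a-1111-2222-3333-444455556666.inference.westus
"""
print(inference_records(SAMPLE))
```

An empty list corresponds to the "no inference value is returned" case.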
If the workspace with a private endpoint is set up to use a custom DNS server (see How to use your workspace with a custom DNS server), use the following command to verify that resolution works correctly from the custom DNS server:
dig endpointname.westcentralus.inference.ml.azure.com
For Kubernetes online endpoints:
Check the DNS configuration in the Kubernetes cluster.
Additionally, to check whether azureml-fe works as expected, use the following commands:
kubectl exec -it deploy/azureml-fe -- /bin/bash
# Run the following in the azureml-fe pod; the expected response is "Swagger not found"
curl -vi -k https://localhost:<port>/api/v1/endpoint/<endpoint-name>/swagger.json
For HTTP, use:
# the expected response is "Swagger not found"
curl http://localhost:<port>/api/v1/endpoint/<endpoint-name>/swagger.json
If curl over HTTPS fails (for example, it times out) but HTTP works, check that the certificate is valid.
If the custom DNS server fails to resolve the name to an A record, verify whether resolution works from Azure DNS (168.63.129.16):
dig @168.63.129.16 endpointname.westcentralus.inference.ml.azure.com
If this succeeds, troubleshoot the conditional forwarder for Private Link on the custom DNS server.
Online deployments can't be scored
Use the following command to see if the deployment was successfully deployed:
az ml online-deployment show -e <endpointname> -n <deploymentname> --query '{name:name,state:provisioning_state}'
If the deployment completed successfully, the value of state will be Succeeded.
If the deployment was successful, use the following command to check that traffic is assigned to the deployment. Replace <endpointname> with the name of your endpoint:
az ml online-endpoint show -n <endpointname> --query traffic
The response from this command should list the percentage of traffic assigned to each deployment.
Tip
This step isn't needed if you're using the azureml-model-deployment header in your request to target this deployment.
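You can combine the two checks above (provisioning state, then traffic) into one diagnostic. The following is a minimal sketch. It assumes the first input is the JSON printed by az ml online-deployment show with the '{name:name,state:provisioning_state}' query. It assumes the second is the JSON printed by az ml online-endpoint show --query traffic, a mapping from deployment name to percentage. The helper name and sample values are hypothetical.

```python
import json


def diagnose_deployment(show_json, traffic_json):
    """Return human-readable findings; an empty list means both checks pass."""
    findings = []
    deployment = json.loads(show_json)
    name = deployment.get("name")
    state = deployment.get("state")
    if state != "Succeeded":
        findings.append(f"deployment {name!r} is in state {state!r}, not 'Succeeded'")
    # `--query traffic` may print null when no traffic has been assigned.
    traffic = json.loads(traffic_json) or {}
    if traffic.get(name, 0) <= 0:
        findings.append(
            f"no traffic is assigned to {name!r}; route traffic to it "
            "or target it with the azureml-model-deployment header"
        )
    return findings


print(diagnose_deployment('{"name": "blue", "state": "Succeeded"}', '{"blue": 100}'))  # []
```

If both checks pass, the next step is to read the deployment logs.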
If the traffic assignments (or deployment header) are set correctly, use the following command to get the logs for the endpoint. Replace <endpointname> with the name of the endpoint, and <deploymentname> with the deployment:
az ml online-deployment get-logs -e <endpointname> -n <deploymentname>
Look through the logs to see if there's a problem running the scoring code when you submit a request to the deployment.