Secure managed online endpoints by using network isolation
APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)
This article shows you how to use network isolation to improve the security of an Azure Machine Learning managed online endpoint. Network isolation helps secure the inbound and outbound communication to and from your endpoint.
To help secure inbound communication, you can create a managed online endpoint that uses the private endpoint of an Azure Machine Learning workspace. To allow only approved outbound communication for deployments, you can configure the workspace with a managed virtual network. This article shows you how to take these steps to improve endpoint security. It also shows you how to create a deployment that uses the private endpoints of the workspace's managed virtual network for outbound communication.
If you prefer to use the legacy method for network isolation, see the following deployment file examples in the azureml-examples GitHub repository:
- For a deployment that uses a generic model: deploy-moe-vnet-legacy.sh
- For a deployment that uses an MLflow model: deploy-moe-vnet-mlflow-legacy.sh
To complete the steps in this article, you need the following prerequisites:
- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
- The Azure CLI and the Azure CLI ml extension, installed and configured. For more information, see Install and set up the CLI (v2).
Tip
The Azure Machine Learning managed virtual network feature was introduced on May 23, 2023. If you have an older version of the ml extension, you might need to update it for the examples in this article to work. To update the extension, use the following Azure CLI command:
az extension update -n ml
- A Bash shell or a compatible shell, for example, a shell on a Linux system or Windows Subsystem for Linux. The Azure CLI examples in this article assume that you use this type of shell.
- An Azure resource group in which you or the service principal that you use has Contributor access. For instructions on creating a resource group, see Set up.
- A user-assigned managed identity with appropriate permissions, if you want to use a managed identity to create and manage online endpoints and online deployments. For detailed information about required permissions, see Set up authentication between Azure Machine Learning and other services. For example, you need to grant your managed identity specific Azure role-based access control (Azure RBAC) permissions for Azure Key Vault.
If you use the legacy method for network isolation of managed online endpoints and you want to migrate to a managed virtual network to secure your endpoints, follow these steps:
- Create a new workspace and enable a managed virtual network. For more information about how to configure a managed network for your workspace, see Workspace managed virtual network isolation.
- (Optional) If your deployments access private resources other than Azure Storage, Key Vault, and Azure Container Registry, add outbound rules to the network settings of your workspace. By default, the network is configured with rules for Azure Storage, Key Vault, and Container Registry. Add rules with private endpoints for any other private resources that you use, as shown in the sketch after this list.
- (Optional) If you intend to use an Azure Machine Learning registry, configure private endpoints for outbound communication to your registry, its storage account, and its instance of Container Registry.
- Create online endpoints and deployments in the new workspace. If you use Azure Machine Learning registries, you can directly deploy components from them. For more information, see Deploy model from registry to online endpoint in workspace.
- Update applications that invoke endpoints so that the applications use the scoring URIs of the new online endpoints.
- After you validate your new endpoints, delete the online endpoints in your old workspace.
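The following command is a sketch of one such outbound rule. It assumes that your version of the ml extension includes the az ml workspace outbound-rule set command, and it uses a Key Vault resource ID as a placeholder for your own private resource.
az ml workspace outbound-rule set --workspace-name <workspace-name> -g <resource-group-name> --rule <rule-name> --type private_endpoint --service-resource-id "/subscriptions/<subscription-ID>/resourceGroups/<resource-group-name>/providers/Microsoft.KeyVault/vaults/<key-vault-name>" --subresource-target vault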
If you don't need to avoid downtime during migration, you can take a more straightforward approach: if you don't need to keep the compute instances, online endpoints, and deployments in your old workspace, delete the compute instances and then update that workspace to enable a managed virtual network.
The v1_legacy_mode flag must be set to false to turn off v1 legacy mode on your Azure Machine Learning workspace. If this setting is turned on, you can't create a managed online endpoint. For more information, see Network isolation change with our new API platform on Azure Resource Manager.
If your Azure Machine Learning workspace has a private endpoint that was created before May 24, 2022, you must re-create that private endpoint before you configure your online endpoints to use private endpoints. For more information about creating a private endpoint for your workspace, see Configure a private endpoint for an Azure Machine Learning workspace.
Tip
To see the creation date of a workspace, you can check the workspace properties.
- In Azure Machine Learning studio, go to the upper-right corner and select the name of your workspace.
- In the Directory + Subscription + Workspace window, select View all properties in Azure Portal.
- In the Azure portal Overview page, go to the upper-right corner and select JSON View.
- In the Resource JSON window, under API Versions, select the latest API version.
- In the properties section of the JSON code, check the creationTime value.
Alternatively, use one of the following methods:
- Python SDK:
Workspace.get(name=<workspace-name>, subscription_id=<subscription-ID>, resource_group=<resource-group-name>).get_details()
- REST API:
curl https://management.azure.com/subscriptions/<subscription-ID>/resourceGroups/<resource-group-name>/providers/Microsoft.MachineLearningServices/workspaces/?api-version=2023-10-01 -H "Authorization:Bearer <access-token>"
- PowerShell:
Get-AzMLWorkspace -Name <workspace-name> -ResourceGroupName <resource-group-name>
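- Azure CLI (a hedged alternative, assuming the workspace resource exposes the creation time as properties.creationTime, matching the JSON view described earlier):
az resource show --name <workspace-name> --resource-group <resource-group-name> --resource-type Microsoft.MachineLearningServices/workspaces --query properties.creationTime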
When you use network isolation to help secure online endpoints, you can use workspace-associated resources from a different resource group than your workspace resource group. However, these resources must belong to the same subscription and tenant as your workspace. Resources that are associated with a workspace include Azure Container Registry, Azure Storage, Azure Key Vault, and Application Insights.
Note
This article describes network isolation that applies to data plane operations. These operations result from scoring requests, or model serving. Control plane operations, such as requests to create, update, delete, or retrieve authentication keys, are sent to Azure Resource Manager over the public network.
Create environment variables by running the following commands. Replace <resource-group-name> with the name of the resource group for your workspace, and replace <workspace-name> with the name of your workspace.
export RESOURCEGROUP_NAME="<resource-group-name>"
export WORKSPACE_NAME="<workspace-name>"
Create your workspace. The -m allow_only_approved_outbound parameter configures a managed virtual network for the workspace and blocks outbound traffic except to approved destinations.
az ml workspace create -g $RESOURCEGROUP_NAME -n $WORKSPACE_NAME -m allow_only_approved_outbound
Alternatively, if you'd like to allow the deployment to send outbound traffic to the internet, uncomment the following code and run it instead.
# az ml workspace create -g $RESOURCEGROUP_NAME -n $WORKSPACE_NAME -m allow_internet_outbound
For more information about how to create a new workspace or upgrade your existing workspace to use a managed virtual network, see Configure a managed virtual network to allow internet outbound.
Provision the managed virtual network. For instructions and more information, see Manually provision a managed VNet.
Important
When you set up a managed virtual network for a workspace for the first time, the network isn't provisioned. You can't create online deployments until you provision the managed network.
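For example, the following command provisions the managed network for the workspace that you created. This command is a minimal sketch; it assumes that your version of the ml extension includes the provision-network operation.
az ml workspace provision-network -g $RESOURCEGROUP_NAME -n $WORKSPACE_NAME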
Configure the container registry that's associated with the workspace to use a premium pricing plan. This setting is needed to provide access to the registry via a private endpoint. For more information, see Azure Container Registry service tiers.
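For example, you can upgrade the registry from the CLI. The following sketch assumes that the az ml workspace show output exposes the registry ID in a container_registry property; the <registry-name> value is a placeholder for the name portion of that ID.
# Look up the container registry that's associated with the workspace.
az ml workspace show -n $WORKSPACE_NAME -g $RESOURCEGROUP_NAME --query container_registry
# Upgrade the registry to the Premium tier.
az acr update --name <registry-name> --sku Premium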
Configure your workspace to use a compute cluster or compute instance to build images. You can use the image_build_compute property for this purpose. For more information and instructions, see Configure image builds. A sketch of this configuration appears after the next command.
Configure default values for the Azure CLI so that you can avoid passing in the values for your workspace and resource group multiple times:
az configure --defaults workspace=$WORKSPACE_NAME group=$RESOURCEGROUP_NAME
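As noted in the previous step, you can set the image build compute from the CLI. The following command is a sketch that assumes the az ml workspace update command supports an --image-build-compute parameter and that a compute cluster named <compute-cluster-name> already exists in the workspace.
az ml workspace update -n $WORKSPACE_NAME -g $RESOURCEGROUP_NAME --image-build-compute <compute-cluster-name>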
Clone the examples repository to get the example files for the endpoint and deployment, and then go to the repository's cli directory.
git clone --depth 1 https://github.com/Azure/azureml-examples cd azureml-examples/cli
The commands in this article are in the deploy-managed-online-endpoint-workspacevnet.sh file in the cli directory. The YAML configuration files are in the endpoints/online/managed/sample/ subdirectory.
To create a secured managed online endpoint, you create the endpoint in your workspace and set the endpoint's public_network_access value to disabled to control inbound communication.
This setting forces the online endpoint to use the workspace's private endpoint for inbound communication. The only way to invoke the online endpoint is by using a private endpoint that can access the workspace in your virtual network. For more information, see Secure inbound scoring requests and Configure a private endpoint for an Azure Machine Learning workspace.
Because the workspace is configured to have a managed virtual network, any endpoint deployments use the private endpoints of the managed virtual network for outbound communication.
Set the endpoint's name:
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
Create an endpoint with public_network_access set to disabled to block inbound traffic:
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml --set public_network_access=disabled
Alternatively, if you want to allow the endpoint to receive scoring requests from the internet, uncomment the following code and run it instead:
# az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
Create a deployment in the managed virtual network of the workspace:
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
If you get an error about an authorization failure, check the networking configuration for the workspace storage account. You might have to adjust the public network access settings to give the workspace access to the storage account.
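For example, the following commands inspect the storage account's network rules and, as one possible remedy, enable public network access. The storage account name and resource group are placeholders, and your organization's policy might call for a different fix, such as a private endpoint.
# Inspect the storage account's current network rules.
az storage account show --name <storage-account-name> --resource-group <resource-group-name> --query networkRuleSet
# One option: allow public network access so that the workspace can reach the account.
az storage account update --name <storage-account-name> --resource-group <resource-group-name> --public-network-access Enabled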
Get the status of the deployment:
az ml online-endpoint show -n $ENDPOINT_NAME
Test the endpoint by issuing a scoring request:
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
Get the deployment logs:
az ml online-deployment get-logs --name blue --endpoint $ENDPOINT_NAME
If you no longer need the endpoint, run the following command to delete it.
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
If you no longer need the workspace, its associated resources, and the other resources in your resource group, delete them. Replace <resource-group-name> with the name of the resource group that contains your workspace.
az group delete --resource-group <resource-group-name>
Managed online endpoints are a feature of the Azure Machine Learning v2 API platform. If your Azure Machine Learning workspace is configured for v1 legacy mode, managed online endpoints don't work. Specifically, if the v1_legacy_mode workspace setting is set to true, v1 legacy mode is turned on and the v2 APIs aren't supported.
To see how to turn off v1 legacy mode, see Network isolation change with our new API platform on Azure Resource Manager.
Important
Check with your network security team before you set v1_legacy_mode to false, because v1 legacy mode might be turned on for a reason.
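To check the current value before you change anything, you can query the raw workspace resource. The following command is a sketch that assumes the setting is exposed as properties.v1LegacyMode in the resource JSON.
az resource show --name <workspace-name> --resource-group <resource-group-name> --resource-type Microsoft.MachineLearningServices/workspaces --query properties.v1LegacyMode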
Use the following command to list the network rules of the Azure key vault for your workspace. Replace <key-vault-name> with the name of your key vault.
az keyvault network-rule list -n <key-vault-name>
The response for this command is similar to the following JSON code:
{
"bypass": "AzureServices",
"defaultAction": "Deny",
"ipRules": [],
"virtualNetworkRules": []
}
If the value of bypass isn't AzureServices, use the guidance in Configure Azure Key Vault networking settings to set it to AzureServices.
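For example, one way to set the bypass value from the CLI is the following command. Review this change with your security team before you apply it.
az keyvault update --name <key-vault-name> --bypass AzureServices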
Note
This issue applies when you use the legacy network isolation method for managed online endpoints. In this method, Azure Machine Learning creates a managed virtual network for each deployment under an endpoint.
Check whether the egress-public-network-access flag has a value of disabled for the deployment. If this flag is enabled and the visibility of the container registry is private, this failure is expected.
Use the following command to check the status of the private endpoint connection. Replace <registry-name> with the name of the Azure container registry for your workspace:
az acr private-endpoint-connection list -r <registry-name> --query "[?privateLinkServiceConnectionState.description=='Egress for Microsoft.MachineLearningServices/workspaces/onlineEndpoints'].{ID:id, status:privateLinkServiceConnectionState.status}"
In the response, verify that the status field is set to Approved. If the value isn't Approved, use the following command to approve the connection. Replace <private-endpoint-connection-ID> with the ID that the preceding command returns.
az network private-endpoint-connection approve --id <private-endpoint-connection-ID> --description "Approved"
Verify that the client that issues the scoring request is in a virtual network that can access the Azure Machine Learning workspace.
Use the nslookup command on the endpoint host name to retrieve the IP address information:
nslookup <endpoint-name>.<endpoint-region>.inference.ml.azure.com
For example, your command might look similar to the following one:
nslookup endpointname.westcentralus.inference.ml.azure.com
The response contains an address that should be in the range provided by the virtual network.
Note
- For a Kubernetes online endpoint, the endpoint host name should be the CName (domain name) that's specified in your Kubernetes cluster.
- If the endpoint uses HTTP, the IP address is contained in the endpoint URI, which you can get from the studio UI.
- For more ways to get the IP address of the endpoint, see Update your DNS with an FQDN.
If the nslookup command doesn't resolve the host name, take the actions in one of the following sections.
Use the following command to check whether an A record exists in the private Domain Name System (DNS) zone for the virtual network.
az network private-dns record-set list -z privatelink.api.azureml.ms -o tsv --query [].name
The results should contain an entry similar to *.<GUID>.inference.<region>.
If no inference value is returned, delete the private endpoint for the workspace and then re-create it. For more information, see How to configure a private endpoint.
If the workspace with a private endpoint uses a custom DNS server, run the following command to verify that the resolution from the custom DNS server works correctly:
dig <endpoint-name>.<endpoint-region>.inference.ml.azure.com
Check the DNS configuration in the Kubernetes cluster.
Check whether the Azure Machine Learning inference router, azureml-fe, works as expected. To perform this check, take the following steps:
Run the following command to open a shell in the azureml-fe pod:
kubectl exec -it deploy/azureml-fe -- /bin/bash
Run one of the following commands. For HTTPS, use the following command:
curl -vi -k https://localhost:<port>/api/v1/endpoint/<endpoint-name>/swagger.json
"Swagger not found"
For HTTP, use the following command:
curl http://localhost:<port>/api/v1/endpoint/<endpoint-name>/swagger.json
"Swagger not found"
If the curl HTTPS command fails or times out but the HTTP command works, check whether the certificate is valid.
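One way to inspect the certificate that's served on the HTTPS port, assuming the openssl client is available inside the azureml-fe pod, is the following command:
openssl s_client -connect localhost:<port> -showcerts </dev/null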
If the preceding process fails to resolve to the A record, use the following command to verify whether the resolution works from the Azure DNS virtual public IP address, 168.63.129.16:
dig @168.63.129.16 <endpoint-name>.<endpoint-region>.inference.ml.azure.com
If the preceding command succeeds, troubleshoot the conditional forwarder for Azure Private Link on a custom DNS.
Run the following command to see the status of a deployment that can't be scored:
az ml online-deployment show -e <endpoint-name> -n <deployment-name> --query '{name:name,state:provisioning_state}'
A value of Succeeded for the state field indicates a successful deployment.
For a successful deployment, use the following command to check that traffic is assigned to the deployment:
az ml online-endpoint show -n <endpoint-name> --query traffic
The response from this command should list the percentage of traffic that's assigned to each deployment.
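If no traffic is assigned, one way to route all traffic to a single deployment is the following command. The deployment name is a placeholder for the deployment that you want to receive traffic.
az ml online-endpoint update --name <endpoint-name> --traffic "<deployment-name>=100"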
Tip
This step isn't necessary if you use the azureml-model-deployment header in your request to target this deployment.
If the traffic assignment or the deployment header is set correctly, use the following command to get the logs for the endpoint:
az ml online-deployment get-logs -e <endpoint-name> -n <deployment-name>
Review the logs to see whether there's a problem running the scoring code when you submit a request to the deployment.