CLI (v2) online endpoint YAML schema
APPLIES TO: Azure CLI ml extension v2 (current)
The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json for managed online endpoints and at https://azuremlschemas.azureedge.net/latest/kubernetesOnlineEndpoint.schema.json for Kubernetes online endpoints. The differences between managed online endpoints and Kubernetes online endpoints are described in the table of properties in this article. The samples in this article focus on managed online endpoints.
Note
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
Note
A fully specified sample YAML for managed online endpoints is available for reference.
YAML syntax
| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| `$schema` | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including `$schema` at the top of your file enables you to invoke schema and resource completions. | | |
| `name` | string | Required. Name of the endpoint. Needs to be unique at the Azure region level. Naming rules are defined under endpoint limits. | | |
| `description` | string | Description of the endpoint. | | |
| `tags` | object | Dictionary of tags for the endpoint. | | |
| `auth_mode` | string | The authentication method for invoking the endpoint (data plane operation). Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. Use `aad_token` for Microsoft Entra token-based authentication. | `key`, `aml_token`, `aad_token` | `key` |
| `compute` | string | Name of the compute target to run the endpoint deployments on. This field applies only to endpoint deployments to Azure Arc-enabled Kubernetes clusters (the compute target specified in this field must have `type: kubernetes`). Don't specify this field if you're doing managed online inference. | | |
| `identity` | object | The managed identity configuration for accessing Azure resources for endpoint provisioning and inference. | | |
| `identity.type` | string | The type of managed identity. If the type is `user_assigned`, the `identity.user_assigned_identities` property must also be specified. | `system_assigned`, `user_assigned` | |
| `identity.user_assigned_identities` | array | List of fully qualified resource IDs of the user-assigned identities. | | |
| `traffic` | object | Traffic represents the percentage of requests to be served by different deployments. It's represented by a dictionary of key-value pairs, where the keys represent the deployment name and the values represent the percentage of traffic to that deployment. For example, `blue: 90 green: 10` means 90% of requests are sent to the deployment named `blue` and 10% are sent to the deployment named `green`. Total traffic has to either be 0 or sum up to 100. See Safe rollout for online endpoints to see the traffic configuration in action. Note: you can't set this field during online endpoint creation, because the deployments under that endpoint must be created before traffic can be set. You can update the traffic for an online endpoint after the deployments have been created by using `az ml online-endpoint update`; for example, `az ml online-endpoint update --name <endpoint_name> --traffic "blue=90 green=10"` (see the sketch after this table). | | |
| `public_network_access` | string | This flag controls the visibility of the managed endpoint. When `disabled`, inbound scoring requests are received using the private endpoint of the Azure Machine Learning workspace, and the endpoint can't be reached from public networks. This flag is applicable only to managed endpoints. | `enabled`, `disabled` | `enabled` |
| `mirror_traffic` | string | Percentage of live traffic to mirror to a deployment. Mirroring traffic doesn't change the results returned to clients. The mirrored percentage of traffic is copied and submitted to the specified deployment so that you can gather metrics and logging without impacting clients, for example to check whether latency is within acceptable bounds and that there are no HTTP errors. It's represented by a dictionary with a single key-value pair, where the key represents the deployment name and the value represents the percentage of traffic to mirror to the deployment. For more information, see Test a deployment with mirrored traffic. | | |
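For example, assuming deployments named `blue` and `green` already exist under the endpoint, the traffic split and mirrored traffic described above could be updated with commands along these lines (a sketch; the endpoint name and percentages are placeholders):

```azurecli
# Route 90% of live traffic to the deployment named "blue" and 10% to "green".
az ml online-endpoint update --name <endpoint_name> --traffic "blue=90 green=10"

# Mirror a 10% copy of live traffic to "green" for monitoring only;
# mirrored requests don't affect the responses returned to clients.
az ml online-endpoint update --name <endpoint_name> --mirror-traffic "green=10"
```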
Remarks
The `az ml online-endpoint` commands can be used for managing Azure Machine Learning online endpoints.
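For example, an endpoint defined in a YAML file like the samples below can typically be created, inspected, and deleted with commands along these lines (a sketch; the file name and endpoint name are placeholders, and a default resource group and workspace are assumed to be configured):

```azurecli
# Create an online endpoint from a YAML definition file.
az ml online-endpoint create --file endpoint.yml

# Show the endpoint's current configuration and provisioning state.
az ml online-endpoint show --name my-endpoint

# Delete the endpoint and all of its deployments.
az ml online-endpoint delete --name my-endpoint --yes
```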
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: basic
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
```
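Once a deployment exists under this endpoint, it can be invoked with a scoring request, for example as sketched below (the request file name is a placeholder):

```azurecli
# Send a sample scoring request to the endpoint; with auth_mode: key,
# the CLI handles retrieving the endpoint key.
az ml online-endpoint invoke --name my-endpoint --request-file sample-request.json
```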
YAML: system-assigned identity
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-sai-endpoint
auth_mode: key
```
YAML: user-assigned identity
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-uai-endpoint
auth_mode: key
identity:
  type: user_assigned
  user_assigned_identities:
    - resource_id: user_identity_ARM_id_place_holder
```
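The `resource_id` value is typically the fully qualified ARM resource ID of the user-assigned identity, which generally has the following form (every segment shown here is a placeholder):

```
/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
```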