CLI (v2) managed online deployment YAML schema

APPLIES TO: Azure CLI ml extension v2 (current)

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json.

Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.

YAML syntax

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | | |
| name | string | Required. Name of the deployment. Naming rules are defined here. | | |
| description | string | Description of the deployment. | | |
| tags | object | Dictionary of tags for the deployment. | | |
| endpoint_name | string | Required. Name of the endpoint to create the deployment under. | | |
| model | string or object | The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. To reference an existing model, use the azureml:<model-name>:<model-version> syntax. To define a model inline, follow the Model schema. As a best practice for production scenarios, you should create the model separately and reference it here. This field is optional for custom container deployment scenarios. | | |
| model_mount_path | string | The path to mount the model in a custom container. Applicable only for custom container deployment scenarios. If the model field is specified, it's mounted on this path in the container. | | |
| code_configuration | object | Configuration for the scoring code logic. This field is optional for custom container deployment scenarios. | | |
| code_configuration.code | string | Local path to the source code directory for scoring the model. | | |
| code_configuration.scoring_script | string | Relative path to the scoring file in the source code directory. | | |
| environment_variables | object | Dictionary of environment variable key-value pairs to set in the deployment container. You can access these environment variables from your scoring scripts. | | |
| environment | string or object | Required. The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema. As a best practice for production scenarios, you should create the environment separately and reference it here. | | |
| instance_type | string | Required. The VM size to use for the deployment. For the list of supported sizes, see Managed online endpoints SKU list. | | |
| instance_count | integer | Required. The number of instances to use for the deployment. Specify the value based on the workload you expect. For high availability, Microsoft recommends you set it to at least 3. instance_count can be updated after deployment creation using the az ml online-deployment update command. We reserve an extra 20% for performing upgrades. For more information, see managed online endpoint quotas. | | |
| app_insights_enabled | boolean | Whether to enable integration with the Azure Application Insights instance associated with your workspace. | | false |
| scale_settings | object | The scale settings for the deployment. Currently only the default scale type is supported, so you don't need to specify this property. With this default scale type, you can either manually scale the instance count up and down after deployment creation by updating the instance_count property, or create an autoscaling policy. | | |
| scale_settings.type | string | The scale type. | default | default |
| request_settings | object | Scoring request settings for the deployment. See RequestSettings for the set of configurable properties. | | |
| liveness_probe | object | Liveness probe settings for monitoring the health of the container regularly. See ProbeSettings for the set of configurable properties. | | |
| readiness_probe | object | Readiness probe settings for validating if the container is ready to serve traffic. See ProbeSettings for the set of configurable properties. | | |
| egress_public_network_access | string | This flag secures the deployment by restricting communication between the deployment and the Azure resources used by it. Set to disabled to ensure that the download of the model, code, and images needed by your deployment are secured with a private endpoint. This flag is applicable only for managed online endpoints. | enabled, disabled | enabled |
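
The following fragment is a minimal sketch of how several of the keys above might be combined in one deployment file. It references an existing model and environment with the azureml:<name>:<version> syntax instead of defining them inline. The asset names (my-model, my-env), the ./src path, the LOG_LEVEL variable, and the tag are hypothetical placeholders, not values from the schema.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
# Reference assets already registered in the workspace (hypothetical names and versions).
model: azureml:my-model:1
environment: azureml:my-env:1
code_configuration:
  code: ./src                # hypothetical local directory that contains score.py
  scoring_script: score.py   # path relative to the code directory
instance_type: Standard_DS2_v2
instance_count: 3            # at least 3 is recommended for high availability
app_insights_enabled: true   # default is false
egress_public_network_access: disabled   # default is enabled
environment_variables:
  LOG_LEVEL: "info"          # hypothetical variable read by the scoring script
tags:
  team: demo                 # hypothetical tag
```

Referencing versioned assets this way follows the best practice noted in the model and environment rows: the assets are created separately and the deployment only points at them.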

RequestSettings

| Key | Type | Description | Default value |
| --- | --- | --- | --- |
| request_timeout_ms | integer | The scoring timeout in milliseconds. | 5000 |
| max_concurrent_requests_per_instance | integer | The maximum number of concurrent requests per instance allowed for the deployment. Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low may lead to underutilized nodes. Setting it too low may also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see Troubleshooting online endpoints: HTTP status codes. | 1 |
| max_queue_wait_ms | integer | The maximum amount of time in milliseconds a request will stay in the queue. | 500 |
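
As an illustration, the request settings might appear in a deployment file as shown in this sketch. The values are simply the defaults from the table written out explicitly. Treat them as a starting point to tune against your model's measured concurrency, not as recommendations.

```yaml
request_settings:
  request_timeout_ms: 5000                   # scoring timeout, in milliseconds
  max_concurrent_requests_per_instance: 1    # match what the model can process concurrently on one node
  max_queue_wait_ms: 500                     # maximum time a request stays in the queue, in milliseconds
```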

ProbeSettings

| Key | Type | Description | Default value |
| --- | --- | --- | --- |
| initial_delay | integer | The number of seconds after the container has started before the probe is initiated. Minimum value is 1. | 10 |
| period | integer | How often (in seconds) to perform the probe. | 10 |
| timeout | integer | The number of seconds after which the probe times out. Minimum value is 1. | 2 |
| success_threshold | integer | The minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | 1 |
| failure_threshold | integer | When a probe fails, the system will try failure_threshold times before giving up. Giving up in the case of a liveness probe means the container will be restarted. In the case of a readiness probe the container will be marked Unready. Minimum value is 1. | 30 |
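
The sketch below shows how the probe settings might be written for both probe types, using the default values from the table. Assuming the probe is retried once per period, the defaults imply roughly failure_threshold × period = 30 × 10 = 300 seconds of consecutive failures before the system gives up. That figure is an estimate derived from the table, not a documented guarantee.

```yaml
liveness_probe:
  initial_delay: 10      # seconds to wait after the container starts
  period: 10             # seconds between probes
  timeout: 2             # seconds before a single probe times out
  success_threshold: 1   # consecutive successes needed after a failure
  failure_threshold: 30  # attempts before the container is restarted
readiness_probe:
  initial_delay: 10
  period: 10
  timeout: 2
  success_threshold: 1
  failure_threshold: 30  # attempts before the container is marked Unready
```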

Remarks

Use the az ml online-deployment commands to manage Azure Machine Learning managed online deployments.

Examples

Examples are available in the examples GitHub repository. Several are shown below.

YAML: basic

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS2_v2
instance_count: 1

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: green
endpoint_name: my-endpoint
model:
  path: ../../model-2/model/
code_configuration:
  code: ../../model-2/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-2/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS2_v2
instance_count: 1

YAML: system-assigned identity

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score_managedidentity.py
environment:
  conda_file: ../../model-1/environment/conda-managedidentity.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS2_v2
instance_count: 1
environment_variables:
  STORAGE_ACCOUNT_NAME: "storage_place_holder"
  STORAGE_CONTAINER_NAME: "container_place_holder"
  FILE_NAME: "file_place_holder"

YAML: user-assigned identity

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score_managedidentity.py
environment:
  conda_file: ../../model-1/environment/conda-managedidentity.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS2_v2
instance_count: 1
environment_variables:
  STORAGE_ACCOUNT_NAME: "storage_place_holder"
  STORAGE_CONTAINER_NAME: "container_place_holder"
  FILE_NAME: "file_place_holder"
  UAI_CLIENT_ID: "uai_client_id_place_holder"
