Deploy models that use deployment templates

This article shows you how to deploy a registry model that pins a deployment template, using a managed online endpoint as the target. When the model has a default_deployment_template, your deployment YAML only needs to reference the model: the deployment template supplies the environment, environment variables, scoring port, and probes. You can also override the default with another deployment template. The model's allowed_deployment_templates list is the author's curated set of validated overrides to choose from. It's guidance, not an enforced restriction.

Note

The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.

Prerequisites

Step 1: Find a model that uses a deployment template

Inspect the model in the registry to confirm that it has a default_deployment_template, and to see which other deployment templates are in allowed_deployment_templates.

A model's defaultDeploymentTemplate and allowedDeploymentTemplates are returned by the registry data plane, which you reach through the registry's resource provider host. The Azure Resource Manager (management.azure.com) model GET doesn't include these fields. First look up the data plane host with a one-time discovery call, then get the model version from it.

TOKEN=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv)

RP_HOST=$(curl -s \
  "https://<your-region>.api.azureml.ms/registrymanagement/v1.0/registries/<your-registry>/discovery?api-version=v1.0" \
  -H "Authorization: Bearer $TOKEN" | jq -r '.primaryRegionResourceProviderUri')

curl -X GET \
  "${RP_HOST%/}/mferp/managementfrontend/subscriptions/<your-subscription-id>/resourceGroups/<your-resource-group>/providers/Microsoft.MachineLearningServices/registries/<your-registry>/models/my-model/versions/1?api-version=2021-10-01-dataplanepreview" \
  -H "Authorization: Bearer $TOKEN"

Look for properties.defaultDeploymentTemplate.assetId and properties.allowedDeploymentTemplates[].assetId in the response.
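To pull those two fields out of the response, one option is to filter it with jq, which the discovery call already relies on. The following is a minimal sketch rather than an official command: list_templates is a hypothetical helper name, and it assumes you saved the model response to a local file (for example, by adding -o model.json to the curl call).

```shell
# Hypothetical helper: print the default and allowed deployment templates
# from a saved model-version response. Assumes jq is installed and that the
# response contains the properties.* fields named in this step.
list_templates() {
  # $1 = path to the saved JSON response
  jq -r '
    "default: \(.properties.defaultDeploymentTemplate.assetId // "none")",
    (.properties.allowedDeploymentTemplates // [] | .[] | "allowed: \(.assetId)")
  ' "$1"
}

# Usage (after saving the response with: curl ... -o model.json):
#   list_templates model.json
```

If the model has no default deployment template, the sketch prints "default: none", and it prints no "allowed:" lines when the allowed list is absent or empty.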

Step 2: Create an online endpoint

Use the Online Endpoints - Create Or Update operation:

TOKEN=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv)

curl -X PUT \
  "https://management.azure.com/subscriptions/<your-subscription-id>/resourceGroups/<your-resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<your-workspace>/onlineEndpoints/my-endpoint?api-version=2023-04-01-preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "location": "<your-region>",
    "identity": { "type": "SystemAssigned" },
    "properties": {
      "authMode": "Key"
    }
  }'
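Endpoint creation is asynchronous, so it can help to confirm that the endpoint has finished provisioning before you add a deployment. The following sketch polls the same resource URL with GET and reads properties.provisioningState. The helper names endpoint_url and wait_for_endpoint are hypothetical, the 15-second interval is arbitrary, and the sketch assumes that Creating and Updating are the only transitional states you need to wait on.

```shell
# Hypothetical helpers; the names are illustrative, not part of any Azure tool.

# Build the ARM URL for an online endpoint (same shape as the PUT in this step).
endpoint_url() {
  # $1 = subscription ID, $2 = resource group, $3 = workspace, $4 = endpoint name
  printf 'https://management.azure.com/subscriptions/%s/resourceGroups/%s/providers/Microsoft.MachineLearningServices/workspaces/%s/onlineEndpoints/%s?api-version=2023-04-01-preview' \
    "$1" "$2" "$3" "$4"
}

# Poll until the endpoint leaves a transitional state, then print the last state seen.
wait_for_endpoint() {
  # $1 = URL from endpoint_url, $2 = bearer token
  while :; do
    state=$(curl -s "$1" -H "Authorization: Bearer $2" | jq -r '.properties.provisioningState')
    case "$state" in
      Creating|Updating) sleep 15 ;;
      *) printf '%s\n' "$state"; return ;;
    esac
  done
}

# Usage:
#   URL=$(endpoint_url "<your-subscription-id>" "<your-resource-group>" "<your-workspace>" my-endpoint)
#   wait_for_endpoint "$URL" "$TOKEN"
```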

Step 3: Deploy with the model's default deployment template

When the model has a default_deployment_template, the deployment template supplies infrastructure settings such as the environment, request settings, and probes. The deployment payload must still include the SKU and instance settings that the underlying ARM resource requires.

Use the Online Deployments - Create Or Update operation. Save the following JSON as deployment-default.json. The top-level sku block (name and capacity) is required for the ARM resource, and properties.endpointComputeType must be Managed. The properties.model value references the registry model; the payload sets no environment, request settings, or probes because the model's default deployment template supplies them.

{
  "name": "blue",
  "endpointName": "my-endpoint",
  "tags": {},
  "location": "<your-region>",
  "properties": {
    "environmentVariables": {},
    "properties": {},
    "appInsightsEnabled": false,
    "endpointComputeType": "Managed",
    "instanceType": "Standard_DS3_v2",
    "model": "azureml://registries/<your-registry>/models/my-model/versions/1"
  },
  "sku": {
    "name": "Default",
    "capacity": 1
  }
}

Then send the payload in a PUT request:

TOKEN=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv)

curl -X PUT \
  "https://management.azure.com/subscriptions/<your-subscription-id>/resourceGroups/<your-resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<your-workspace>/onlineEndpoints/my-endpoint/deployments/blue?api-version=2023-04-01-preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @deployment-default.json
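Before you send a payload, a quick local check can catch a missing sku block or a wrong compute type. The following sketch encodes only the requirements stated in this step. check_payload is a hypothetical helper name, and the check is not a substitute for the service's own validation.

```shell
# Hypothetical helper: succeed only if the payload has the fields this step
# calls out as required. Assumes jq is installed.
check_payload() {
  # $1 = path to the deployment payload JSON
  jq -e '
    .sku.name != null
    and .sku.capacity != null
    and .properties.endpointComputeType == "Managed"
    and (.properties.model | type == "string")
  ' "$1" >/dev/null
}

# Usage:
#   check_payload deployment-default.json && echo "payload looks complete"
```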

Step 4: (Optional) Deploy with an override deployment template

To use a deployment template other than the model's default, specify the override on the deployment. The model's allowed_deployment_templates list is the author's curated set of validated override templates. Because the list is guidance rather than an enforced restriction, the platform doesn't block an override that isn't in it. You can create the override as a separate deployment under the same endpoint, so the deployment from Step 3 keeps serving traffic until you update the endpoint's traffic allocation.

In the REST API, set the override inside the nested properties.properties bag using the azureml.deploymentTemplateOverride key. Save the following JSON as deployment-override.json:

{
  "name": "green",
  "endpointName": "my-endpoint",
  "tags": {},
  "location": "<your-region>",
  "properties": {
    "environmentVariables": {},
    "properties": {
      "azureml.deploymentTemplateOverride": "azureml://registries/<your-registry>/deploymenttemplates/my-deployment-template2/versions/1"
    },
    "appInsightsEnabled": false,
    "endpointComputeType": "Managed",
    "instanceType": "Standard_DS3_v2",
    "model": "azureml://registries/<your-registry>/models/my-model/versions/1"
  },
  "sku": {
    "name": "Default",
    "capacity": 1
  }
}

Then send the payload in a PUT request:

TOKEN=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv)

curl -X PUT \
  "https://management.azure.com/subscriptions/<your-subscription-id>/resourceGroups/<your-resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<your-workspace>/onlineEndpoints/my-endpoint/deployments/green?api-version=2023-04-01-preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @deployment-override.json
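The override payload differs from the Step 3 payload only in the deployment name and the azureml.deploymentTemplateOverride key, so you could derive one from the other instead of maintaining two files by hand. The following is a sketch under that assumption. make_override_payload is a hypothetical helper name, and it relies on jq.

```shell
# Hypothetical helper: copy a deployment payload, rename the deployment, and
# set the override key inside the nested properties.properties bag.
make_override_payload() {
  # $1 = source payload, $2 = new deployment name, $3 = override template asset ID
  jq --arg name "$2" --arg template "$3" '
    .name = $name
    | .properties.properties["azureml.deploymentTemplateOverride"] = $template
  ' "$1"
}

# Usage:
#   make_override_payload deployment-default.json green \
#     "azureml://registries/<your-registry>/deploymenttemplates/my-deployment-template2/versions/1" \
#     > deployment-override.json
```

Every other field, including the model reference and the sku block, is copied unchanged from the source payload.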

Step 5: Invoke the endpoint

Send a scoring request to the deployment. The examples target the blue deployment from Step 3. To invoke the override deployment from Step 4 instead, replace blue with green.

For the TF Serving deployment template from Manage models with deployment templates, the request payload is the standard TF Serving REST format. Save the following as request.json:

{ "instances": [1.0, 2.0, 5.0] }

Get the scoring URI and key, then POST the request payload to the scoring URI. For the TF Serving deployment template, the scoring path is /v1/models/half_plus_two:predict. For details, see Invoke the endpoint to score data by using your model.

SCORING_URI=$(az ml online-endpoint show --name my-endpoint --query scoring_uri -o tsv \
  --workspace-name <your-workspace> --resource-group <your-resource-group>)
KEY=$(az ml online-endpoint get-credentials --name my-endpoint --query primaryKey -o tsv \
  --workspace-name <your-workspace> --resource-group <your-resource-group>)

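# The azureml-model-deployment header routes the request to a specific deployment.
curl -X POST "$SCORING_URI" \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -H "azureml-model-deployment: blue" \
  -d @request.json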

The response is the TF Serving predict response, for example: { "predictions": [2.5, 3.0, 4.5] }.
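To sanity-check a response, you can compute the expected values locally. The following sketch assumes that the half_plus_two model returns x / 2 + 2 for each input, which matches the example response for the example request. expected_predictions is a hypothetical helper name.

```shell
# Hypothetical helper: compute the values half_plus_two is assumed to return
# (y = x / 2 + 2) for each input, as a comma-separated list.
expected_predictions() {
  printf '%s\n' "$@" |
    awk '{ printf "%s%.1f", (NR > 1 ? ", " : ""), $1 / 2 + 2 } END { print "" }'
}

expected_predictions 1.0 2.0 5.0   # prints: 2.5, 3.0, 4.5
```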

Step 6: Update or delete the deployment

To change traffic allocation, scale the deployment, or delete it, use the standard managed online endpoint commands. The deployment template doesn't change these operations. To shift live traffic to the override deployment from Step 4, replace blue=100 with green=100 in the traffic update.

az ml online-endpoint update --name my-endpoint \
  --traffic "blue=100" \
  --workspace-name <your-workspace> \
  --resource-group <your-resource-group>

az ml online-deployment delete --name blue --endpoint-name my-endpoint \
  --workspace-name <your-workspace> \
  --resource-group <your-resource-group>
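When both deployments exist, the --traffic value can name each one, for example "blue=90 green=10". The following sketch adds up the percentages in such a string so that you can check the split before you apply it. traffic_total is a hypothetical helper name, and the sketch assumes that the percentages are expected to total 100.

```shell
# Hypothetical helper: add up the percentages in a traffic string such as
# "blue=90 green=10". Assumes space-separated name=integer pairs.
traffic_total() {
  total=0
  for pair in $1; do
    # ${pair#*=} strips the deployment name and the equals sign.
    total=$((total + ${pair#*=}))
  done
  printf '%s\n' "$total"
}

traffic_total "blue=90 green=10"   # prints: 100

# Usage:
#   TRAFFIC="blue=90 green=10"
#   [ "$(traffic_total "$TRAFFIC")" -eq 100 ] && az ml online-endpoint update \
#     --name my-endpoint --traffic "$TRAFFIC" \
#     --workspace-name <your-workspace> --resource-group <your-resource-group>
```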

Troubleshooting

  • The instance type isn't allowed for this deployment template. The instance_type you set on the deployment isn't in the deployment template's allowed_instance_types list. Use az ml deployment-template show to list the allowed instance types, or omit instance_type to use the deployment template's default_instance_type.
  • The environment isn't a registry-scoped reference. Deployment templates must reference an environment with the azureml://registries/<registry-name>/environments/<name>/versions/<version> syntax. Share workspace environments to a registry before you reference them.
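To check an environment reference against the registry-scoped syntax before you use it in a deployment template, a pattern match is usually enough. The following sketch tests only the shape of the string, not whether the environment exists. is_registry_environment is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the argument looks like
# azureml://registries/<registry-name>/environments/<name>/versions/<version>.
is_registry_environment() {
  printf '%s\n' "$1" |
    grep -Eq '^azureml://registries/[^/]+/environments/[^/]+/versions/[^/]+$'
}

# Usage:
#   is_registry_environment "azureml://registries/<registry-name>/environments/<name>/versions/<version>" \
#     || echo "not a registry-scoped environment reference"
```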