Microsoft.MachineLearningServices workspaces/onlineEndpoints/deployments 2023-10-01
Bicep resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups - See resource group deployment commands
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Bicep to your template.
resource symbolicname 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-10-01' = {
name: 'string'
location: 'string'
tags: {
tagName1: 'tagValue1'
tagName2: 'tagValue2'
}
sku: {
capacity: int
family: 'string'
name: 'string'
size: 'string'
tier: 'string'
}
kind: 'string'
parent: resourceSymbolicName
identity: {
type: 'string'
userAssignedIdentities: {
{customized property}: {}
}
}
properties: {
appInsightsEnabled: bool
codeConfiguration: {
codeId: 'string'
scoringScript: 'string'
}
description: 'string'
egressPublicNetworkAccess: 'string'
environmentId: 'string'
environmentVariables: {
{customized property}: 'string'
}
instanceType: 'string'
livenessProbe: {
failureThreshold: int
initialDelay: 'string'
period: 'string'
successThreshold: int
timeout: 'string'
}
model: 'string'
modelMountPath: 'string'
properties: {
{customized property}: 'string'
}
readinessProbe: {
failureThreshold: int
initialDelay: 'string'
period: 'string'
successThreshold: int
timeout: 'string'
}
requestSettings: {
maxConcurrentRequestsPerInstance: int
maxQueueWait: 'string'
requestTimeout: 'string'
}
scaleSettings: {
scaleType: 'string'
// For remaining properties, see OnlineScaleSettings objects
}
endpointComputeType: 'string'
// For remaining properties, see OnlineDeploymentProperties objects
}
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
endpointComputeType: 'Kubernetes'
containerResourceRequirements: {
containerResourceLimits: {
cpu: 'string'
gpu: 'string'
memory: 'string'
}
containerResourceRequests: {
cpu: 'string'
gpu: 'string'
memory: 'string'
}
}
For Managed, use:
endpointComputeType: 'Managed'
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
scaleType: 'Default'
For TargetUtilization, use:
scaleType: 'TargetUtilization'
maxInstances: int
minInstances: int
pollingInterval: 'string'
targetUtilizationPercentage: int
Property values
workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
name | The resource name See how to set names and types for child resources in Bicep. |
string (required) |
location | The geo-location where the resource lives | string (required) |
tags | Resource tags. | Dictionary of tag names and values. See Tags in templates |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
parent | In Bicep, you can specify the parent resource for a child resource. You only need to add this property when the child resource is declared outside of the parent resource. For more information, see Child resource outside parent resource. |
Symbolic name for resource of type: onlineEndpoints |
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
UserAssignedIdentities
Name | Description | Value |
---|---|---|
{customized property} | UserAssignedIdentity |
UserAssignedIdentity
This object doesn't contain any properties to set during deployment. All properties are ReadOnly.
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. |
OnlineScaleSettings |
endpointComputeType | Set the object type | Kubernetes Managed (required) |
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup. eg. "score.py" | string (required) Constraints: Min length = 1 Pattern = [a-zA-Z0-9_] |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|---|---|
{customized property} | string |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|---|---|
{customized property} | string |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The number of maximum concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | (Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. (Now increase request_timeout_ms to account for any networking/queue delays) |
string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. |
string |
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set the object type | Default TargetUtilization (required) |
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8691 format. Only supports duration with precision as low as Seconds. | string |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU. Ex - P3. It is typically a letter+number code | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
ARM template resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups - See resource group deployment commands
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following JSON to your template.
{
"type": "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments",
"apiVersion": "2023-10-01",
"name": "string",
"location": "string",
"tags": {
"tagName1": "tagValue1",
"tagName2": "tagValue2"
},
"sku": {
"capacity": "int",
"family": "string",
"name": "string",
"size": "string",
"tier": "string"
},
"kind": "string",
"identity": {
"type": "string",
"userAssignedIdentities": {
"{customized property}": {}
}
},
"properties": {
"appInsightsEnabled": "bool",
"codeConfiguration": {
"codeId": "string",
"scoringScript": "string"
},
"description": "string",
"egressPublicNetworkAccess": "string",
"environmentId": "string",
"environmentVariables": {
"{customized property}": "string"
},
"instanceType": "string",
"livenessProbe": {
"failureThreshold": "int",
"initialDelay": "string",
"period": "string",
"successThreshold": "int",
"timeout": "string"
},
"model": "string",
"modelMountPath": "string",
"properties": {
"{customized property}": "string"
},
"readinessProbe": {
"failureThreshold": "int",
"initialDelay": "string",
"period": "string",
"successThreshold": "int",
"timeout": "string"
},
"requestSettings": {
"maxConcurrentRequestsPerInstance": "int",
"maxQueueWait": "string",
"requestTimeout": "string"
},
"scaleSettings": {
"scaleType": "string"
// For remaining properties, see OnlineScaleSettings objects
},
"endpointComputeType": "string"
// For remaining properties, see OnlineDeploymentProperties objects
}
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
"endpointComputeType": "Kubernetes",
"containerResourceRequirements": {
"containerResourceLimits": {
"cpu": "string",
"gpu": "string",
"memory": "string"
},
"containerResourceRequests": {
"cpu": "string",
"gpu": "string",
"memory": "string"
}
}
For Managed, use:
"endpointComputeType": "Managed"
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
"scaleType": "Default"
For TargetUtilization, use:
"scaleType": "TargetUtilization",
"maxInstances": "int",
"minInstances": "int",
"pollingInterval": "string",
"targetUtilizationPercentage": "int"
Property values
workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
type | The resource type | 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments' |
apiVersion | The resource api version | '2023-10-01' |
name | The resource name See how to set names and types for child resources in JSON ARM templates. |
string (required) |
location | The geo-location where the resource lives | string (required) |
tags | Resource tags. | Dictionary of tag names and values. See Tags in templates |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
UserAssignedIdentities
Name | Description | Value |
---|---|---|
{customized property} | UserAssignedIdentity |
UserAssignedIdentity
This object doesn't contain any properties to set during deployment. All properties are ReadOnly.
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. |
OnlineScaleSettings |
endpointComputeType | Set the object type | Kubernetes Managed (required) |
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup. eg. "score.py" | string (required) Constraints: Min length = 1 Pattern = [a-zA-Z0-9_] |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|---|---|
{customized property} | string |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|---|---|
{customized property} | string |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The number of maximum concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | (Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. (Now increase request_timeout_ms to account for any networking/queue delays) |
string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. |
string |
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set the object type | Default TargetUtilization (required) |
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8691 format. Only supports duration with precision as low as Seconds. | string |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU. Ex - P3. It is typically a letter+number code | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
Terraform (AzAPI provider) resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Terraform to your template.
resource "azapi_resource" "symbolicname" {
type = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-10-01"
name = "string"
location = "string"
parent_id = "string"
tags = {
tagName1 = "tagValue1"
tagName2 = "tagValue2"
}
identity {
type = "string"
identity_ids = []
}
body = jsonencode({
properties = {
appInsightsEnabled = bool
codeConfiguration = {
codeId = "string"
scoringScript = "string"
}
description = "string"
egressPublicNetworkAccess = "string"
environmentId = "string"
environmentVariables = {
{customized property} = "string"
}
instanceType = "string"
livenessProbe = {
failureThreshold = int
initialDelay = "string"
period = "string"
successThreshold = int
timeout = "string"
}
model = "string"
modelMountPath = "string"
properties = {
{customized property} = "string"
}
readinessProbe = {
failureThreshold = int
initialDelay = "string"
period = "string"
successThreshold = int
timeout = "string"
}
requestSettings = {
maxConcurrentRequestsPerInstance = int
maxQueueWait = "string"
requestTimeout = "string"
}
scaleSettings = {
scaleType = "string"
// For remaining properties, see OnlineScaleSettings objects
}
endpointComputeType = "string"
// For remaining properties, see OnlineDeploymentProperties objects
}
sku = {
capacity = int
family = "string"
name = "string"
size = "string"
tier = "string"
}
kind = "string"
})
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
endpointComputeType = "Kubernetes"
containerResourceRequirements = {
containerResourceLimits = {
cpu = "string"
gpu = "string"
memory = "string"
}
containerResourceRequests = {
cpu = "string"
gpu = "string"
memory = "string"
}
}
For Managed, use:
endpointComputeType = "Managed"
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
scaleType = "Default"
For TargetUtilization, use:
scaleType = "TargetUtilization"
maxInstances = int
minInstances = int
pollingInterval = "string"
targetUtilizationPercentage = int
Property values
workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
type | The resource type | "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-10-01" |
name | The resource name | string (required) |
location | The geo-location where the resource lives | string (required) |
parent_id | The ID of the resource that is the parent for this resource. | ID for resource of type: onlineEndpoints |
tags | Resource tags. | Dictionary of tag names and values. |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | "SystemAssigned" "SystemAssigned,UserAssigned" "UserAssigned" (required) |
identity_ids | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}. The dictionary values can be empty objects ({}) in requests. | Array of user identity IDs. |
UserAssignedIdentities
Name | Description | Value |
---|---|---|
{customized property} | UserAssignedIdentity |
UserAssignedIdentity
This object doesn't contain any properties to set during deployment. All properties are ReadOnly.
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | "Disabled" "Enabled" |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. |
OnlineScaleSettings |
endpointComputeType | Set the object type | Kubernetes Managed (required) |
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup. eg. "score.py" | string (required) Constraints: Min length = 1 Pattern = [a-zA-Z0-9_] |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|---|---|
{customized property} | string |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|---|---|
{customized property} | string |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The number of maximum concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | (Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. (Now increase request_timeout_ms to account for any networking/queue delays) |
string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. |
string |
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set the object type | Default TargetUtilization (required) |
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | "Default" (required) |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | "TargetUtilization" (required) |
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8691 format. Only supports duration with precision as low as Seconds. | string |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | "Kubernetes" (required) |
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
string |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | "Managed" (required) |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU. Ex - P3. It is typically a letter+number code | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | "Basic" "Free" "Premium" "Standard" |