Online Deployments - List
List Inference Endpoint Deployments.
GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/{workspaceName}/onlineEndpoints/{endpointName}/deployments?api-version=2025-12-01
GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/{workspaceName}/onlineEndpoints/{endpointName}/deployments?api-version=2025-12-01&$orderBy={$orderBy}&$top={$top}&$skip={$skip}
URI Parameters
| Name | In | Required | Type | Description |
|---|---|---|---|---|
| endpointName | path | True | string | Online endpoint name. |
| resourceGroupName | path | True | string (minLength: 1, maxLength: 90) | The name of the resource group. The name is case insensitive. |
| subscriptionId | path | True | string (minLength: 1) | The ID of the target subscription. |
| workspaceName | path | True | string (pattern: `^[a-zA-Z0-9][a-zA-Z0-9_-]{2,32}$`) | Azure Machine Learning workspace name. |
| api-version | query | True | string (minLength: 1) | The API version to use for this operation. |
| $orderBy | query | | string | Ordering of the list. |
| $skip | query | | string | Continuation token for pagination. |
| $top | query | | integer (int32) | Top of list (the number of items to return). |
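The path and query parameters above can be assembled into a request URL as in the following minimal Python sketch (standard library only). The function name `build_list_deployments_url` is hypothetical, and validating the documented constraints on the client side is an illustrative choice; the service performs its own validation.

```python
import re
from urllib.parse import quote, urlencode

BASE = "https://management.azure.com"
API_VERSION = "2025-12-01"
# Pattern copied from the workspaceName row of the URI parameter table.
WORKSPACE_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{2,32}$")


def build_list_deployments_url(subscription_id, resource_group, workspace, endpoint,
                               order_by=None, top=None, skip=None):
    """Build the List Online Deployments URL after checking the documented constraints."""
    if not subscription_id:
        raise ValueError("subscriptionId must not be empty")
    if not 1 <= len(resource_group) <= 90:
        raise ValueError("resourceGroupName must be 1-90 characters")
    if not WORKSPACE_PATTERN.match(workspace):
        raise ValueError("workspaceName does not match the documented pattern")
    # Each path segment is percent-encoded on its own so a stray "/" cannot change the route.
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{quote(workspace, safe='')}"
        f"/onlineEndpoints/{quote(endpoint, safe='')}/deployments"
    )
    query = {"api-version": API_VERSION}
    # Optional parameters are only sent when the caller supplies them.
    if order_by is not None:
        query["$orderBy"] = order_by
    if top is not None:
        query["$top"] = int(top)
    if skip is not None:
        query["$skip"] = skip
    # safe="$" keeps the parameter names literally as $orderBy, $top and $skip.
    return f"{BASE}{path}?{urlencode(query, safe='$')}"
```

Calling it with `top=1` and the sample's subscription, resource group, workspace, and endpoint names produces the sample request URL without the placeholder `$orderBy=string`.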
Responses
| Name | Type | Description |
|---|---|---|
| 200 OK | OnlineDeploymentTrackedResourceArmPaginatedResult | Azure operation completed successfully. |
| Other Status Codes | ErrorResponse | An unexpected error response. |
Security
azure_auth
Azure Active Directory OAuth2 Flow.
Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Scopes
| Name | Description |
|---|---|
| user_impersonation | impersonate your user account |
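Requests are authorized with an OAuth2 bearer token in the `Authorization` header. The sketch below only attaches an already-acquired token to a standard-library request; obtaining the token (for example with the `azure-identity` package's `DefaultAzureCredential().get_token("https://management.azure.com/.default")`) is outside its scope, and the helper name `make_authorized_request` is hypothetical.

```python
import urllib.request


def make_authorized_request(url, access_token):
    """Wrap a List Online Deployments URL in a GET request carrying a bearer token."""
    # The token itself must come from Microsoft Entra ID (Azure Active Directory);
    # this helper does not acquire or refresh it.
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )
```

Sending the result with `urllib.request.urlopen(req)` would return the JSON page described under Responses.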
Examples
List Online Deployments.
Sample request
GET https://management.azure.com/subscriptions/00000000-1111-2222-3333-444444444444/resourceGroups/test-rg/providers/Microsoft.MachineLearningServices/workspaces/my-aml-workspace/onlineEndpoints/testEndpointName/deployments?api-version=2025-12-01&$orderBy=string&$top=1
Sample response
{
"nextLink": "https://management.azure.com/subscriptions/34adfa4f-cedf-4dc0-ba29-b6d1a69ab345/resourceGroups/testrg123/providers/Microsoft.MachineLearningServices/workspaces/my-aml-workspace/onlineEndpoints/testEndpointName/deployments?api-version=2025-07-01-preview&$skip=2",
"value": [
{
"name": "string",
"type": "string",
"id": "string",
"identity": {
"type": "SystemAssigned",
"principalId": "00000000-1111-2222-3333-444444444444",
"tenantId": "00000000-1111-2222-3333-444444444444",
"userAssignedIdentities": {
"string": {
"clientId": "00000000-1111-2222-3333-444444444444",
"principalId": "00000000-1111-2222-3333-444444444444"
}
}
},
"kind": "string",
"location": "string",
"properties": {
"description": "string",
"appInsightsEnabled": false,
"codeConfiguration": {
"codeId": "string",
"scoringScript": "string"
},
"containerResourceRequirements": {
"containerResourceLimits": {
"cpu": "\"1\"",
"gpu": "\"1\"",
"memory": "\"2Gi\""
},
"containerResourceRequests": {
"cpu": "\"1\"",
"gpu": "\"1\"",
"memory": "\"2Gi\""
}
},
"endpointComputeType": "Kubernetes",
"environmentId": "string",
"environmentVariables": {
"string": "string"
},
"instanceType": "string",
"livenessProbe": {
"failureThreshold": 1,
"initialDelay": "PT5M",
"period": "PT5M",
"successThreshold": 1,
"timeout": "PT5M"
},
"model": "string",
"modelMountPath": "string",
"properties": {
"string": "string"
},
"provisioningState": "Creating",
"requestSettings": {
"maxConcurrentRequestsPerInstance": 1,
"maxQueueWait": "PT5M",
"requestTimeout": "PT5M"
},
"scaleSettings": {
"scaleType": "Default"
}
},
"sku": {
"name": "string",
"capacity": 1,
"family": "string",
"size": "string",
"tier": "Free"
},
"systemData": {
"createdAt": "2020-01-01T12:34:56.999Z",
"createdBy": "string",
"createdByType": "User",
"lastModifiedAt": "2020-01-01T12:34:56.999Z",
"lastModifiedBy": "string",
"lastModifiedByType": "User"
},
"tags": {}
}
]
}
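A response page like the one above can be reduced to the fields most callers check first. This Python sketch (hypothetical helper name) pulls each deployment's name, `endpointComputeType`, and `provisioningState`, and returns `nextLink` so the caller can decide whether to fetch another page.

```python
import json


def summarize_page(body):
    """Return (name, endpointComputeType, provisioningState) tuples and the page's nextLink."""
    page = json.loads(body)
    rows = []
    for item in page.get("value", []):
        props = item.get("properties", {})
        rows.append((item.get("name"),
                     props.get("endpointComputeType"),
                     props.get("provisioningState")))
    # nextLink is missing (or null) when there are no further pages.
    return rows, page.get("nextLink")
```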
Definitions
| Name | Description |
|---|---|
| CodeConfiguration | Configuration for a scoring code asset. |
| Collection | |
| ContainerResourceRequirements | Resource requirements for each container instance within an online deployment. |
| ContainerResourceSettings | |
| createdByType | The type of identity that created the resource. |
| DataCollectionMode | Enable or disable data collection. |
| DataCollector | |
| DefaultScaleSettings | |
| DeploymentProvisioningState | Possible values for DeploymentProvisioningState. |
| EgressPublicNetworkAccessType | Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment. |
| EndpointComputeType | Enum to determine endpoint compute type. |
| ErrorAdditionalInfo | The resource management error additional info. |
| ErrorDetail | The error detail. |
| ErrorResponse | Error response. |
| KubernetesOnlineDeployment | Properties specific to a KubernetesOnlineDeployment. |
| ManagedOnlineDeployment | Properties specific to a ManagedOnlineDeployment. |
| ManagedServiceIdentity | Managed service identity (system assigned and/or user assigned identities). |
| ManagedServiceIdentityType | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). |
| OnlineDeployment | Concrete tracked resource types can be created by aliasing this type using a specific property type. |
| OnlineDeploymentTrackedResourceArmPaginatedResult | A paginated list of OnlineDeployment entities. |
| OnlineRequestSettings | Online deployment scoring requests configuration. |
| ProbeSettings | Deployment container liveness/readiness probe configuration. |
| RequestLogging | |
| RollingRateType | How often collected model data is rolled into a new blob storage path (for example, /yyyy/MM/dd/HH/ for an hourly rate), so that it is not all written to a single blob file. |
| ScaleType | |
| Sku | The resource model definition representing SKU. |
| SkuTier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. |
| systemData | Metadata pertaining to creation and last modification of the resource. |
| TargetUtilizationScaleSettings | |
| UserAssignedIdentity | User assigned identity properties. |
CodeConfiguration
Configuration for a scoring code asset.
| Name | Type | Description |
|---|---|---|
| codeId | string | ARM resource ID of the code asset. |
| scoringScript | string (minLength: 1, pattern: `[a-zA-Z0-9_]`) | [Required] The script to execute on startup, for example `score.py`. |
Collection
| Name | Type | Default value | Description |
|---|---|---|---|
| clientId | string | | The MSI client ID used to write logs to blob storage. If it is null, the backend picks a registered endpoint identity for authentication. |
| dataCollectionMode | DataCollectionMode | Disabled | Enable or disable data collection. |
| dataId | string | | The ARM resource ID of the data asset. The client side ensures that the data asset points to blob storage, and the backend collects data into that blob storage. |
| samplingRate | number (double) | 1 | The sampling rate for collection. A sampling rate of 1.0 (the default) means 100% of the data is collected. |
ContainerResourceRequirements
Resource requirements for each container instance within an online deployment.
| Name | Type | Description |
|---|---|---|
| containerResourceLimits | ContainerResourceSettings | Container resource limit info. |
| containerResourceRequests | ContainerResourceSettings | Container resource request info. |
ContainerResourceSettings
| Name | Type | Description |
|---|---|---|
| cpu | string | Number of vCPUs requested (request) or allowed (limit) for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
| gpu | string | Number of NVIDIA GPU cards requested (request) or allowed (limit) for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
| memory | string | Memory size requested (request) or allowed (limit) for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ |
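These values are strings, and the linked Kubernetes documentation suggests they follow Kubernetes quantity notation (the sample response shows `"2Gi"` for memory). Assuming that notation, the following Python sketch converts a value to a number. It handles only the common binary (`Ki`, `Mi`, `Gi`, `Ti`) and decimal (`k`, `M`, `G`, `T`, `m`) suffixes, it is not the service's own parser, and stripping the extra quotes seen in the sample response is an assumption about how those values should be read.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes: binary (powers of 1024) and decimal (powers of 1000).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(value):
    """Convert a quantity string such as "2Gi", "500m" or "1" into a Decimal."""
    # The sample response wraps values in literal quotes, e.g. "\"2Gi\"".
    text = value.strip().strip('"')
    # Try two-character suffixes first so "Mi" is not mistaken for "M" plus a stray "i".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)
```

For example, `parse_quantity("2Gi")` gives 2147483648 (bytes) and `parse_quantity("500m")` gives 0.5 (vCPU).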
createdByType
The type of identity that created the resource.
| Value | Description |
|---|---|
| User | |
| Application | |
| ManagedIdentity | |
| Key | |
DataCollectionMode
Enable or disable data collection.
| Value | Description |
|---|---|
| Enabled | |
| Disabled | |
DataCollector
| Name | Type | Default value | Description |
|---|---|---|---|
| collections | `<string, Collection>` | | [Required] The collection configuration. Each collection has its own configuration for collecting model data, and a collection name can be an arbitrary string. The model data collector can be used for payload logging, custom logging, or both. The collection names `request` and `response` are reserved for payload logging; all other names are used for custom logging. |
| requestLogging | RequestLogging | | The request logging configuration for the model data collector (MDC), including advanced logging settings that apply to all collections. Optional. |
| rollingRate | RollingRateType | Hour | Collected model data is rolled into different blob paths so that it is not all written to a single blob file. With an hourly rolling rate, data is collected under the blob path /yyyy/MM/dd/HH/; with a daily rate, under /yyyy/MM/dd/. Rolling paths also let the model monitoring UI select a time range of data quickly. |
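The DataCollector, Collection, and RequestLogging tables together describe the object that appears under a deployment's `properties.dataCollector`. The Python sketch below builds one that enables payload logging through the reserved `request` and `response` collections plus optional custom collections. The helper name, the 0-1 range check on `samplingRate`, and applying one shared configuration to every collection are assumptions for illustration.

```python
def make_data_collector(data_id, custom_collections=(), sampling_rate=1.0,
                        capture_headers=(), rolling_rate="Hour"):
    """Build a DataCollector dictionary with payload logging and optional custom collections."""
    # Assumed client-side rule: 1.0 means 100% of the data, so values above 1 make no sense.
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("samplingRate is expected to be between 0 and 1")
    collection = {
        "dataCollectionMode": "Enabled",
        "dataId": data_id,
        "samplingRate": sampling_rate,
    }
    # "request" and "response" are reserved for payload logging; other names are custom logs.
    names = ["request", "response", *custom_collections]
    collector = {
        "collections": {name: dict(collection) for name in names},
        "rollingRate": rolling_rate,
    }
    # requestLogging is optional, so it is only added when headers should be captured.
    if capture_headers:
        collector["requestLogging"] = {"captureHeaders": list(capture_headers)}
    return collector
```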
DefaultScaleSettings
| Name | Type | Description |
|---|---|---|
| scaleType | string: Default | [Required] Type of deployment scaling algorithm. |
DeploymentProvisioningState
Possible values for DeploymentProvisioningState.
| Value | Description |
|---|---|
| Creating | |
| Deleting | |
| Scaling | |
| Updating | |
| Succeeded | |
| Failed | |
| Canceled | |
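When polling a deployment, a client typically needs to know whether `provisioningState` can still change. The Python sketch below classifies the values listed above. Treating `Succeeded`, `Failed`, and `Canceled` as terminal follows the general Azure Resource Manager convention and is an assumption here; unknown values are reported rather than rejected so that a newly added state does not break the caller.

```python
# Values from DeploymentProvisioningState. The terminal/in-progress split is an assumption
# based on the usual Azure Resource Manager convention.
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}
IN_PROGRESS_STATES = {"Creating", "Deleting", "Scaling", "Updating"}


def classify_state(state):
    """Return "terminal", "in-progress" or "unknown" for a provisioningState value."""
    if state in TERMINAL_STATES:
        return "terminal"
    if state in IN_PROGRESS_STATES:
        return "in-progress"
    return "unknown"
```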
EgressPublicNetworkAccessType
Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment.
| Value | Description |
|---|---|
| Enabled | |
| Disabled | |
EndpointComputeType
Enum to determine endpoint compute type.
| Value | Description |
|---|---|
| Managed | |
| Kubernetes | |
| AzureMLCompute | |
ErrorAdditionalInfo
The resource management error additional info.
| Name | Type | Description |
|---|---|---|
| info | object | The additional info. |
| type | string | The additional info type. |
ErrorDetail
The error detail.
| Name | Type | Description |
|---|---|---|
| additionalInfo | ErrorAdditionalInfo[] | The error additional info. |
| code | string | The error code. |
| details | ErrorDetail[] | The error details. |
| message | string | The error message. |
| target | string | The error target. |
ErrorResponse
Error response
| Name | Type | Description |
|---|---|---|
| error | ErrorDetail | The error object. |
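Because `ErrorDetail.details` is itself a list of `ErrorDetail` objects, an error body can nest arbitrarily deep. This Python sketch (hypothetical helper name) walks the tree and produces one indented line per detail. The error codes in the usage example are made up for illustration.

```python
import json


def describe_error(body):
    """Flatten an ErrorResponse body into readable lines, walking nested details."""
    lines = []

    def walk(detail, depth):
        target = f" (target: {detail['target']})" if detail.get("target") else ""
        lines.append("  " * depth + f"{detail.get('code')}: {detail.get('message')}{target}")
        # details may be absent or null; each child is another ErrorDetail.
        for child in detail.get("details") or []:
            walk(child, depth + 1)

    walk(json.loads(body).get("error", {}), 0)
    return lines
```

For a body such as `{"error": {"code": "BadRequest", "message": "Invalid input", "details": [{"code": "InvalidTop", "message": "bad $top", "target": "$top"}]}}`, this returns `["BadRequest: Invalid input", "  InvalidTop: bad $top (target: $top)"]`.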
KubernetesOnlineDeployment
Properties specific to a KubernetesOnlineDeployment.
| Name | Type | Default value | Description |
|---|---|---|---|
| appInsightsEnabled | boolean | False | If true, enables Application Insights logging. |
| codeConfiguration | CodeConfiguration | | Code configuration for the endpoint deployment. |
| containerResourceRequirements | ContainerResourceRequirements | | The resource requirements for the container (CPU, GPU, and memory). |
| dataCollector | DataCollector | | The model data collector (MDC) configuration. MDC is disabled when this is null. |
| description | string | | Description of the endpoint deployment. |
| egressPublicNetworkAccess | EgressPublicNetworkAccessType | Enabled | Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment. |
| endpointComputeType | string: Kubernetes | | [Required] The compute type of the endpoint. |
| environmentId | string | | ARM resource ID or AssetId of the environment specification for the endpoint deployment. |
| environmentVariables | object | | Environment variables configuration for the deployment. |
| instanceType | string | Standard_F4s_v2 | Compute instance type. |
| livenessProbe | ProbeSettings | | Liveness probe monitors the health of the container regularly. |
| model | string | | The URI path to the model. |
| modelMountPath | string | | The path at which the model is mounted in a custom container. |
| properties | object | | Property dictionary. Properties can be added, but not removed or altered. |
| provisioningState | DeploymentProvisioningState | | Provisioning state for the endpoint deployment. |
| readinessProbe | ProbeSettings | | Readiness probe validates whether the container is ready to serve traffic. The properties and defaults are the same as for the liveness probe. |
| requestSettings | OnlineRequestSettings | | Request settings for the deployment. |
| scaleSettings | OnlineScaleSettings: DefaultScaleSettings, TargetUtilizationScaleSettings | | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. |
| startupProbe | ProbeSettings | | Startup probe verifies whether an application within a container has started successfully. |
ManagedOnlineDeployment
Properties specific to a ManagedOnlineDeployment.
| Name | Type | Default value | Description |
|---|---|---|---|
| appInsightsEnabled | boolean | False | If true, enables Application Insights logging. |
| codeConfiguration | CodeConfiguration | | Code configuration for the endpoint deployment. |
| dataCollector | DataCollector | | The model data collector (MDC) configuration. MDC is disabled when this is null. |
| description | string | | Description of the endpoint deployment. |
| egressPublicNetworkAccess | EgressPublicNetworkAccessType | Enabled | Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment. |
| endpointComputeType | string: Managed | | [Required] The compute type of the endpoint. |
| environmentId | string | | ARM resource ID or AssetId of the environment specification for the endpoint deployment. |
| environmentVariables | object | | Environment variables configuration for the deployment. |
| instanceType | string | Standard_F4s_v2 | Compute instance type. |
| livenessProbe | ProbeSettings | | Liveness probe monitors the health of the container regularly. |
| model | string | | The URI path to the model. |
| modelMountPath | string | | The path at which the model is mounted in a custom container. |
| properties | object | | Property dictionary. Properties can be added, but not removed or altered. |
| provisioningState | DeploymentProvisioningState | | Provisioning state for the endpoint deployment. |
| readinessProbe | ProbeSettings | | Readiness probe validates whether the container is ready to serve traffic. The properties and defaults are the same as for the liveness probe. |
| requestSettings | OnlineRequestSettings | | Request settings for the deployment. |
| scaleSettings | OnlineScaleSettings: DefaultScaleSettings, TargetUtilizationScaleSettings | | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. |
| startupProbe | ProbeSettings | | Startup probe verifies whether an application within a container has started successfully. |
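Both deployment types share most properties and are told apart by `endpointComputeType`. The `scaleSettings` rows above also state which scaling algorithm applies when the field is null or missing. The Python sketch below (hypothetical helper name) applies that documented default to a deployment's `properties` object from a list response.

```python
def effective_scale_type(properties):
    """Return the scaling algorithm a deployment uses, applying the documented defaults."""
    compute_type = properties.get("endpointComputeType")
    # Only Kubernetes and Managed deployments are described in this reference.
    if compute_type not in ("Kubernetes", "Managed"):
        raise ValueError(f"unsupported endpointComputeType: {compute_type!r}")
    settings = properties.get("scaleSettings")
    if settings:
        return settings["scaleType"]
    # Null or missing scaleSettings: TargetUtilization for Kubernetes, Default for Managed.
    return "TargetUtilization" if compute_type == "Kubernetes" else "Default"
```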
ManagedServiceIdentity
Managed service identity (system assigned and/or user assigned identities)
| Name | Type | Description |
|---|---|---|
| principalId | string (uuid) | The service principal ID of the system-assigned identity. This property is only provided for a system-assigned identity. |
| tenantId | string (uuid) | The tenant ID of the system-assigned identity. This property is only provided for a system-assigned identity. |
| type | ManagedServiceIdentityType | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). |
| userAssignedIdentities | `<string, UserAssignedIdentity>` | User-assigned identities. |
ManagedServiceIdentityType
Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed).
| Value | Description |
|---|---|
| None | |
| SystemAssigned | |
| UserAssigned | |
| SystemAssigned,UserAssigned | |
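The combined value `SystemAssigned,UserAssigned` is a single comma-separated string, so checking for one kind with string equality misses it. This Python sketch (hypothetical helper name) turns an `identity` object from the response into a set of enabled kinds and treats a missing or null identity as `None`.

```python
def identity_kinds(identity):
    """Return the set of identity kinds ("SystemAssigned", "UserAssigned") that are enabled."""
    if not identity:
        return set()
    # A null type is treated the same as "None" (no managed identity).
    raw = identity.get("type") or "None"
    kinds = {part.strip() for part in raw.split(",")
             if part.strip() and part.strip() != "None"}
    unknown = kinds - {"SystemAssigned", "UserAssigned"}
    if unknown:
        raise ValueError(f"unexpected identity type(s): {sorted(unknown)}")
    return kinds
```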
OnlineDeployment
Concrete tracked resource types can be created by aliasing this type using a specific property type.
| Name | Type | Description |
|---|---|---|
| id | string | Fully qualified resource ID for the resource, for example /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/{resourceName} |
| identity | ManagedServiceIdentity | Managed service identity (system assigned and/or user assigned identities). |
| kind | string | Metadata used by portal/tooling/etc. to render different UX experiences for resources of the same type. |
| location | string | The geo-location where the resource lives. |
| name | string | The name of the resource. |
| properties | OnlineDeploymentProperties: KubernetesOnlineDeployment, ManagedOnlineDeployment | [Required] Additional attributes of the entity. |
| sku | Sku | SKU details required for the ARM contract for autoscaling. |
| systemData | systemData | Azure Resource Manager metadata containing createdBy and modifiedBy information. |
| tags | object | Resource tags. |
| type | string | The type of the resource, for example "Microsoft.Compute/virtualMachines" or "Microsoft.Storage/storageAccounts". |
OnlineDeploymentTrackedResourceArmPaginatedResult
A paginated list of OnlineDeployment entities.
| Name | Type | Description |
|---|---|---|
| nextLink | string (uri) | The link to the next page of items. |
| value | OnlineDeployment[] | The OnlineDeployment items on this page. |
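To list every deployment, a client keeps requesting `nextLink` until it is absent. The Python sketch below is written as a generator over pages. `fetch_json` is a hypothetical, caller-supplied callable that performs an authenticated GET and returns the parsed JSON body. `nextLink` is used verbatim because it already carries the query string, including `$skip`. The repeated-link guard is a defensive addition, not documented behavior.

```python
def iter_all(fetch_json, first_url):
    """Yield every OnlineDeployment across pages by following nextLink."""
    url = first_url
    seen = set()
    while url:
        # Defensive guard: stop instead of looping forever if a link repeats.
        if url in seen:
            raise RuntimeError(f"pagination loop detected at {url}")
        seen.add(url)
        page = fetch_json(url)
        yield from page.get("value", [])
        # Missing or null nextLink ends the loop.
        url = page.get("nextLink")
```

Usage with an in-memory stand-in for the HTTP call: `list(iter_all(pages.__getitem__, "p1"))`, where `pages` maps each URL to a parsed page.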
OnlineRequestSettings
Online deployment scoring requests configuration.
| Name | Type | Default value | Description |
|---|---|---|---|
| maxConcurrentRequestsPerInstance | integer (int32) | 1 | The maximum number of concurrent requests per node allowed per deployment. Defaults to 1. |
| maxQueueWait | string (duration) | PT0.5S | (Deprecated for Managed Online Endpoints) The maximum amount of time a request stays in the queue, in ISO 8601 format. Defaults to 500 ms. (Now increase |
| requestTimeout | string (duration) | PT5S | The scoring timeout, in ISO 8601 format. Defaults to 5000 ms. |
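The duration fields are ISO 8601 strings; for example, `PT0.5S` is the documented 500 ms default and `PT5S` is 5000 ms. The Python standard library has no ISO 8601 duration parser, so the sketch below handles only the day/time subset (`PnDTnHnMnS`) that appears in these settings and in the sample response. It rejects years, months, and weeks, which the real standard allows. The function name is hypothetical.

```python
import re
from datetime import timedelta

# Day/time subset of ISO 8601 durations, e.g. "PT0.5S", "PT5M", "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T(?:(?P<hours>\d+(?:\.\d+)?)H)?(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text):
    """Parse an ISO 8601 day/time duration string into a timedelta."""
    match = _DURATION.match(text)
    # Reject strings with no components ("P", "PT") or a dangling time designator.
    if not match or text in ("P", "PT") or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    parts = {name: float(value) for name, value in match.groupdict().items() if value is not None}
    return timedelta(**parts)
```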
ProbeSettings
Deployment container liveness/readiness probe configuration.
| Name | Type | Default value | Description |
|---|---|---|---|
| failureThreshold | integer (int32) | 30 | The number of failures to allow before returning an unhealthy status. |
| initialDelay | string (duration) | | The delay before the first probe, in ISO 8601 format. |
| period | string (duration) | PT10S | The length of time between probes, in ISO 8601 format. |
| successThreshold | integer (int32) | 1 | The number of successful probes before returning a healthy status. |
| timeout | string (duration) | PT2S | The probe timeout, in ISO 8601 format. |
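The table gives a default for every ProbeSettings field except `initialDelay`, and readiness probes share the liveness defaults. This Python sketch (hypothetical helper name) fills in those defaults for a partial or null probe object. Treating an explicit `null` as "use the default" is an assumption about how such values should be read.

```python
# Defaults copied from the ProbeSettings table; initialDelay has no documented default.
PROBE_DEFAULTS = {
    "failureThreshold": 30,
    "period": "PT10S",
    "successThreshold": 1,
    "timeout": "PT2S",
}


def effective_probe(settings=None):
    """Merge a (possibly partial or null) ProbeSettings object with the documented defaults."""
    merged = dict(PROBE_DEFAULTS)
    # Assumption: explicit nulls mean "use the default", so they do not overwrite it.
    merged.update({k: v for k, v in (settings or {}).items() if v is not None})
    return merged
```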
RequestLogging
| Name | Type | Description |
|---|---|---|
| captureHeaders | string[] | For payload logging, only the payload is collected by default. To also collect specific headers, list them in captureHeaders and the backend collects those headers along with the payload. |
RollingRateType
Collected model data is rolled into different blob paths so that it is not all written to a single blob file. With an hourly rolling rate, data is collected under the blob path /yyyy/MM/dd/HH/; with a daily rate, under /yyyy/MM/dd/. Rolling paths also let the model monitoring UI select a time range of data quickly.
| Value | Description |
|---|---|
| Year | |
| Month | |
| Day | |
| Hour | |
| Minute | |
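The description gives the blob path layout for `Hour` (/yyyy/MM/dd/HH/) and `Day` (/yyyy/MM/dd/). The Python sketch below computes the prefix a record would roll into. The `Year`, `Month`, and `Minute` layouts are extrapolated from the same pattern, and using UTC timestamps is an assumption; neither is stated in the reference.

```python
from datetime import datetime, timezone

# Hour and Day layouts come from the RollingRateType description; Year, Month and Minute
# are extrapolated from the same pattern (assumption).
_LAYOUTS = {
    "Year": "/%Y/",
    "Month": "/%Y/%m/",
    "Day": "/%Y/%m/%d/",
    "Hour": "/%Y/%m/%d/%H/",
    "Minute": "/%Y/%m/%d/%H/%M/",
}


def rolled_prefix(timestamp, rolling_rate="Hour"):
    """Return the blob path prefix for a timezone-aware timestamp, using UTC (assumption)."""
    return timestamp.astimezone(timezone.utc).strftime(_LAYOUTS[rolling_rate])
```

For `datetime(2020, 1, 1, 12, 34, 56, tzinfo=timezone.utc)`, the default hourly rate gives `/2020/01/01/12/` and `"Day"` gives `/2020/01/01/`.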
ScaleType
| Value | Description |
|---|---|
| Default | |
| TargetUtilization | |
Sku
The resource model definition representing SKU
| Name | Type | Description |
|---|---|---|
| capacity | integer (int32) | If the SKU supports scale out/in, the capacity integer should be included. If scale out/in is not possible for the resource, it may be omitted. |
| family | string | If the service has different generations of hardware for the same SKU, that can be captured here. |
| name | string | The name of the SKU, for example P3. It is typically a letter+number code. |
| size | string | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. |
| tier | SkuTier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. |
SkuTier
This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT.
| Value | Description |
|---|---|
| Free | |
| Basic | |
| Standard | |
| Premium | |
systemData
Metadata pertaining to creation and last modification of the resource.
| Name | Type | Description |
|---|---|---|
| createdAt | string (date-time) | The timestamp of resource creation (UTC). |
| createdBy | string | The identity that created the resource. |
| createdByType | createdByType | The type of identity that created the resource. |
| lastModifiedAt | string (date-time) | The timestamp of resource last modification (UTC). |
| lastModifiedBy | string | The identity that last modified the resource. |
| lastModifiedByType | createdByType | The type of identity that last modified the resource. |
TargetUtilizationScaleSettings
| Name | Type | Default value | Description |
|---|---|---|---|
| maxInstances | integer (int32) | 1 | The maximum number of instances that the deployment can scale to. Quota is reserved for maxInstances. |
| minInstances | integer (int32) | 1 | The minimum number of instances to always be present. |
| pollingInterval | string (duration) | PT1S | The polling interval in ISO 8601 format. Only durations with a precision of whole seconds or coarser are supported. |
| scaleType | string: TargetUtilization | | [Required] Type of deployment scaling algorithm. |
| targetUtilizationPercentage | integer (int32) | 70 | Target CPU usage, in percent, for the autoscaler. |
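The defaults above can be collected into a small builder for a TargetUtilizationScaleSettings object. In the Python sketch below, the helper name is hypothetical, and the two sanity checks (minimum not above maximum, percentage in (0, 100]) are illustrative client-side rules, not documented service limits.

```python
TARGET_UTILIZATION_DEFAULTS = {
    "minInstances": 1,
    "maxInstances": 1,
    "pollingInterval": "PT1S",
    "targetUtilizationPercentage": 70,
}


def target_utilization_settings(**overrides):
    """Build TargetUtilizationScaleSettings from the documented defaults plus overrides."""
    settings = {"scaleType": "TargetUtilization", **TARGET_UTILIZATION_DEFAULTS, **overrides}
    # Illustrative client-side checks; the service may enforce different limits.
    if settings["minInstances"] > settings["maxInstances"]:
        raise ValueError("minInstances cannot exceed maxInstances")
    if not 0 < settings["targetUtilizationPercentage"] <= 100:
        raise ValueError("targetUtilizationPercentage should be in (0, 100]")
    return settings
```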
UserAssignedIdentity
User assigned identity properties
| Name | Type | Description |
|---|---|---|
| clientId | string (uuid) | The client ID of the assigned identity. |
| principalId | string (uuid) | The principal ID of the assigned identity. |