Online Deployments - List

List Inference Endpoint Deployments.

GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/{workspaceName}/onlineEndpoints/{endpointName}/deployments?api-version=2025-12-01
GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/{workspaceName}/onlineEndpoints/{endpointName}/deployments?api-version=2025-12-01&$orderBy={$orderBy}&$top={$top}&$skip={$skip}

URI Parameters

| Name | In | Required | Type | Description |
| --- | --- | --- | --- | --- |
| endpointName | path | True | string | Online Endpoint name. |
| resourceGroupName | path | True | string (minLength: 1, maxLength: 90) | The name of the resource group. The name is case insensitive. |
| subscriptionId | path | True | string (minLength: 1) | The ID of the target subscription. |
| workspaceName | path | True | string (pattern: ^[a-zA-Z0-9][a-zA-Z0-9_-]{2,32}$) | Azure Machine Learning Workspace Name |
| api-version | query | True | string (minLength: 1) | The API version to use for this operation. |
| $orderBy | query | | string | Ordering of list. |
| $skip | query | | string | Continuation token for pagination. |
| $top | query | | integer (int32) | Top of list. |
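
The parameters above can be assembled into a request URL as in the following minimal sketch. The helper name `build_list_url` and its argument names are hypothetical; the length and pattern checks simply mirror the constraints listed for the path parameters, and the service may enforce additional rules.

```python
import re
from urllib.parse import quote, urlencode

# Pattern for workspaceName, copied from the URI Parameters table.
WORKSPACE_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{2,32}$")


def build_list_url(subscription_id, resource_group, workspace, endpoint,
                   api_version="2025-12-01", order_by=None, top=None, skip=None):
    """Return the List URL; optional query parameters are added only when set."""
    if not subscription_id:
        raise ValueError("subscriptionId must have at least 1 character")
    if not 1 <= len(resource_group) <= 90:
        raise ValueError("resourceGroupName must be 1 to 90 characters long")
    if not WORKSPACE_NAME.match(workspace):
        raise ValueError("workspaceName does not match the documented pattern")

    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{quote(workspace, safe='')}"
        f"/onlineEndpoints/{quote(endpoint, safe='')}/deployments"
    )
    query = {"api-version": api_version}
    if order_by is not None:
        query["$orderBy"] = order_by
    if top is not None:
        query["$top"] = int(top)
    if skip is not None:
        query["$skip"] = skip
    # safe="$" keeps the "$"-prefixed parameter names unescaped in the URL.
    return "https://management.azure.com" + path + "?" + urlencode(query, safe="$")
```

Leaving `order_by`, `top`, and `skip` unset yields the short form of the request, with `api-version` as the only query parameter.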

Responses

| Name | Type | Description |
| --- | --- | --- |
| 200 OK | OnlineDeploymentTrackedResourceArmPaginatedResult | Azure operation completed successfully. |
| Other Status Codes | ErrorResponse | An unexpected error response. |

Security

azure_auth

Azure Active Directory OAuth2 Flow.

Type: oauth2
Flow: implicit
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize

Scopes

Name Description
user_impersonation impersonate your user account
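
Once an OAuth2 access token has been obtained, it is sent as a bearer token in the `Authorization` header. The sketch below shows only that step; the helper name `build_request` is hypothetical, and acquiring the token is deliberately left out.

```python
from urllib.request import Request


def build_request(url, access_token):
    """Attach an OAuth2 bearer token to a GET request (token acquisition not shown)."""
    if not access_token:
        raise ValueError("an access token is required")
    return Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send the call.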

Examples

List Online Deployments.

Sample request

GET https://management.azure.com/subscriptions/00000000-1111-2222-3333-444444444444/resourceGroups/test-rg/providers/Microsoft.MachineLearningServices/workspaces/my-aml-workspace/onlineEndpoints/testEndpointName/deployments?api-version=2025-12-01&$orderBy=string&$top=1

Sample response

{
  "nextLink": "https://management.azure.com/subscriptions/34adfa4f-cedf-4dc0-ba29-b6d1a69ab345/resourceGroups/testrg123/providers/Microsoft.MachineLearningServices/workspaces/my-aml-workspace/onlineEndpoints/testEndpointName/deployments?api-version=2025-07-01-preview&$skip=2",
  "value": [
    {
      "name": "string",
      "type": "string",
      "id": "string",
      "identity": {
        "type": "SystemAssigned",
        "principalId": "00000000-1111-2222-3333-444444444444",
        "tenantId": "00000000-1111-2222-3333-444444444444",
        "userAssignedIdentities": {
          "string": {
            "clientId": "00000000-1111-2222-3333-444444444444",
            "principalId": "00000000-1111-2222-3333-444444444444"
          }
        }
      },
      "kind": "string",
      "location": "string",
      "properties": {
        "description": "string",
        "appInsightsEnabled": false,
        "codeConfiguration": {
          "codeId": "string",
          "scoringScript": "string"
        },
        "containerResourceRequirements": {
          "containerResourceLimits": {
            "cpu": "\"1\"",
            "gpu": "\"1\"",
            "memory": "\"2Gi\""
          },
          "containerResourceRequests": {
            "cpu": "\"1\"",
            "gpu": "\"1\"",
            "memory": "\"2Gi\""
          }
        },
        "endpointComputeType": "Kubernetes",
        "environmentId": "string",
        "environmentVariables": {
          "string": "string"
        },
        "instanceType": "string",
        "livenessProbe": {
          "failureThreshold": 1,
          "initialDelay": "PT5M",
          "period": "PT5M",
          "successThreshold": 1,
          "timeout": "PT5M"
        },
        "model": "string",
        "modelMountPath": "string",
        "properties": {
          "string": "string"
        },
        "provisioningState": "Creating",
        "requestSettings": {
          "maxConcurrentRequestsPerInstance": 1,
          "maxQueueWait": "PT5M",
          "requestTimeout": "PT5M"
        },
        "scaleSettings": {
          "scaleType": "Default"
        }
      },
      "sku": {
        "name": "string",
        "capacity": 1,
        "family": "string",
        "size": "string",
        "tier": "Free"
      },
      "systemData": {
        "createdAt": "2020-01-01T12:34:56.999Z",
        "createdBy": "string",
        "createdByType": "User",
        "lastModifiedAt": "2020-01-01T12:34:56.999Z",
        "lastModifiedBy": "string",
        "lastModifiedByType": "User"
      },
      "tags": {}
    }
  ]
}
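
A response of this shape can be consumed page by page by following `nextLink` until it is absent. The following is a minimal sketch: `iter_deployments` and `summarize` are hypothetical helper names, and `fetch_json` stands for any caller-supplied function that takes a URL and returns the decoded JSON body.

```python
def iter_deployments(first_url, fetch_json):
    """Yield every deployment, following nextLink until it is absent.

    fetch_json is a caller-supplied callable: URL in, decoded JSON dict out.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        url = page.get("nextLink")


def summarize(deployment):
    """Pick a few of the fields shown in the sample response."""
    props = deployment.get("properties", {})
    return (
        deployment.get("name"),
        props.get("endpointComputeType"),
        props.get("provisioningState"),
    )
```

Passing the fetch function in as an argument keeps the paging logic independent of the HTTP client and of how the token is attached.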

Definitions

Name Description
CodeConfiguration

Configuration for a scoring code asset.

Collection
ContainerResourceRequirements

Resource requirements for each container instance within an online deployment.

ContainerResourceSettings
createdByType

The type of identity that created the resource.

DataCollectionMode

Enable or disable data collection.

DataCollector
DefaultScaleSettings
DeploymentProvisioningState

Possible values for DeploymentProvisioningState.

EgressPublicNetworkAccessType

Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment.

EndpointComputeType

Enum to determine endpoint compute type.

ErrorAdditionalInfo

The resource management error additional info.

ErrorDetail

The error detail.

ErrorResponse

Error response

KubernetesOnlineDeployment

Properties specific to a KubernetesOnlineDeployment.

ManagedOnlineDeployment

Properties specific to a ManagedOnlineDeployment.

ManagedServiceIdentity

Managed service identity (system assigned and/or user assigned identities)

ManagedServiceIdentityType

Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed).

OnlineDeployment

Concrete tracked resource types can be created by aliasing this type using a specific property type.

OnlineDeploymentTrackedResourceArmPaginatedResult

A paginated list of OnlineDeployment entities.

OnlineRequestSettings

Online deployment scoring requests configuration.

ProbeSettings

Deployment container liveness/readiness probe configuration.

RequestLogging
RollingRateType

When model data is collected to blob storage, the data is rolled to different paths to avoid logging everything in a single blob file. If the rolling rate is Hour, all data is collected under the blob path /yyyy/MM/dd/HH/. If it is Day, all data is collected under the blob path /yyyy/MM/dd/. A rolling path also lets the model monitoring UI select a time range of data quickly.

ScaleType
Sku

The resource model definition representing SKU

SkuTier

This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT.

systemData

Metadata pertaining to creation and last modification of the resource.

TargetUtilizationScaleSettings
UserAssignedIdentity

User assigned identity properties

CodeConfiguration

Configuration for a scoring code asset.

Name Type Description
codeId

string

ARM resource ID of the code asset.

scoringScript

string

minLength: 1
pattern: [a-zA-Z0-9_]

[Required] The script to execute on startup, e.g. "score.py".

Collection

Name Type Default value Description
clientId

string

The MSI client ID used to collect logging to blob storage. If it's null, the backend will pick a registered endpoint identity to authenticate.

dataCollectionMode

DataCollectionMode

Disabled

Enable or disable data collection.

dataId

string

The data asset ARM resource ID. The client side ensures that the data asset points to blob storage, and the backend collects data to that blob storage.

samplingRate

number (double)

1

The sampling rate for collection. The default sampling rate of 1.0 means that 100% of the data is collected.

ContainerResourceRequirements

Resource requirements for each container instance within an online deployment.

Name Type Description
containerResourceLimits

ContainerResourceSettings

Container resource limit info:

containerResourceRequests

ContainerResourceSettings

Container resource request info:

ContainerResourceSettings

Name Type Description
cpu

string

Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

gpu

string

Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

memory

string

Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
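
Because these values are strings in Kubernetes quantity notation, a client that wants to compare or sum them has to convert them first. The sketch below covers only a subset of that notation (binary suffixes Ki to Ti, decimal suffixes k to T, and the milli suffix m); `parse_quantity` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# Suffix multipliers from the Kubernetes quantity notation (subset).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(text):
    """Convert a quantity such as "2Gi", "500m" or "1" to a Decimal."""
    # Tolerate values wrapped in an extra pair of double quotes.
    text = text.strip().strip('"')
    # Try two-character suffixes before one-character ones ("Mi" before "M").
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)
```

With this helper, a memory value of "2Gi" converts to 2147483648 bytes and a cpu value of "500m" to half a vCPU.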

createdByType

The type of identity that created the resource.

Value Description
User
Application
ManagedIdentity
Key

DataCollectionMode

Enable or disable data collection.

Value Description
Enabled
Disabled

DataCollector

Name Type Default value Description
collections

<string,  Collection>

[Required] The collection configuration. Each collection has its own configuration for collecting model data, and the collection name can be an arbitrary string. The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all others are for custom logging.

requestLogging

RequestLogging

The request logging configuration for the model data collector (MDC). It includes advanced logging settings for all collections and is optional.

rollingRate

RollingRateType

Hour

When model data is collected to blob storage, the data is rolled to different paths to avoid logging everything in a single blob file. If the rolling rate is Hour, all data is collected under the blob path /yyyy/MM/dd/HH/. If it is Day, all data is collected under the blob path /yyyy/MM/dd/. A rolling path also lets the model monitoring UI select a time range of data quickly.

DefaultScaleSettings

Name Type Description
scaleType

string: Default

[Required] Type of deployment scaling algorithm

DeploymentProvisioningState

Possible values for DeploymentProvisioningState.

Value Description
Creating
Deleting
Scaling
Updating
Succeeded
Failed
Canceled
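
A client that polls a deployment usually needs to know when to stop. The sketch below groups the values above into terminal and in-progress states. The grouping is an assumption based on common Azure Resource Manager conventions and is not stated on this page; `is_terminal` is a hypothetical helper.

```python
# Assumption: these states do not change again without a new operation.
TERMINAL_STATES = frozenset({"Succeeded", "Failed", "Canceled"})
# Assumption: these states indicate that an operation is still running.
IN_PROGRESS_STATES = frozenset({"Creating", "Deleting", "Scaling", "Updating"})


def is_terminal(state):
    """Return True when polling can stop, False when it should continue."""
    if state in TERMINAL_STATES:
        return True
    if state in IN_PROGRESS_STATES:
        return False
    raise ValueError(f"unknown DeploymentProvisioningState: {state!r}")
```

Raising on unknown values makes it visible when a newer API version adds a state that the client does not yet handle.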

EgressPublicNetworkAccessType

Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment.

Value Description
Enabled
Disabled

EndpointComputeType

Enum to determine endpoint compute type.

Value Description
Managed
Kubernetes
AzureMLCompute

ErrorAdditionalInfo

The resource management error additional info.

Name Type Description
info

object

The additional info.

type

string

The additional info type.

ErrorDetail

The error detail.

Name Type Description
additionalInfo

ErrorAdditionalInfo[]

The error additional info.

code

string

The error code.

details

ErrorDetail[]

The error details.

message

string

The error message.

target

string

The error target.

ErrorResponse

Error response

Name Type Description
error

ErrorDetail

The error object.
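
Since ErrorDetail can nest further ErrorDetail items in details, reading every message requires a recursive walk. A minimal sketch, with the hypothetical helper name `flatten_error`:

```python
def flatten_error(error_response):
    """Return (code, message, target) tuples for the error and its nested details."""
    found = []

    def walk(detail):
        found.append((detail.get("code"), detail.get("message"), detail.get("target")))
        # details may be missing or null; treat both as an empty list.
        for child in detail.get("details") or []:
            walk(child)

    root = error_response.get("error")
    if root:
        walk(root)
    return found
```

The outermost error comes first in the result, followed by its nested details in depth-first order.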

KubernetesOnlineDeployment

Properties specific to a KubernetesOnlineDeployment.

Name Type Default value Description
appInsightsEnabled

boolean

False

If true, enables Application Insights logging.

codeConfiguration

CodeConfiguration

Code configuration for the endpoint deployment.

containerResourceRequirements

ContainerResourceRequirements

The resource requirements for the container (cpu and memory).

dataCollector

DataCollector

The model data collector (MDC) configuration. MDC is disabled when this is null.

description

string

Description of the endpoint deployment.

egressPublicNetworkAccess

EgressPublicNetworkAccessType

Enabled

Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment.

endpointComputeType

string: Kubernetes

[Required] The compute type of the endpoint.

environmentId

string

ARM resource ID or AssetId of the environment specification for the endpoint deployment.

environmentVariables

object

Environment variables configuration for the deployment.

instanceType

string

Standard_F4s_v2

Compute instance type. Default: Standard_F4s_v2.

livenessProbe

ProbeSettings

Liveness probe monitors the health of the container regularly.

model

string

The URI path to the model.

modelMountPath

string

The path to mount the model in custom container.

properties

object

Property dictionary. Properties can be added, but not removed or altered.

provisioningState

DeploymentProvisioningState

Provisioning state for the endpoint deployment.

readinessProbe

ProbeSettings

Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe.

requestSettings

OnlineRequestSettings

Request settings for the deployment.

scaleSettings

OnlineScaleSettings: DefaultScaleSettings or TargetUtilizationScaleSettings

Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment.

startupProbe

ProbeSettings

Startup probe verifies whether an application within a container has started successfully.

ManagedOnlineDeployment

Properties specific to a ManagedOnlineDeployment.

Name Type Default value Description
appInsightsEnabled

boolean

False

If true, enables Application Insights logging.

codeConfiguration

CodeConfiguration

Code configuration for the endpoint deployment.

dataCollector

DataCollector

The model data collector (MDC) configuration. MDC is disabled when this is null.

description

string

Description of the endpoint deployment.

egressPublicNetworkAccess

EgressPublicNetworkAccessType

Enabled

Enum to determine whether PublicNetworkAccess is Enabled or Disabled for egress of a deployment.

endpointComputeType

string: Managed

[Required] The compute type of the endpoint.

environmentId

string

ARM resource ID or AssetId of the environment specification for the endpoint deployment.

environmentVariables

object

Environment variables configuration for the deployment.

instanceType

string

Standard_F4s_v2

Compute instance type. Default: Standard_F4s_v2.

livenessProbe

ProbeSettings

Liveness probe monitors the health of the container regularly.

model

string

The URI path to the model.

modelMountPath

string

The path to mount the model in custom container.

properties

object

Property dictionary. Properties can be added, but not removed or altered.

provisioningState

DeploymentProvisioningState

Provisioning state for the endpoint deployment.

readinessProbe

ProbeSettings

Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe.

requestSettings

OnlineRequestSettings

Request settings for the deployment.

scaleSettings

OnlineScaleSettings: DefaultScaleSettings or TargetUtilizationScaleSettings

Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment.

startupProbe

ProbeSettings

Startup probe verifies whether an application within a container has started successfully.
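
The defaulting rule described for scaleSettings depends on endpointComputeType. The sketch below resolves the effective scale type from a deployment's properties object; `effective_scale_type` is a hypothetical helper, and it raises for compute types with no documented default.

```python
# Defaults taken from the scaleSettings description for each deployment kind.
_DEFAULT_SCALE_TYPE = {
    "Kubernetes": "TargetUtilization",
    "Managed": "Default",
}


def effective_scale_type(properties):
    """Return the scaleType in effect for a deployment's properties dict."""
    scale = properties.get("scaleSettings")
    if scale and scale.get("scaleType"):
        return scale["scaleType"]
    compute = properties.get("endpointComputeType")
    if compute not in _DEFAULT_SCALE_TYPE:
        raise ValueError(f"no documented default for compute type {compute!r}")
    return _DEFAULT_SCALE_TYPE[compute]
```

An explicit scaleSettings value always wins; the compute type is consulted only when scaleSettings is null or missing.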

ManagedServiceIdentity

Managed service identity (system assigned and/or user assigned identities)

Name Type Description
principalId

string (uuid)

The service principal ID of the system assigned identity. This property will only be provided for a system assigned identity.

tenantId

string (uuid)

The tenant ID of the system assigned identity. This property will only be provided for a system assigned identity.

type

ManagedServiceIdentityType

Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed).

userAssignedIdentities

<string,  UserAssignedIdentity>

User-Assigned Identities
The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource IDs in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests.
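
The keys of userAssignedIdentities follow a fixed layout, so they can be split into their named parts. A minimal sketch with the hypothetical helper `parse_identity_id`; matching the fixed segments case-insensitively is an assumption.

```python
# Fixed segments of the documented key format; None marks a variable segment.
_EXPECTED = [
    "subscriptions", None,
    "resourceGroups", None,
    "providers", "Microsoft.ManagedIdentity",
    "userAssignedIdentities", None,
]


def parse_identity_id(resource_id):
    """Split a user-assigned identity resource ID into its named parts."""
    parts = resource_id.strip("/").split("/")
    if len(parts) != len(_EXPECTED):
        raise ValueError(f"unexpected number of segments in {resource_id!r}")
    for part, want in zip(parts, _EXPECTED):
        # Assumption: fixed segments are compared without regard to case.
        if want is not None and part.lower() != want.lower():
            raise ValueError(f"expected segment {want!r}, found {part!r}")
    return {
        "subscriptionId": parts[1],
        "resourceGroupName": parts[3],
        "identityName": parts[7],
    }
```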

ManagedServiceIdentityType

Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed).

Value Description
None
SystemAssigned
UserAssigned
SystemAssigned,UserAssigned

OnlineDeployment

Concrete tracked resource types can be created by aliasing this type using a specific property type.

Name Type Description
id

string

Fully qualified resource ID for the resource. Ex - /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/{resourceProviderNamespace}/{resourceType}/{resourceName}

identity

ManagedServiceIdentity

Managed service identity (system assigned and/or user assigned identities)

kind

string

Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type.

location

string

The geo-location where the resource lives

name

string

The name of the resource

properties

OnlineDeploymentProperties: KubernetesOnlineDeployment or ManagedOnlineDeployment

[Required] Additional attributes of the entity.

sku

Sku

Sku details required for ARM contract for Autoscaling.

systemData

systemData

Azure Resource Manager metadata containing createdBy and modifiedBy information.

tags

object

Resource tags.

type

string

The type of the resource. E.g. "Microsoft.Compute/virtualMachines" or "Microsoft.Storage/storageAccounts"

OnlineDeploymentTrackedResourceArmPaginatedResult

A paginated list of OnlineDeployment entities.

Name Type Description
nextLink

string (uri)

The link to the next page of items

value

OnlineDeployment[]

The OnlineDeployment items on this page

OnlineRequestSettings

Online deployment scoring requests configuration.

Name Type Default value Description
maxConcurrentRequestsPerInstance

integer (int32)

1

The maximum number of concurrent requests per node allowed per deployment. Defaults to 1.

maxQueueWait

string (duration)

PT0.5S

(Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue, in ISO 8601 format. Defaults to 500ms. Instead of relying on this setting, increase request_timeout_ms to account for any networking or queue delays.

requestTimeout

string (duration)

PT5S

The scoring timeout in ISO 8601 format. Defaults to 5000ms.
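
The duration strings above can be converted to a `timedelta` to check them against the millisecond figures quoted in the descriptions. The sketch handles only the time part of ISO 8601 durations (hours, minutes, seconds), which covers the values on this page; `parse_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Time-only ISO 8601 durations such as PT0.5S, PT5S, PT5M or PT1H30M.
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(text):
    """Convert a time-only ISO 8601 duration string to a timedelta."""
    match = _DURATION.match(text)
    # Reject strings that do not match, and a bare "PT" with no components.
    if not match or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(
        hours=parts.get("h", 0),
        minutes=parts.get("m", 0),
        seconds=parts.get("s", 0),
    )
```

Parsing the two defaults confirms the descriptions: PT0.5S is 500 milliseconds and PT5S is 5000 milliseconds.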

ProbeSettings

Deployment container liveness/readiness probe configuration.

Name Type Default value Description
failureThreshold

integer (int32)

30

The number of failures to allow before returning an unhealthy status.

initialDelay

string (duration)

The delay before the first probe in ISO 8601 format.

period

string (duration)

PT10S

The length of time between probes in ISO 8601 format.

successThreshold

integer (int32)

1

The number of successful probes before returning a healthy status.

timeout

string (duration)

PT2S

The probe timeout in ISO 8601 format.

RequestLogging

Name Type Description
captureHeaders

string[]

For payload logging, only the payload is collected by default. To also collect specific headers, list them in captureHeaders so that the backend collects those headers along with the payload.

RollingRateType

When model data is collected to blob storage, the data is rolled to different paths to avoid logging everything in a single blob file. If the rolling rate is Hour, all data is collected under the blob path /yyyy/MM/dd/HH/. If it is Day, all data is collected under the blob path /yyyy/MM/dd/. A rolling path also lets the model monitoring UI select a time range of data quickly.

Value Description
Year
Month
Day
Hour
Minute
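
The mapping from rolling rate to blob path can be sketched as follows. Only the Hour and Day layouts are given in the description; the layouts for Year, Month, and Minute are extrapolated from the same pattern and are assumptions, as is the helper name `rolling_path`.

```python
from datetime import datetime

# Hour and Day follow the layouts given in the description.
# Year, Month and Minute are extrapolated from the same pattern (assumption).
_LAYOUTS = {
    "Year": "/%Y/",
    "Month": "/%Y/%m/",
    "Day": "/%Y/%m/%d/",
    "Hour": "/%Y/%m/%d/%H/",
    "Minute": "/%Y/%m/%d/%H/%M/",
}


def rolling_path(rolling_rate, when):
    """Return the blob path prefix for a timestamp under the given rolling rate."""
    try:
        layout = _LAYOUTS[rolling_rate]
    except KeyError:
        raise ValueError(f"unknown RollingRateType: {rolling_rate!r}") from None
    return when.strftime(layout)
```

For a timestamp of 2024-03-05 07:09, the Hour rate gives /2024/03/05/07/ and the Day rate gives /2024/03/05/.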

ScaleType

Value Description
Default
TargetUtilization

Sku

The resource model definition representing SKU

Name Type Description
capacity

integer (int32)

If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted.

family

string

If the service has different generations of hardware, for the same SKU, then that can be captured here.

name

string

The name of the SKU. Ex - P3. It is typically a letter+number code

size

string

The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code.

tier

SkuTier

This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT.

SkuTier

This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT.

Value Description
Free
Basic
Standard
Premium

systemData

Metadata pertaining to creation and last modification of the resource.

Name Type Description
createdAt

string (date-time)

The timestamp of resource creation (UTC).

createdBy

string

The identity that created the resource.

createdByType

createdByType

The type of identity that created the resource.

lastModifiedAt

string (date-time)

The timestamp of resource last modification (UTC)

lastModifiedBy

string

The identity that last modified the resource.

lastModifiedByType

createdByType

The type of identity that last modified the resource.
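
The date-time strings in systemData end in "Z" to denote UTC. The sketch below converts one to a timezone-aware `datetime`; `parse_timestamp` is a hypothetical helper, and the "Z" rewrite is there because older Python versions do not accept that suffix in `datetime.fromisoformat`.

```python
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse a date-time such as "2020-01-01T12:34:56.999Z" into an aware datetime."""
    # Rewrite the trailing "Z" as an explicit UTC offset for older Pythons.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)
```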

TargetUtilizationScaleSettings

Name Type Default value Description
maxInstances

integer (int32)

1

The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances.

minInstances

integer (int32)

1

The minimum number of instances to always be present.

pollingInterval

string (duration)

PT1S

The polling interval in ISO 8601 format. Only supports durations with a precision as low as seconds.

scaleType

string: TargetUtilization

[Required] Type of deployment scaling algorithm

targetUtilizationPercentage

integer (int32)

70

Target CPU usage for the autoscaler.

UserAssignedIdentity

User assigned identity properties

Name Type Description
clientId

string (uuid)

The client ID of the assigned identity.

principalId

string (uuid)

The principal ID of the assigned identity.