
Azure ML Online Endpoint Billing and Cost Reduction Options

Issac Chan 70 Reputation points
2025-09-27T18:14:19.23+00:00

How does billing work for Azure ML online endpoints after deploying a model?
Does billing start as soon as the endpoint is deployed? If so, is there a way to pause or reduce costs while the endpoint is not in use (for example, by lowering instance_count) instead of deleting and redeploying the endpoint each time?

Azure Machine Learning

Answer accepted by question author

Moritz Goeke 395 Reputation points MVP
2025-09-27T22:21:01.1833333+00:00

Hi Issac,

Short answers:

  1. Yes, billing begins as soon as a managed online endpoint deployment is running. You are billed continuously for the VM resources provisioned for that deployment, whether or not it receives traffic.
  2. You can reduce costs by lowering instance_count, configuring autoscaling, or downsizing to a smaller VM size, but you cannot scale to zero. To stop compute charges entirely, you have to delete the deployment or switch to a different deployment pattern.
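
To make the billing model concrete, here is a rough back-of-the-envelope sketch. The hourly rate of 0.50 is a made-up placeholder (look up the real price for your VM size and region), and `monthly_cost` is a hypothetical helper, not part of any Azure SDK. It only assumes what is stated above: every provisioned instance is billed for every hour the deployment is running.

```python
HOURS_PER_MONTH = 730.0  # common approximation for one month of continuous uptime


def monthly_cost(hourly_rate: float, instance_count: int,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimate compute cost for a deployment that stays provisioned.

    hourly_rate is the per-instance VM price (placeholder value below),
    instance_count is the number of provisioned instances.
    """
    if instance_count < 1:
        # Scaling to zero is not possible, so one instance is the floor.
        raise ValueError("a running deployment needs at least one instance")
    return hourly_rate * instance_count * hours


# Hypothetical rate of 0.50 per instance-hour, for illustration only.
before = monthly_cost(0.50, 3)
after = monthly_cost(0.50, 1)
print(f"3 instances: {before:.2f}, 1 instance: {after:.2f}, saved: {before - after:.2f}")
```

Going from three instances to one cuts the compute bill to a third, but the remaining instance keeps billing around the clock until the deployment is deleted.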

Source: https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints?view=azureml-api-2, search for "Scaling compute to zero".
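
If you want to lower instance_count from code rather than the portal, a minimal sketch with the Python SDK v2 (`azure-ai-ml`) could look like the following. The helper name `scale_deployment` and the endpoint and deployment names in the usage comment are placeholders I made up; the function assumes you pass in an already authenticated `MLClient` and that the deployment is a managed online deployment.

```python
def scale_deployment(ml_client, endpoint_name: str, deployment_name: str,
                     instance_count: int):
    """Set instance_count on an existing online deployment and wait for the update.

    ml_client is expected to be an azure.ai.ml.MLClient (assumption: it is
    already authenticated against the right workspace).
    """
    if instance_count < 1:
        # A running deployment cannot be scaled to zero instances.
        raise ValueError("instance_count must be at least 1")

    # Fetch the current deployment definition from the workspace.
    deployment = ml_client.online_deployments.get(
        name=deployment_name, endpoint_name=endpoint_name
    )
    deployment.instance_count = instance_count

    # Submit the change; this returns a poller for the long-running operation.
    poller = ml_client.online_deployments.begin_create_or_update(deployment)
    return poller.result()


# Usage sketch (placeholder identifiers, requires azure-ai-ml and azure-identity):
# from azure.ai.ml import MLClient
# from azure.identity import DefaultAzureCredential
# ml_client = MLClient(DefaultAzureCredential(), "<subscription-id>",
#                      "<resource-group>", "<workspace-name>")
# scale_deployment(ml_client, "my-endpoint", "blue", 1)
```

The update runs in place, so the endpoint and its scoring URI stay the same. You still pay for the one remaining instance while it is provisioned.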

Hope that helps, best regards! :)
