Azure online endpoint scaling is taking a long time

Tran Hong Thu (DPS.VI.DTS) 40 Reputation points
2025-02-13T07:38:15.4833333+00:00

I have a project that runs an AI model using an online endpoint as a backend service. The endpoint is configured (set manually in the portal) to autoscale based on the number of requests.

I expected the endpoint to scale up within 1 to 2 minutes, like other services such as virtual machine scale sets.

But with the ML online endpoint, scaling takes a long time, about 12-18 minutes.

Do you have suggestions for speeding up the scaling time?

Azure Machine Learning
An Azure machine learning service for building and deploying models.

Accepted answer
  1. Saideep Anchuri 9,500 Reputation points Moderator
    2025-02-13T09:31:26.59+00:00

    Hi Tran Hong Thu (DPS.VI.DTS)

    Welcome to Microsoft Q&A Forum, thank you for posting your query here!

    Scaling times for Azure Machine Learning online endpoints vary with several factors, including the complexity of the model, the size of the resources being allocated, and the current load on the system. Scaling an online endpoint takes at least 5 minutes.

    For faster scaling, you can use a compute cluster for a batch endpoint, which allows a scale-down time of less than a minute. Alternatively, you can use an AKS (Azure Kubernetes Service) cluster, which adjusts automatically based on incoming traffic and offers a more reactive scaling solution. The following Azure CLI command enables the cluster autoscaler on an AKS cluster:

    az aks update \
      --resource-group <yourResourceGroup> \
      --name <yourAKSCluster> \
      --enable-cluster-autoscaler \
      --min-count <minNodeCount> \
      --max-count <maxNodeCount>
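
    The Azure CLI command above enables the cluster autoscaler so that the node count increases with the workload, between the minimum and maximum you set.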
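
After running the update, it may help to confirm that the autoscaler settings were actually applied. The following is a minimal sketch, not an official procedure: the function name `show_autoscaler` is hypothetical, and the two arguments stand for the same `<yourResourceGroup>` and `<yourAKSCluster>` placeholders used above. It only reads the cluster's node pool settings with `az aks show` and changes nothing.

```shell
# Hypothetical helper: list each node pool's autoscaler settings.
# $1 = resource group, $2 = AKS cluster name (your own values).
show_autoscaler() {
  az aks show \
    --resource-group "$1" \
    --name "$2" \
    --query "agentPoolProfiles[].{pool:name, autoscaling:enableAutoScaling, min:minCount, max:maxCount, nodes:count}" \
    --output table
}

# Example call (commented out so that sourcing this file runs nothing):
# show_autoscaler <yourResourceGroup> <yourAKSCluster>
```

If the `autoscaling` column is not true for the node pool that hosts the endpoint, the earlier update did not take effect for that pool.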

    Please refer to the links below:

    kubernetes-online-endpoints

    how-to-attach-kubernetes

    machine-learning-reference

    Thank you.
