Welcome to the Microsoft Q&A forum, and thank you for posting your query here!
Scaling times for Azure Machine Learning online endpoints vary with the complexity of the model, the size of the resources being allocated, and the current load on the system. Scaling an online endpoint takes at least 5 minutes. If you need faster scaling, you can use a compute cluster with a batch endpoint, which can scale down in less than a minute. Alternatively, you can use an AKS (Azure Kubernetes Service) cluster with the cluster autoscaler enabled, which adjusts the number of nodes between a minimum and a maximum as the workload changes and so offers a more reactive scaling solution.
To increase the number of nodes based on workload, enable the cluster autoscaler with the Azure CLI:
az aks update \
  --resource-group <yourResourceGroup> \
  --name <yourAKSCluster> \
  --enable-cluster-autoscaler \
  --min-count <minNodeCount> \
  --max-count <maxNodeCount>
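After running az aks update, it can help to confirm that the autoscaler is actually enabled and that the minimum and maximum node counts were applied. The following is a minimal sketch, assuming the Azure CLI is installed and you are logged in. The function name show_autoscaler_settings is hypothetical and only wraps az aks show; the queried fields (enableAutoScaling, minCount, maxCount, count) are read from the cluster's agentPoolProfiles.

```shell
# Sketch: list each node pool's autoscaler settings for an AKS cluster.
# show_autoscaler_settings is a hypothetical helper name, not an Azure CLI command.
show_autoscaler_settings() {
  resource_group="$1"
  cluster_name="$2"

  # Refuse to call the CLI when an argument is missing.
  if [ -z "$resource_group" ] || [ -z "$cluster_name" ]; then
    echo "usage: show_autoscaler_settings <resourceGroup> <clusterName>" >&2
    return 2
  fi

  # One row per node pool: name, autoscaling flag, min/max bounds, current count.
  az aks show \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --query "agentPoolProfiles[].{pool:name, autoscaling:enableAutoScaling, min:minCount, max:maxCount, current:count}" \
    --output table
}
```

For example, show_autoscaler_settings <yourResourceGroup> <yourAKSCluster> should list each node pool with its autoscaling flag and node count bounds. If autoscaling is not shown as enabled for the pool you expected, the update may not have been applied to that pool.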
Kindly refer to the link below:
Thank you.