Azure ML - AKS Service deployment unable to handle concurrent requests despite auto scaling

Subhankar Nayak 1 Reputation point
2020-10-01T09:48:45.85+00:00

I have deployed around 23 models (amounting to 1.57 GB) in an Azure ML workspace using Azure Kubernetes Service. For the AKS cluster, I have used 3 D8sv3 nodes and enabled cluster autoscaling up to 6 nodes.
The AksWebservice is configured with 4.4 cores and 16 GB of memory. I have enabled pod autoscaling for the web service, with autoscale_max_replicas set to 40:

from azureml.core.webservice import AksWebservice

# max_request_wait_time is in milliseconds (25000 ms = 25 s)
aks_config = AksWebservice.deploy_configuration(cpu_cores = 4.4, memory_gb = 16, autoscale_enabled = True,
                                                description = 'TEST - Configuration for Kubernetes Compute Target',
                                                enable_app_insights = True, max_request_wait_time = 25000,
                                                autoscale_target_utilization = 0.6, autoscale_max_replicas = 40)

I ran load tests with 10 concurrent users (using JMeter) and monitored the cluster in Application Insights:
[Screenshot: cluster CPU and memory utilization (29661-oct1-test1-cpu-mem.png)]
[Screenshot: node and pod counts (29663-oct1-test1-node-pod.png)]
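
For anyone who wants to reproduce the 10-user test without JMeter, here is a minimal Python sketch of the same idea. It is not the JMeter plan used above. `run_load_test`, `summarize`, and the `send_request` callable are hypothetical names; `send_request` stands in for whatever code posts one payload to the scoring endpoint and returns True on success. The 25-second cutoff mirrors the max_request_wait_time setting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, concurrent_users=10, timeout_s=25.0):
    """Fire one request per simulated user at the same time.

    send_request(user_id) is a caller-supplied (hypothetical) function that
    performs one call to the endpoint and returns True on success.
    Returns a list of (latency_seconds, ok) tuples, one per user.
    """
    def timed_call(user_id):
        start = time.perf_counter()
        try:
            ok = bool(send_request(user_id))
        except Exception:
            # Treat any client-side error as a failed request.
            ok = False
        latency = time.perf_counter() - start
        # A response slower than the cutoff is counted as a failure.
        return latency, ok and latency <= timeout_s

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(timed_call, range(concurrent_users)))


def summarize(results):
    """Count passed and failed requests and report the slowest latency."""
    passed = sum(1 for _, ok in results if ok)
    slowest = max(latency for latency, _ in results)
    return {"passed": passed, "failed": len(results) - passed, "slowest_s": slowest}
```

In a real run, `send_request` would wrap an HTTP POST to the service's scoring URI; passing a stub function instead makes it possible to check the harness itself before pointing it at the cluster.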

I can see the nodes and pods scaling, but there is no spike in CPU or memory utilization. Of 10 concurrent requests, only 5 to 6 succeed; the rest fail. When I send a single request to the deployed endpoint, the response comes back in 7 to 9 seconds. In the load test logs, however, plenty of requests take more than 15 seconds to return a response, and those taking more than 25 seconds fail with status code 503. I increased max_request_wait_time for this reason, but I don't understand why responses take so long with this much compute available, when the dashboard shows memory utilization below 30%.

Should I change the replica_max_concurrent_requests parameter, or increase autoscale_max_replicas even further? The concurrent request load may sometimes reach 100 in production. Is there a solution for this?
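
A back-of-the-envelope sketch of the capacity arithmetic may help frame the question. It rests on several assumptions that should be verified: a D8s v3 node offers 8 vCPUs and 32 GiB of memory; system reservations on each node are ignored, so real capacity is somewhat lower; and each replica serves one request at a time, which I understand to be the default for replica_max_concurrent_requests. The function names are hypothetical and are not part of the Azure ML SDK.

```python
import math


def max_schedulable_replicas(nodes, node_vcpus, node_mem_gb, replica_cores, replica_mem_gb):
    """Upper bound on replicas that fit on the cluster.

    A replica cannot span nodes, so each node holds the smaller of its
    CPU-limited and memory-limited replica counts.
    """
    per_node = min(
        math.floor(node_vcpus / replica_cores),
        math.floor(node_mem_gb / replica_mem_gb),
    )
    return nodes * per_node


def replicas_needed(concurrent_requests, per_replica_concurrency=1):
    """Replicas required to serve all requests at once (assumed concurrency per replica)."""
    return math.ceil(concurrent_requests / per_replica_concurrency)


# Assumed D8s v3 node size: 8 vCPUs, 32 GiB. Replica size from the post: 4.4 cores, 16 GB.
print(max_schedulable_replicas(3, 8, 32, 4.4, 16))  # 3 replicas on the initial 3 nodes
print(max_schedulable_replicas(6, 8, 32, 4.4, 16))  # 6 replicas at the 6-node autoscale limit
print(replicas_needed(10))   # 10 replicas for the load test
print(replicas_needed(100))  # 100 replicas for the expected production peak
```

Under these assumptions, only one 4.4-core replica fits on each 8-vCPU node, so the cluster tops out at about 6 replicas regardless of autoscale_max_replicas = 40. That would be consistent with 5 to 6 of 10 concurrent requests succeeding while the remainder queue until they hit the 25-second limit, and with low overall CPU and memory utilization.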

Will be grateful for any advice. Thanks.

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.