How to reduce the latency of the ML model endpoint?

Abhinav Jha 0 Reputation points
2023-06-07T11:20:56.6+00:00

My Azure Machine Learning real-time endpoint is taking a long time to respond, over 20 seconds. I developed an ML model and deployed it to an AKS cluster as a real-time endpoint in the centralindia region, but the latency is very high.

Cluster config -

  1. Single node (4 cores, 16 GB RAM).
  2. Region - centralindia

Logs -

Graph was finalized.
Restoring parameters from studiomodelpackage/Resources/0/model/checkpoints/model.ckpt-30693
Running local_init_op.
Done running local_init_op.
2023-06-07 11:08:29,183 studio.core INFO |   |   |   Generate predictions - End with 26.9079s elapsed.
2023-06-07 11:08:29,183 studio.core INFO |   |   |   Format prediction results - Start:
2023-06-07 11:08:29,187 studio.core INFO |   |   |   Format prediction results - End with 0.0037s elapsed.
2023-06-07 11:08:29,193 studio.core INFO |   |   Executing node 7: Score Wide and Deep Recommender - End with 27.4322s elapsed.
2023-06-07 11:08:29,194 studio.core INFO |   Processing - End with 27.4798s elapsed.
2023-06-07 11:08:29,194 studio.core INFO |   Post-processing - Start:
2023-06-07 11:08:29,194 studio.core INFO |   Post-processing - End with 0.0000s elapsed.
2023-06-07 11:08:29,194 studio.core INFO Handling http request - End with 27.4804s elapsed.
Azure Machine Learning
Azure Kubernetes Service

1 answer

  1. YutongTie-9091 54,016 Reputation points Moderator
    2023-06-07T23:10:11.79+00:00

    Hello @Abhinav Jha

    Thanks for reaching out to us.

    There could be several reasons why your Azure Machine Learning real-time endpoint is taking a long time to respond. Here are a few things you can check:

    1. Check the size of your model: a large model takes longer to load and to run inference. You can try shrinking it with techniques such as quantization (lower-precision weights) or pruning (removing parameters).
    2. Check the size of your input data: large payloads take longer to transfer and process. You can reduce the payload size, or batch multiple records into a single request to amortize per-request overhead.
    3. Check the performance of your AKS cluster: if the cluster is under-provisioned or under high load, requests queue up. A single 4-core node may be the bottleneck; you can try scaling up the node size or scaling out the node pool.
    4. Check the network latency: if your clients are located far from the region where your AKS cluster is deployed, requests spend extra time in transit. You can deploy your endpoint in a region closer to your clients or use a content delivery network (CDN) to reduce network latency.
    5. Check the performance of your scoring script: a complex or inefficient scoring script adds latency to every request. Load the model once at startup rather than per request, cache expensive computations, and pre-process inputs where possible.
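    Points 1 and 5 often go together in practice: the logs above show the TensorFlow checkpoint being restored during scoring, so if that restore happens inside the request path, every call pays the full load cost. Below is a minimal sketch of the standard `init()`/`run()` scoring-script pattern, where the model is loaded exactly once and each request is timed; the stand-in model and names here are illustrative, not your actual deployment:

    ```python
    import json
    import time

    # Global populated once in init(); Azure ML calls init() a single time
    # when the container starts, then run() for every scoring request.
    model = None

    def _load_model():
        # Placeholder for the real, expensive load (e.g. restoring the
        # TensorFlow checkpoint seen in the logs). Here a trivial stand-in.
        return lambda rows: [sum(r) for r in rows]

    def init():
        global model
        model = _load_model()  # pay the load cost exactly once

    def run(raw_data):
        start = time.perf_counter()
        rows = json.loads(raw_data)["data"]
        preds = model(rows)
        elapsed = time.perf_counter() - start
        # Log per-request latency so slow stages show up in the endpoint logs.
        print(f"scored {len(rows)} rows in {elapsed:.4f}s")
        return json.dumps({"predictions": preds})
    ```

    Anything expensive (checkpoint restore, graph finalization, lookup-table builds) belongs in `init()`, so that `run()` only does inference.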

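    For point 2, batching is usually a small client-side change: send many rows per request instead of one. A sketch of a helper that splits rows into batched JSON payloads (the payload shape and batch size are assumptions; adjust them to your endpoint's schema):

    ```python
    import json

    def make_batched_payloads(rows, batch_size=64):
        """Group rows into batches so one HTTP call scores many rows,
        amortizing per-request network and framework overhead."""
        return [
            json.dumps({"data": rows[i:i + batch_size]})
            for i in range(0, len(rows), batch_size)
        ]

    # Each payload would then go out in a single POST to the endpoint's
    # scoring URI, e.g. requests.post(scoring_uri, data=payload, ...)
    # (scoring_uri and auth headers are deployment-specific).
    ```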
    I hope this helps! Let me know if you have any further questions.

    Regards,

    Yutong

    Please accept the answer if you found it helpful, to support the community. Thanks a lot.

