Azure OpenAI latency: very slow when we hit the same prompt again

Vivek S 0 Reputation points
2024-05-24T05:54:02.08+00:00

We are using GPT-3.5 Turbo, model version 1106.

When we invoke the model with the command below:

The first time, the response time is 2.3 s.

The second time, the response time is 54 s for the same prompt.

 response = llm.stream(loaded_prompt)
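
For reference, this is roughly how we time it. It is a minimal sketch that assumes llm is a LangChain AzureChatOpenAI client; the endpoint, key, deployment name, and loaded_prompt below are placeholders rather than our real values. Measuring time-to-first-token separately from total streaming time shows whether the 54 s is spent waiting for the first token or spread across generation:

    import os
    import time

    from langchain_openai import AzureChatOpenAI

    # Placeholder client setup; endpoint, key, and deployment name are illustrative.
    llm = AzureChatOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
        azure_deployment="gpt-35-turbo-1106",  # name of the 1106 deployment
        temperature=0,
    )

    loaded_prompt = "Summarize the benefits of streaming responses."  # placeholder prompt

    start = time.perf_counter()
    first_token_at = None
    chunks = []

    # llm.stream() yields message chunks as they arrive, so time-to-first-token
    # can be measured separately from total generation time.
    for chunk in llm.stream(loaded_prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks.append(chunk.content)

    end = time.perf_counter()
    print(f"time to first token: {first_token_at - start:.2f} s")
    print(f"total time:          {end - start:.2f} s")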
Azure OpenAI Service

1 answer

  1. Charlie Wei 3,310 Reputation points
    2024-05-24T07:03:51.2933333+00:00

    Hello Vivek S,

    This Microsoft Learn document provides official recommendations for improving latency.

    For production workloads that need a stable latency ceiling, the most effective option today is the provisioned throughput (PTU) deployment type, although it comes at additional cost.
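
    Beyond PTU, request-level levers that document calls out include keeping max_tokens as small as possible and streaming the response. A minimal sketch of both, assuming the openai Python SDK against an Azure endpoint (the endpoint, key, and deployment name are placeholders):

        import os

        from openai import AzureOpenAI

        # Placeholder resource details; substitute your own endpoint, key, and deployment.
        client = AzureOpenAI(
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-02-01",
        )

        stream = client.chat.completions.create(
            model="gpt-35-turbo-1106",  # your deployment name
            messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
            max_tokens=256,   # smaller completions finish sooner
            stream=True,      # streaming cuts the wait for the first visible tokens
        )

        for event in stream:
            # Azure may emit an initial chunk with an empty choices list, so guard it.
            if event.choices and event.choices[0].delta.content:
                print(event.choices[0].delta.content, end="", flush=True)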

    Best regards,
    Charlie


    If you find my response helpful, please consider accepting this answer and voting yes to support the community. Thank you!
