Azure OpenAI latency: very slow when we hit the same prompt again

Vivek S 0 Reputation points
2024-05-24T05:54:02.08+00:00

We are using GPT-3.5 Turbo, model version 1106.

When we invoke the model with the command below:

The first time, the response time is 2.3 s.

The second time, the response time is 54 s for the same prompt.

 response = llm.stream(loaded_prompt)
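
For reference, this is roughly how we time it. It is a minimal sketch that assumes llm is a LangChain AzureChatOpenAI client; the endpoint, key, deployment name, and loaded_prompt below are placeholders rather than our real values. Measuring time-to-first-token separately from total streaming time shows whether the 54 s is spent waiting for the first token or spread across generation:

    import os
    import time

    from langchain_openai import AzureChatOpenAI

    # Placeholder client setup; endpoint, key, and deployment name are illustrative.
    llm = AzureChatOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
        azure_deployment="gpt-35-turbo-1106",  # name of the 1106 deployment
        temperature=0,
    )

    loaded_prompt = "Summarize the benefits of streaming responses."  # placeholder prompt

    start = time.perf_counter()
    first_token_at = None
    chunks = []

    # llm.stream() yields message chunks as they arrive, so time-to-first-token
    # can be measured separately from total generation time.
    for chunk in llm.stream(loaded_prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks.append(chunk.content)

    end = time.perf_counter()
    print(f"time to first token: {first_token_at - start:.2f} s")
    print(f"total time:          {end - start:.2f} s")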
Azure OpenAI Service

1 answer

  1. Charlie Wei 3,310 Reputation points
    2024-05-24T07:03:51.2933333+00:00

    Hello Vivek S,

    This Microsoft Learn document provides official recommendations for improving latency.

    For production workloads that need a stable latency ceiling, the most effective option today is the provisioned throughput (PTU) deployment type, although it comes at additional cost.
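
    Beyond PTU, request-level levers that document calls out include keeping max_tokens as small as possible and streaming the response. A minimal sketch of both, assuming the openai Python SDK against an Azure endpoint (the endpoint, key, and deployment name are placeholders):

        import os

        from openai import AzureOpenAI

        # Placeholder resource details; substitute your own endpoint, key, and deployment.
        client = AzureOpenAI(
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-02-01",
        )

        stream = client.chat.completions.create(
            model="gpt-35-turbo-1106",  # your deployment name
            messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
            max_tokens=256,   # smaller completions finish sooner
            stream=True,      # streaming cuts the wait for the first visible tokens
        )

        for event in stream:
            # Azure may emit an initial chunk with an empty choices list, so guard it.
            if event.choices and event.choices[0].delta.content:
                print(event.choices[0].delta.content, end="", flush=True)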

    Best regards,
    Charlie


    If you find my response helpful, please consider accepting this answer and voting yes to support the community. Thank you!
