GPT-4-turbo via Azure is 4-5 times slower than the OpenAI model

Angelo D'Ambrosio 0 Reputation points
2024-04-16T12:26:33.48+00:00

Hello,

I'm sending a prompt of 6,435 tokens to test the gpt-4-turbo 1106-preview model on both Azure and OpenAI.
The OpenAI model took 32.433 sec to produce 462 tokens, while the Azure one took 154.064 sec to produce 443 tokens.
That's almost 5 times slower!

What could be the reason?
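For reference, the reported figures work out to roughly a 5x gap in generation throughput. A minimal sketch of the arithmetic, using only the numbers quoted above:

```python
# Compare generation throughput (tokens/sec) for the two endpoints,
# using the timings reported in the question.

def tokens_per_sec(tokens: int, seconds: float) -> float:
    """Generation throughput in tokens per second."""
    return tokens / seconds

openai_rate = tokens_per_sec(462, 32.433)   # OpenAI: ~14.2 tok/s
azure_rate = tokens_per_sec(443, 154.064)   # Azure:  ~2.9 tok/s
slowdown = openai_rate / azure_rate         # ~5x slower on Azure

print(f"OpenAI: {openai_rate:.1f} tok/s, "
      f"Azure: {azure_rate:.1f} tok/s, "
      f"slowdown: {slowdown:.1f}x")
```

Comparing tokens per second rather than total wall-clock time controls for the slightly different completion lengths (462 vs 443 tokens).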

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. VasaviLankipalle-MSFT 14,256 Reputation points
    2024-04-16T23:08:02.9433333+00:00

    Hello @Angelo D'Ambrosio, thanks for using the Microsoft Q&A Platform.

    I understand that you are experiencing high latency. Higher latency is expected with GPT-4 models because of their increased capacity compared to the GPT-3.5 versions.

    Latency is primarily influenced by the model being used and the number of tokens being generated, for example via the max_tokens setting.

    Please refer to this documentation and follow the best practices there to improve performance and latency.

    Azure Diagnostics is a powerful tool that can help you debug latency issues.

    Please note that Azure OpenAI does not have a specific Service Level Agreement (SLA) for latency:

    What are the SLAs for API responses in Azure OpenAI?
    We don't have a defined API response time Service Level Agreement (SLA) at this time. For more information about the SLA for Azure OpenAI Service, see the Service Level Agreements (SLA) for Online Services page.

    I hope this helps.

    Regards,
    Vasavi

    Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.