Hello @Angelo D'Ambrosio, thanks for using the Microsoft Q&A platform.
I understand that you are experiencing high latency. Higher latency is expected with GPT-4 models because of their increased capacity compared to the GPT-3.5 models.
Latency is primarily influenced by the model being used and by the number of tokens being generated, which you can bound with the max_tokens parameter.
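As an illustration of the max_tokens point, here is a minimal sketch that calls the Azure OpenAI chat completions REST endpoint while capping max_tokens to bound worst-case generation time. The environment variable names, the deployment name, and the capped_max_tokens helper are assumptions for the example, not part of any official SDK.

```python
import json
import os
import urllib.request

def capped_max_tokens(requested: int, cap: int = 512) -> int:
    """A lower max_tokens bounds how many tokens the model may generate,
    which in turn bounds worst-case completion latency."""
    return min(requested, cap)

def chat_completion(prompt: str) -> str:
    # Assumed environment variables; substitute your resource's values.
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]      # e.g. https://<resource>.openai.azure.com
    deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]  # e.g. your GPT-4 deployment name
    url = (f"{endpoint}/openai/deployments/{deployment}"
           f"/chat/completions?api-version=2024-02-01")
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": capped_max_tokens(1024),  # keep completions short
    }).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "api-key": os.environ["AZURE_OPENAI_API_KEY"],
    })
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_completion("Say hello in five words or fewer."))
```

If your scenario tolerates it, requesting fewer output tokens (or splitting long generations into smaller calls) is usually the most direct way to reduce per-request latency.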
Please refer to this documentation and follow the best practices there to improve performance and latency.
Azure Diagnostics is a powerful tool that can help you debug latency issues.
Please note that Azure OpenAI does not have a specific Service Level Agreement (SLA) for latency:
What are the SLAs for API responses in Azure OpenAI?
We don't have a defined API response time Service Level Agreement (SLA) at this time. For more information about the SLA for Azure OpenAI Service, see the Service Level Agreements (SLA) for Online Services page.
I hope this helps.
Regards,
Vasavi
-Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.