@Gokulraj A (LTIMINDTREE LIMITED) Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!
Please note that Azure OpenAI does not have a specific Service Level Agreement (SLA) for latency. The SLA instead covers availability, which is guaranteed at 99.9%: the emphasis is on keeping the service accessible and operational rather than on guaranteeing specific performance metrics.
Suggestion 1:
This article describes how to optimize latency and diagnose latency-related performance issues.
Suggestion 2:
GPT-4 version 0125-preview is an updated version of the GPT-4 Turbo preview previously released as version 1106-preview. Customers may find that gpt-4-0125-preview generates more output than gpt-4-1106-preview; it also addresses bugs found in gpt-4-1106-preview.
So my recommendation would be to use the GPT-4 version 0125-preview model instead. However, regional availability for this model is limited to eastus, northcentralus, and southcentralus. More info here.
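As a minimal sketch, a chat completions request against a deployment of the 0125-preview model is identical to one against 1106-preview; only the deployment you target changes. The endpoint, deployment name, and API version below are placeholders chosen for illustration, not values from this thread:

```python
import json
import os
import urllib.request

# Placeholders -- substitute your own resource values.
endpoint = "https://YOUR-RESOURCE.openai.azure.com"
deployment = "gpt4-0125-preview"   # the name you gave your deployment
api_version = "2024-02-01"         # a GA chat-completions API version

url = (f"{endpoint}/openai/deployments/{deployment}"
       f"/chat/completions?api-version={api_version}")
body = {"messages": [{"role": "user", "content": "Hello"}]}

def send(payload):
    """POST the chat request; requires AZURE_OPENAI_KEY in the environment."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "api-key": os.environ["AZURE_OPENAI_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# send(body)  # uncomment once the deployment and key are in place
```

Because the deployment name is just a path segment, switching model versions requires no code change beyond pointing at the new deployment.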
Suggestion 3:
Try with streaming enabled (stream = true)
Enabling streaming reduces the time to the first response because the completion arrives in chunks as it is generated. In the chat completions API the flag is the user-controlled stream parameter (some SDK wrappers expose it under a name such as isStreaming). Optimize this setting to improve perceived performance: with streaming enabled, users start receiving data sooner, which can improve the user experience even if the overall end-to-end latency remains unchanged. More info here.
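To illustrate, here is a minimal sketch of what streaming looks like at the REST level. The stream flag in the request body and the "data: {...}" server-sent-events chunk shape come from the chat completions API; the helper name and the sample chunk are illustrative assumptions:

```python
import json

# Same request body as a normal chat call, with streaming switched on.
payload = {
    "messages": [{"role": "user", "content": "Explain streaming briefly."}],
    "stream": True,   # tokens arrive as server-sent-event chunks
}

def handle_sse_line(line):
    """Extract the text delta from one 'data: {...}' SSE line, if any."""
    if not line.startswith(b"data: ") or line == b"data: [DONE]":
        return None
    chunk = json.loads(line[len(b"data: "):])
    choices = chunk.get("choices") or []
    if choices:
        return choices[0].get("delta", {}).get("content")
    return None

# Example: one chunk shaped the way the service streams it.
sample = b'data: {"choices":[{"delta":{"content":"Hel"}}]}'
print(handle_sse_line(sample))  # -> Hel
```

Printing each delta as it arrives is what makes the response feel fast: the first tokens reach the user while the rest are still being generated.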
Suggestion 4:
Tune max_tokens
A smaller max_tokens value leads to a faster response because it caps the length of the generated completion. max_tokens is another user-controlled parameter in the chat API call. Optimize it carefully to strike the right balance between response speed and answer completeness; adjusting this parameter can significantly impact response time, so fine-tune it for your specific use case and requirements.
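As a small illustration, capping completion length is just a matter of setting max_tokens in the request body; the value 64 below is an arbitrary example, not a recommendation:

```python
# Chat request body with a capped completion length.
# "max_tokens" limits how many tokens the model may generate,
# which bounds generation time; too small a cap truncates answers.
payload = {
    "messages": [{"role": "user", "content": "Give a one-line summary."}],
    "max_tokens": 64,   # illustrative cap -- tune for your use case
    "temperature": 0,
}
```

Since generation time grows with the number of output tokens, start with a cap slightly above the longest answer you actually need, then measure and adjust.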
Suggestion 5:
If none of the above suggestions helps and you still wish to troubleshoot this issue, then, since you are internal to Microsoft, I am sharing the internal logs and the queries you can run against them to identify the root cause.
Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.