1106-Preview gpt-4 in Canada East Region which is very slow at the moment.

Gokulraj A (LTIMINDTREE LIMITED) 40 Reputation points Microsoft External Staff
2024-03-21T04:59:20.77+00:00

We are using GPT-4 1106-Preview in the Canada East region, and it is very slow at the moment. We were previously using 0613, which was much faster than the model mentioned above. However, we specifically want to use GPT-4 1106-Preview in Canada East for our development work. Please suggest or assist regarding these speed issues.

Tags: Azure OpenAI Service, Azure AI services

Accepted answer
navba-MSFT 27,550 Reputation points Microsoft Employee Moderator
    2024-03-21T05:47:15.47+00:00

    @Gokulraj A (LTIMINDTREE LIMITED) Welcome to Microsoft Q&A Forum, Thank you for posting your query here!

    Please note that Azure OpenAI does not have a specific Service Level Agreement (SLA) for latency. The SLA primarily covers availability, which is maintained at 99.9%. In other words, the emphasis is on ensuring the service is accessible and operational rather than on guaranteeing specific performance metrics.

    Suggestion 1:

    This article describes how to diagnose and optimize latency issues.

    Suggestion 2:

    GPT-4 version 0125-preview is an updated version of the GPT-4 Turbo preview previously released as version 1106-preview. Customers may find that GPT-4-0125-preview generates more output compared to the gpt-4-1106-preview. GPT-4-0125-preview also addresses bugs in gpt-4-1106-preview.

    My recommendation would be to use the GPT-4 0125-preview model instead. However, its regional availability is currently limited to eastus, northcentralus, and southcentralus.

    More info here.

    Suggestion 3:

    Try enabling streaming (stream = true). This setting reduces the time to first response because data arrives in chunks rather than all at once. stream is a user-controlled parameter in the chat completions API call, and you should set it based on your requirements. By enabling streaming, users start receiving data sooner, which can improve perceived performance even if overall latency remains unchanged. More info here.
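    As a minimal sketch, assuming the OpenAI Python SDK v1-style chat completions interface (the client object and deployment name below are placeholders, and the exact Azure client setup may differ), consuming a streamed response could look like this:

    ```python
    def stream_chat(client, deployment, messages, on_token=print):
        """Request a chat completion with streaming enabled so tokens
        arrive incrementally instead of after the full response."""
        stream = client.chat.completions.create(
            model=deployment,   # Azure OpenAI deployment name
            messages=messages,
            stream=True,        # key setting: response arrives in chunks
        )
        parts = []
        for chunk in stream:
            # Each chunk carries a partial delta; some chunks (e.g. the
            # final one) may have no choices or empty content.
            if chunk.choices and chunk.choices[0].delta.content:
                token = chunk.choices[0].delta.content
                parts.append(token)
                on_token(token)  # surface the token to the UI immediately
        return "".join(parts)
    ```

    Because each token is handed to `on_token` as it arrives, the user sees output almost immediately, even though the total generation time is the same.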

    Suggestion 4:

    Set max_tokens appropriately: a smaller maximum token count leads to a faster response. max_tokens is another user-controlled parameter in the chat completions API call. Carefully tune it to find the right balance between response speed and response completeness. Adjusting this parameter can significantly impact response time, so fine-tune it according to your specific use case and requirements.
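    As a sketch, again assuming the v1-style chat completions interface (the function name and deployment value are illustrative), capping generation length is a single parameter on the call:

    ```python
    def chat_with_cap(client, deployment, messages, max_tokens=256):
        """Request a completion with an explicit max_tokens cap.
        A smaller cap bounds how many tokens the model may generate,
        which shortens total response time."""
        response = client.chat.completions.create(
            model=deployment,
            messages=messages,
            max_tokens=max_tokens,  # hard upper bound on generated tokens
        )
        return response.choices[0].message.content
    ```

    A cap that is too small risks truncated answers, so the value should reflect the longest output the use case actually needs rather than a worst-case default.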

    Suggestion 5:

    If none of the above suggestions helps and you still wish to troubleshoot this issue, then, since you are internal to Microsoft, I am sharing the internal logs and the queries you can run to identify the root cause.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.


0 additional answers