Using the gpt-4-1106-preview model, an API request generally takes 40-60 seconds. How can this be resolved?

七猫AI小组 0 Reputation points
2024-01-11T05:39:14.49+00:00

Request method:

modelDeploymentID := "gpt-4-1106-preview"

// This is a conversation in progress.
// NOTE: all messages, regardless of role, count against token usage for this API.
messages := []azopenai.ChatRequestMessageClassification{
	&azopenai.ChatRequestSystemMessage{
		Content: to.Ptr("You are a senior film and television screenwriter and an expert in storyboarding romance stories. Keep this professional identity in mind throughout the conversation. I will now provide an excerpt from a novel. Based on your deep understanding of characterization and the visual conveyance of emotion, and your professional storyboarding skills, transform the original text with appropriate imagination, but do not invent things that do not exist, and produce precise storyboard descriptions. Pay attention to details such as character tags, body language, and scene setting, so that each shot highlights its core elements."),
	},
	&azopenai.ChatRequestUserMessage{
		Content: azopenai.NewChatRequestUserMessageContent("Introduce yourself"),
	},
}

resp, err := client.GetChatCompletions(context.TODO(), azopenai.ChatCompletionsOptions{
	// NOTE: all messages count against token usage for this API.
	Messages:       messages,
	DeploymentName: &modelDeploymentID,
}, nil)
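
For reference, the snippet above assumes client was created beforehand. A minimal setup sketch with a key credential (the endpoint and key values are placeholders; azcore is github.com/Azure/azure-sdk-for-go/sdk/azcore):

// Assumed client construction; replace the placeholders with your
// Azure OpenAI resource endpoint and API key.
endpoint := "https://<your-resource>.openai.azure.com/"
keyCredential := azcore.NewKeyCredential("<your-api-key>")

client, err := azopenai.NewClientWithKeyCredential(endpoint, keyCredential, nil)
if err != nil {
	log.Fatalf("failed to create azopenai client: %v", err)
}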

Request origin: Shanghai, China. Configured model:

gpt-4-1106-preview

Deployment region: Sweden Central. The request response time is too long, more than 40 seconds.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. navba-MSFT 27,550 Reputation points Microsoft Employee Moderator
    2024-01-11T06:07:55.1233333+00:00

    @七猫AI小组 Welcome to Microsoft Q&A Forum, and thank you for posting your query here!

    Please check the latency metrics and let me know which API operation is consuming the time. Open the Azure OpenAI resource in the portal, navigate to the Metrics section, apply splitting on the latency metric, and check which API / operationName is time-consuming.

    [Screenshot: latency metric in the Azure portal, split by API / operationName]

    Also, please check the Time to Response metric, again applying splitting:

    [Screenshot: Time to Response metric with splitting applied]
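
    To corroborate the portal metrics from the client side, you can also time the request itself. A minimal sketch wrapping the call from the question (opts stands for the ChatCompletionsOptions shown there):

    // Measure end-to-end latency of a single chat completion call.
    start := time.Now()
    resp, err := client.GetChatCompletions(context.TODO(), opts, nil)
    if err != nil {
        log.Fatalf("request failed: %v", err)
    }
    log.Printf("GetChatCompletions round trip: %s", time.Since(start))
    _ = resp // inspect resp.Choices as usual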

    Note:
    If you are using a GPT-4 model, higher latency is expected, considering that gpt-4 has more capacity than the gpt-3.5 version.
    As of now, we do not offer Service Level Agreements (SLAs) for response times from the Azure OpenAI service.

    Action Plan:
    This article on the Azure OpenAI service discusses improving latency performance. Here are some of the best practices to lower latency:

    • Model latency: If model latency is important to you, we recommend trying out our latest models in the GPT-3.5 Turbo model series.
    • Lower max tokens: OpenAI has found that even in cases where the total number of tokens generated is similar, the request with the higher value set for the max_tokens parameter will have more latency (see the sketch after this list).
    • Lower total tokens generated: The fewer tokens generated, the faster the overall response will be. Remember this is like having a for loop where n tokens = n iterations; lower the number of tokens generated and the overall response time will improve accordingly.
    • Streaming: Enabling streaming can be useful in managing user expectations in certain situations by allowing the user to see the model response as it is being generated, rather than having to wait until the last token is ready (also shown in the sketch after this list).
    • Content filtering improves safety, but it also impacts latency. Evaluate whether any of your workloads would benefit from modified content filtering policies.
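
    A minimal Go sketch of the max-tokens and streaming points, reusing client, messages, and modelDeploymentID from the question. The names follow the azopenai Go SDK, but the exact streaming types can differ between SDK versions, so treat this as a sketch rather than a drop-in:

    // Cap the number of generated tokens; the value 256 is illustrative.
    maxTokens := int32(256)

    // Request a streamed response so tokens can be rendered as they arrive.
    streamResp, err := client.GetChatCompletionsStream(context.TODO(), azopenai.ChatCompletionsOptions{
        Messages:       messages,
        DeploymentName: &modelDeploymentID,
        MaxTokens:      &maxTokens,
    }, nil)
    if err != nil {
        log.Fatalf("stream request failed: %v", err)
    }
    defer streamResp.ChatCompletionsStream.Close()

    for {
        chunk, err := streamResp.ChatCompletionsStream.Read()
        if errors.Is(err, io.EOF) {
            break // stream finished
        }
        if err != nil {
            log.Fatalf("stream read failed: %v", err)
        }
        for _, choice := range chunk.Choices {
            if choice.Delta != nil && choice.Delta.Content != nil {
                fmt.Print(*choice.Delta.Content) // show partial output immediately
            }
        }
    }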

    Please let me know if you have any follow-up questions. I would be happy to answer them. Awaiting your reply.

