Why is the Azure OpenAI API so slow?

Dekel Yasso 0 Reputation points
2024-03-11T16:10:41.1733333+00:00

Hello,

I am using Azure OpenAI, and I ran some tests before adopting the API.

With the same prompt, the results from the online chat arrive much more quickly than from the API.

Every API request takes more than 60 seconds to return an answer (66-72 seconds).
I suspect there is a problem with the API that is making it very slow.
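For reference, a minimal way to measure the end-to-end latency of a single request looks like the sketch below (the placeholder callable stands in for the actual API call, which is not shown here):

```python
import time

def time_call(fn, *args, **kwargs):
    """Time a single call and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Placeholder in place of the real chat-completion request:
_, elapsed = time_call(lambda: "dummy response")
print(f"request took {elapsed:.2f}s")
```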

Would really appreciate your help.

Thanks

Dekel

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

Sort by: Most helpful
  1. navba-MSFT 27,540 Reputation points Microsoft Employee Moderator
    2024-03-12T02:40:28.7366667+00:00

@Dekel Yasso Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

If you are using a GPT-4 model, then higher latency is expected, since GPT-4 has more capacity than the GPT-3.5 models.

    As of now, we do not offer Service Level Agreements (SLAs) for response times from the Azure OpenAI service.

    Action Plan:

This article discusses improving latency performance with the Azure OpenAI service. Here are some best practices to lower latency:

    • Model latency: If model latency is important to you, we recommend trying our latest models in the GPT-3.5 Turbo model series.
    • Lower max tokens: OpenAI has found that even when the total number of tokens generated is similar, the request with the higher value for the max token parameter will have more latency.
    • Lower total tokens generated: The fewer tokens generated, the faster the overall response. Think of generation as a for loop: n tokens means n iterations, so reducing the number of generated tokens improves overall response time accordingly.
    • Streaming: Enabling streaming can help manage user expectations by letting the user see the model's response as it is being generated, rather than waiting until the last token is ready.
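The max-token and streaming suggestions above can be sketched with the official `openai` Python SDK (v1.x). The endpoint, API key, and deployment name below are placeholders you would replace with your own values:

```python
def build_request_kwargs(deployment: str, prompt: str,
                         max_tokens: int = 256, stream: bool = True) -> dict:
    """Assemble chat-completion parameters tuned for lower latency:
    a modest max_tokens cap and streaming enabled."""
    return {
        "model": deployment,  # the Azure *deployment* name, not the base model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": stream,
    }

def stream_answer(prompt: str) -> None:
    # Imported lazily; requires `pip install openai` (v1.x SDK).
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
        api_key="YOUR-KEY",                                       # placeholder
        api_version="2024-02-01",
    )
    # With stream=True the SDK yields chunks as tokens arrive, so the
    # first words appear long before the full answer has been generated.
    for chunk in client.chat.completions.create(
            **build_request_kwargs("gpt-35-turbo", prompt)):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```

Streaming does not shorten total generation time, but it cuts the time to the first visible token, which is usually what users perceive as "slow."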

    Please let me know if you have any follow-up questions; I would be happy to answer them. Awaiting your reply.

