OpenAI gpt-3.5-turbo streaming responds slowly

Niu Sang 40 Reputation points
2023-07-05T07:54:26.03+00:00

When I call the Azure OpenAI gpt-3.5-turbo API with stream set to true, the response comes back very slowly: nothing arrives for a few seconds, and then all of the data is returned at once. This is completely different from streaming with OpenAI's official API. Azure shows none of the typewriter effect that OpenAI's API has and feels more like a non-streaming response. How can I solve this? Could it be that the nginx gateway in front of Azure OpenAI does not have SSE enabled?
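
To tell a real burst apart from a client that is only rendering slowly, it can help to timestamp each streamed fragment as it arrives. The following is a minimal sketch, not an official client: it assumes the stream uses the usual chat-completions SSE framing (`data: {json}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`). The helper names `iter_deltas`, `timed_deltas`, and `looks_buffered`, and the 0.05-second threshold, are hypothetical and chosen only for illustration.

```python
import json
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by SSE `data:` lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def timed_deltas(
    lines: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, str]]:
    """Pair each fragment with the seconds elapsed since reading began."""
    start = clock()
    return [(clock() - start, text) for text in iter_deltas(lines)]


def looks_buffered(
    timings: List[Tuple[float, str]],
    burst_window: float = 0.05,  # assumed threshold, tune as needed
) -> bool:
    """Heuristic: True when every fragment arrived within burst_window seconds."""
    if len(timings) < 2:
        return False
    return timings[-1][0] - timings[0][0] < burst_window
```

To try it against a live endpoint, pass in the decoded lines of the HTTP response, for example `Response.iter_lines(decode_unicode=True)` from the `requests` library on a request made with `stream=True`. If the first fragment shows up after several seconds and the rest follow within milliseconds, the data is being buffered somewhere between the service and the client rather than streamed.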

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. AshokPeddakotla-MSFT 35,976 Reputation points Moderator
    2023-07-06T05:08:28.8233333+00:00

    Niu Sang

    Could you share more details about the API request you're making? Specifically, what is the size of the payload you're sending?

    Can you try accessing the API from a different network or device to confirm that the issue is not related to the network or device?

    I suggest you check the "Interacting with the model" guidance, which lists practices for getting the best results when chatting with the model.

    Also, consider setting the following parameters, even though they are optional for the API.

    [Image: suggested parameters]

    Please note that content filtering and abuse monitoring features of Azure OpenAI still apply to the data.

    See this similar thread, which addressed the same issue: https://learn.microsoft.com/en-us/answers/questions/1276363/problem-with-streaming-chat-api

    Hope this helps. Do let us know whether it does, or if you have any further queries.

    1 person found this answer helpful.
