(nodejs azure/openai SDK) Can't get "streamChatCompletions" to work properly. Chunks coming all at once after waiting for some time

CultureDrivers 0 Reputation points
2024-04-04T08:46:21.9666667+00:00

Hello. I'm in the process of migrating from OpenAI to Azure OpenAI. I have set up my Azure OpenAI GPT-4 deployment and tested that it works. I'm trying to finalize the implementation by enabling the stream response. However, I am facing an issue that I do not know how to resolve. I have tried a few different implementations suggested on various GitHub repositories, Stack Overflow, etc., and even implementations suggested by GPT itself. They all give the same behavior, so I am starting to suspect that it is not the implementations that are wrong but something else. Below is an example of one of the attempted implementations:


let result = '';
const events = await client.streamChatCompletions(deploymentId, requestBody.messages);
const stream = new ReadableStream({
  async start (controller) {
    for await (const event of events) {
      controller.enqueue(event);
    }
    controller.close();
  }
});
const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) {
    break;
  }
  for (const choice of value.choices) {
    if (!choice.delta?.content) console.log('Debug 1', JSON.stringify(value, null, 2));
    if (choice.delta?.content !== undefined) {
      console.log('Chunk: ', choice.delta?.content);
      result += choice.delta.content;
      streamCallback(result);
    }
  }
}
return result;
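For reference, here is a stripped-down variant I also tried, consuming the SDK's async iterable directly with no ReadableStream wrapper in between, to rule the wrapper out as the cause of the buffering. The `extractDeltas` helper name and the `streamToCallback` signature are mine, not from the SDK; the chunk shape follows the "Debug 1" event logged below:

```javascript
// Pure helper: collect the delta content strings from one streamed chunk.
// (Chunk shape based on the event logged in the question; role-only deltas
// carry no content and are filtered out.)
function extractDeltas(chunk) {
  return (chunk.choices ?? [])
    .map((choice) => choice.delta?.content)
    .filter((text) => typeof text === 'string');
}

// Hypothetical usage: iterate streamChatCompletions' async iterable directly.
// If the endpoint really streams, streamCallback should fire per chunk,
// not once at the end.
async function streamToCallback(client, deploymentId, messages, streamCallback) {
  let result = '';
  const events = await client.streamChatCompletions(deploymentId, messages);
  for await (const chunk of events) {
    for (const text of extractDeltas(chunk)) {
      result += text;
      streamCallback(result);
    }
  }
  return result;
}
```

This variant shows exactly the same behavior: a long pause, then all chunks at once.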

When this code runs, I soon get a log from "Debug 1" that looks like the following:

Debug 1 {
  "id": "chatcmpl-9ACt3...",
  "model": "gpt-4",
  "object": "chat.completion.chunk",
  "systemFingerprint": null,
  "created": "1970-01-20T19:36:59.845Z",
  "promptFilterResults": [],
  "choices": [
    {
      "index": 0,
      "finishReason": null,
      "delta": {
        "role": "assistant",
        "toolCalls": []
      },
      "contentFilterResults": {}
    }
  ]
}

Then nothing happens for a while (5-20 seconds, depending on the response length), after which I get a burst of stream events all at once, together containing the full response of the assistant. Finally, I get the closing event that ends the stream ("finishReason": "stop").

I would expect the chunks to flow in from the beginning of the process to the end, not arrive all at once after a long wait. I could paste a few more implementations that I have tried, but as mentioned, they all show this behavior. So, could it be an issue with my deployment? Am I missing some setting? Could it be a bug in the SDK? Or does the endpoint simply not return a stream response the way I expect?

I hope someone can shed a bit of light onto this issue.

Thanks 🙏

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.