(nodejs azure/openai SDK) Can't get "streamChatCompletions" to work properly. Chunks arrive all at once after a long wait
Hello. I'm in the process of migrating from OpenAI to Azure OpenAI. I have set up my Azure OpenAI GPT-4 deployment and verified that it works. I'm now trying to finalize the implementation by enabling streaming responses. However, I'm facing an issue I don't know how to resolve. I have tried a few different implementations suggested in various GitHub repositories, on Stack Overflow, etc., and even ones suggested by GPT itself, but they all exhibit the same behavior, so I'm starting to suspect that the implementations aren't the problem, but something else is. Below is an example of one of the attempted implementations (using the @azure/openai Node.js SDK):
// client is an OpenAIClient from the @azure/openai package; client,
// deploymentId, requestBody, and streamCallback are defined elsewhere.
let result = '';
const events = await client.streamChatCompletions(deploymentId, requestBody.messages);

// Wrap the SDK's async iterable in a ReadableStream, then read it back out.
const stream = new ReadableStream({
  async start (controller) {
    for await (const event of events) {
      controller.enqueue(event);
    }
    controller.close();
  }
});

const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) {
    break;
  }
  for (const choice of value.choices) {
    // Log events that carry no content delta (role/metadata-only chunks).
    if (!choice.delta?.content) console.log('Debug 1', JSON.stringify(value, null, 2));
    if (choice.delta?.content !== undefined) {
      console.log('Chunk: ', choice.delta.content);
      result += choice.delta.content;
      streamCallback(result);
    }
  }
}
return result;
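For reference, the most minimal variant I tried consumes the async iterable returned by streamChatCompletions directly, without the ReadableStream wrapper (a sketch assuming the same client, deploymentId, requestBody, and streamCallback as above); it shows exactly the same bursty behavior:

// Minimal variant: iterate the SDK's async iterable directly.
// Assumes the same client/deploymentId/requestBody/streamCallback as above.
let result = '';
const events = await client.streamChatCompletions(deploymentId, requestBody.messages);
for await (const event of events) {
  for (const choice of event.choices) {
    if (choice.delta?.content !== undefined) {
      // Timestamp each chunk to make the arrival pattern visible.
      console.log(Date.now(), 'Chunk:', choice.delta.content);
      result += choice.delta.content;
      streamCallback(result);
    }
  }
}
return result;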
When this code runs, I soon get a log from "Debug 1" that looks like the following:
Debug 1 {
  "id": "chatcmpl-9ACt3...",
  "model": "gpt-4",
  "object": "chat.completion.chunk",
  "systemFingerprint": null,
  "created": "1970-01-20T19:36:59.845Z",
  "promptFilterResults": [],
  "choices": [
    {
      "index": 0,
      "finishReason": null,
      "delta": {
        "role": "assistant",
        "toolCalls": []
      },
      "contentFilterResults": {}
    }
  ]
}
Then nothing happens for a while (5-20 seconds, depending on the response length), after which I get a burst of stream events all at once, together containing the assistant's full response, followed by the finalizing event that closes the stream ("finishReason": "stop").
I would expect the chunks to flow in continuously from the beginning of the generation to the end, not to arrive all at once after a long wait. I could paste in a few more of the implementations I have tried, but as mentioned, they all show this behavior. So could it be an issue with my deployment? Am I missing a setting that needs to be turned on? Could it be a bug in the SDK? Or does the endpoint simply not return a streamed response the way I expect?
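One test I'm considering, to separate the SDK from the endpoint, is calling the REST API directly with "stream": true and logging each raw chunk with a timestamp as it arrives (a sketch for Node 18+; <resource>, <deployment>, the api-version, and the AZURE_OPENAI_KEY env var are placeholders):

// Sketch: call the Azure OpenAI REST endpoint directly to rule out the SDK.
// <resource>, <deployment>, and AZURE_OPENAI_KEY are placeholders.
const url = 'https://<resource>.openai.azure.com/openai/deployments/<deployment>'
  + '/chat/completions?api-version=2024-02-01';
const response = await fetch(url, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'api-key': process.env.AZURE_OPENAI_KEY
  },
  body: JSON.stringify({ messages: requestBody.messages, stream: true })
});

// Node 18+: the response body is a web ReadableStream, which is async-iterable.
const decoder = new TextDecoder();
for await (const chunk of response.body) {
  // If these raw SSE chunks also arrive in one burst, the buffering is
  // happening on the network/endpoint side rather than in the SDK.
  console.log(Date.now(), decoder.decode(chunk, { stream: true }));
}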
I hope someone can shed a bit of light onto this issue.
Thanks 🙏