Azure GPT-4o stream delivers chunks in bursts rather than a smooth continuous flow.

김세형 25 Reputation points
2024-06-12T07:33:17.73+00:00

Azure vs OpenAI

We're using GPT-4o for a streaming service where a smooth real-time response is important.

Both Azure and OpenAI split the GPT-4o streaming completion into well-sized chunks.

However, there is an issue on Azure.

OpenAI's GPT-4o sends chunks smoothly, without stalling.

Azure GPT-4o sends multiple chunks at once in a short burst, then stops sending for 1–2 seconds before the next burst. This makes it unusable for a real-time service like a chatbot.

Please refer to the attached image.
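To quantify the difference between the two endpoints, you can record an arrival timestamp per streamed chunk and look at the gaps. Below is a minimal sketch; the helper names (`inter_chunk_gaps`, `summarize`) and the thresholds are my own, and the timestamps here are synthetic — in practice you would append `time.perf_counter()` as each SSE chunk arrives from the completion stream.

```python
def inter_chunk_gaps(timestamps: list[float]) -> list[float]:
    """Return the delay between consecutive chunk arrival times."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def summarize(gaps: list[float], stall_threshold: float = 1.0) -> dict:
    """Count stalls (long pauses) and bursts (near-simultaneous chunks).

    Thresholds are illustrative: a gap >= 1 s counts as a stall,
    a gap < 5 ms means the chunks arrived effectively at once.
    """
    stalls = sum(1 for g in gaps if g >= stall_threshold)
    bursts = sum(1 for g in gaps if g < 0.005)
    return {"chunks": len(gaps) + 1, "stalls": stalls, "bursts": bursts}

# Synthetic data: record time.perf_counter() per chunk in a real run.
smooth = [i * 0.05 for i in range(20)]  # evenly spaced chunks (OpenAI-like)
bursty = [0.0, 0.001, 0.002, 0.003, 1.5, 1.501, 1.502, 3.0]  # bursts + pauses

print(summarize(inter_chunk_gaps(smooth)))  # no stalls, no bursts
print(summarize(inter_chunk_gaps(bursty)))  # several bursts, 1-2 s stalls
```

A "healthy" stream should show near-zero stalls and few bursts; the Azure-via-APIM trace in my case looked like the second pattern.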


Even after I disabled the content filter, the problem persisted.

However, the issue was resolved by changing the APIM tier (Developer -> Standard v2). The architecture is 'Client -> API Management -> OpenAI'. The tier's throughput (500 requests/sec) was more than enough, and usage was under 5%, so I never suspected the APIM tier could be the cause.

Although changing the APIM tier fixed the problem, I'd like to understand the reason.

It seems that the chunks returned by OpenAI were accumulated (buffered) in APIM and then sent all at once, but I do not know the exact cause.
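One thing worth checking for this buffering hypothesis: APIM's `forward-request` policy has a `buffer-response` attribute that controls whether the backend response is buffered before being returned to the caller, and for server-sent-event streams it is typically set to `false`. A minimal policy sketch (the timeout value is illustrative; verify against your own policy configuration):

```xml
<backend>
    <!-- buffer-response="false" relays backend chunks to the client
         as they arrive instead of accumulating them in APIM -->
    <forward-request timeout="120" buffer-response="false" />
</backend>
```

Whether the tier change altered this buffering behavior is exactly the part I cannot confirm.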
