Azure GPT-4o streaming sends multiple chunks at once in short bursts, rather than a smooth continuous flow.
We're using GPT-4o for a streaming service where a smooth, real-time response is important.
Both Azure and OpenAI send well-broken chunks for GPT-4o streaming completions.
However, there is an issue on Azure.
OpenAI GPT-4o sends chunks smoothly without getting stuck.
Azure GPT-4o sends multiple chunks at once in a short time, then stops sending for 1~2 seconds between bursts. So it's not usable for a real-time service like a chatbot.
Please refer to the attached image.
Even though I disabled the content filter, the problem persisted.
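To show the difference concretely, here is a small sketch of how I measured the burstiness: a helper that records the gap between consecutive chunks of any stream iterator. The real Azure stream from the `openai` SDK (`client.chat.completions.create(..., stream=True)`) can be passed in as `chunk_iter`; below it is demonstrated with a simulated bursty stream so it runs without credentials.

```python
import time

def measure_chunk_gaps(chunk_iter):
    """Record the inter-arrival gap (seconds) for each chunk in a stream.

    Long pauses followed by bursts of near-zero gaps suggest something
    between the client and the model is buffering the SSE stream instead
    of forwarding chunks as they arrive.
    """
    gaps = []
    last = time.monotonic()
    for _ in chunk_iter:
        now = time.monotonic()
        gaps.append(now - last)
        last = now
    return gaps

# Simulated bursty stream standing in for the Azure GPT-4o response:
def bursty_stream():
    for _ in range(3):           # three bursts
        time.sleep(0.2)          # pause before each burst
        for _ in range(5):       # chunks arrive back-to-back
            yield "chunk"

gaps = measure_chunk_gaps(bursty_stream())
pauses = sum(1 for g in gaps if g > 0.1)
print(len(gaps), pauses)  # 15 chunks, 3 long pauses (one per burst)
```

Against OpenAI directly, the gaps are small and roughly uniform; against Azure through APIM (Developer tier), I saw exactly this burst pattern.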
However, this issue was resolved by changing the APIM tier (Developer -> Standard v2). The architecture is 'Client -> API Management -> OpenAI'. The throughput limit (500 requests/sec) was more than enough and usage was under 5%, so I never suspected APIM could be the cause.
I've worked around the problem by changing the APIM tier, but I'd like to understand the reason.
It seems that the chunks returned by OpenAI were accumulated in APIM and sent all at once, but I don't know the exact cause.
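One possibility worth checking (I haven't confirmed this is the root cause in my setup): APIM's `forward-request` policy has a `buffer-response` attribute, and when responses are buffered, chunked SSE responses can be accumulated before being forwarded to the client. A policy sketch that disables buffering on the backend call would look like this:

```xml
<policies>
    <backend>
        <!-- buffer-response="false" forwards chunks to the client as they
             arrive from the OpenAI backend instead of accumulating them -->
        <forward-request timeout="120" buffer-response="false" />
    </backend>
</policies>
```

If the Developer tier was buffering (or simply had less capacity to flush chunks promptly) while Standard v2 does not, that would explain why only the tier change fixed the burstiness.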