Most of the chatbots use an LLM (Azure OpenAI) as part of their solution.
My main concern is perceived latency, since it takes a while for an LLM to generate an answer.
Streaming the answer (like ChatGPT and the Copilot agents do) still takes time, but at least the end user has something to look at.
I managed to implement a working streaming solution using a Bot Framework Node.js bot and the Web Chat client. The content bits (words, in my case) are sent as events to the Web Chat client.
Because Web Chat is a premium channel, this solution is not financially feasible. According to the current Bot Services pricing, every 1,000 events (activity objects) cost approx. $0.50/€0.50. This is on top of the OpenAI costs, which can also be significant.
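To make the cost issue concrete, here is a minimal sketch of the per-token event approach (the event name and payload shape are my own illustrative choices, not a fixed API). Each streamed word becomes its own `event` activity, and on a premium channel each one of those activities is billable; in a real bot each object would be sent with `context.sendActivity(...)` from the botbuilder SDK:

```javascript
// Wrap one streamed LLM token in a Bot Framework "event" activity that a
// custom Web Chat middleware can pick up and append to the rendered message.
// The event name "llmTokenChunk" and the payload shape are hypothetical.
function tokenToEventActivity(token, streamId) {
  return {
    type: 'event',
    name: 'llmTokenChunk',      // hypothetical event name the client listens for
    value: { streamId, token }, // payload the Web Chat middleware reads
  };
}

// Turn a streamed answer into the sequence of activities sent to the
// channel -- one billable activity per token/word.
function streamToActivities(tokens, streamId) {
  return tokens.map((t) => tokenToEventActivity(t, streamId));
}

// Example: a four-word answer produces four billable event activities,
// so a ~200-word answer would produce ~200 of them.
const activities = streamToActivities('The quick brown fox'.split(' '), 'abc123');
console.log(activities.length); // 4
```

At roughly $0.50 per 1,000 activities, a 200-word answer streamed this way adds about $0.10 per response on top of the OpenAI token costs, which is what makes the approach hard to justify.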
Is there a more cost-efficient way to stream these bits to the frontend?