What is a cost efficient way to stream OpenAI generated answers via Azure Bot Services to webchat agent

Hessel Wellema 186 Reputation points
2024-02-23T11:14:49+00:00

Most chatbots use LLMs (e.g. Azure OpenAI) as part of their solution.
My main concern is perceived latency, since it takes a while for an LLM to generate a full answer.
Streaming the answer (as the ChatGPT and Copilot agents do) also takes time, but at least the end user has something to look at. I managed to implement a working streaming solution using a Bot Framework Node.js bot and the Web Chat client.
The content bits (words, in my case) are sent as events to the Web Chat client.
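A minimal sketch of that token-per-event approach, assuming plain activity objects (the event name `streamToken` and the `sendActivity` callback are my placeholders, not names from the post):

```javascript
// Sketch (assumed names): wrap each streamed token in a Bot Framework
// "event" activity, which a custom Web Chat store middleware can pick up
// and append to the transcript.
function tokenToEventActivity(token) {
  return {
    type: 'event',        // Bot Framework activity type
    name: 'streamToken',  // hypothetical custom event name
    value: { token },
  };
}

// Relay every token to the client. Note that each sendActivity call is
// one billable activity on a premium channel -- this is the cost problem.
async function relayStream(tokens, sendActivity) {
  let count = 0;
  for (const token of tokens) {
    await sendActivity(tokenToEventActivity(token));
    count += 1;
  }
  return count; // number of billable activities used for this answer
}
```

The upside is smooth word-by-word rendering; the downside is one billable activity per word.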

Because Web Chat is a premium channel, this solution is not feasible. According to the current Bot Service pricing, every 1,000 events (activity objects) costs roughly $0.50/€0.50. This is on top of the OpenAI costs, which can also be significant.
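To put that in numbers, here is a back-of-envelope calculation using the ~$0.50 per 1,000 activities figure above; the answer sizes (100 words vs. 6 sentences per answer) are illustrative assumptions:

```javascript
// Rough per-answer channel cost at different streaming granularities,
// based on the ~$0.50 per 1,000 activities figure quoted above.
const COST_PER_ACTIVITY = 0.50 / 1000; // ~$0.0005 per activity

function streamingCost(activitiesPerAnswer) {
  return activitiesPerAnswer * COST_PER_ACTIVITY;
}

const perWordCost = streamingCost(100);   // ~100 word-activities per answer
const perSentenceCost = streamingCost(6); // ~6 sentence-activities per answer
```

So word-level streaming of a 100-word answer costs around $0.05 in channel fees per answer, while sentence-level batching brings it down to a fraction of a cent.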

Is there a more cost-efficient way to stream these bits to the frontend?

Azure AI Bot Service
An Azure service that provides an integrated environment for bot development.

1 answer

  1. Hessel Wellema 186 Reputation points
    2024-02-27T08:37:41.7566667+00:00

    Hi @romungi-MSFT, thanks, but I am aware of function calling and already use it a lot. My concern is about the number of HTTP calls (activities) it takes to get the streamed bits to the Web Chat client via the premium channel.

    For now I have decided to collect the bits until I have a full sentence and send that over. This is already an improvement, and it is cheaper than sending individual words.
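A sketch of that sentence-buffering idea, assuming a crude punctuation-based boundary check (the class name and `send` callback are my own, not from the post):

```javascript
// Sketch: accumulate streamed tokens and only emit one activity per
// complete sentence, cutting the number of billable activities.
class SentenceBuffer {
  constructor(send) {
    this.send = send; // async (text) => sends one activity to the client
    this.buffer = '';
  }

  async push(token) {
    this.buffer += token;
    let match;
    // Crude sentence boundary: '.', '!' or '?' followed by whitespace
    // or end of buffer. Real text needs smarter splitting ("Mr." etc.).
    while ((match = this.buffer.match(/^[\s\S]*?[.!?](\s|$)/))) {
      const sentence = match[0];
      this.buffer = this.buffer.slice(sentence.length);
      await this.send(sentence.trim());
    }
  }

  async flush() {
    // Emit whatever remains when the stream ends mid-sentence.
    if (this.buffer.trim()) await this.send(this.buffer.trim());
    this.buffer = '';
  }
}
```

With this, a 100-word answer of half a dozen sentences costs six activities instead of a hundred, at the price of chunkier rendering.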

    1 person found this answer helpful.