Most of the chatbots use an LLM (Azure OpenAI) as part of their solution.
My main concern is perceived latency, since it takes a while for an LLM to generate an answer.
Streaming the answer (like ChatGPT and the Copilot agents do) still takes time, but at least the end user has something to look at.
I managed to implement a working streaming solution using a Bot Framework Node.js bot and the Web Chat client. The content bits (words, in my case) are sent as events to the Web Chat client.
Because Web Chat is a premium channel, this solution is not financially feasible. According to the current Bot Services pricing, every 1,000 events (activity objects) cost approx. $0.50/€0.50. This is on top of the OpenAI costs, which can also be significant.
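To make the cost issue concrete, here is a minimal sketch of the per-token event approach (the event name and payload shape are my own illustrative choices, not a fixed API). Each streamed word becomes its own `event` activity, and on a premium channel each one of those activities is billable; in a real bot each object would be sent with `context.sendActivity(...)` from the botbuilder SDK:

```javascript
// Wrap one streamed LLM token in a Bot Framework "event" activity that a
// custom Web Chat middleware can pick up and append to the rendered message.
// The event name "llmTokenChunk" and the payload shape are hypothetical.
function tokenToEventActivity(token, streamId) {
  return {
    type: 'event',
    name: 'llmTokenChunk',      // hypothetical event name the client listens for
    value: { streamId, token }, // payload the Web Chat middleware reads
  };
}

// Turn a streamed answer into the sequence of activities sent to the
// channel -- one billable activity per token/word.
function streamToActivities(tokens, streamId) {
  return tokens.map((t) => tokenToEventActivity(t, streamId));
}

// Example: a four-word answer produces four billable event activities,
// so a ~200-word answer would produce ~200 of them.
const activities = streamToActivities('The quick brown fox'.split(' '), 'abc123');
console.log(activities.length); // 4
```

At roughly $0.50 per 1,000 activities, a 200-word answer streamed this way adds about $0.10 per response on top of the OpenAI token costs, which is what makes the approach hard to justify.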
Is there a more cost-efficient way to stream these bits to the frontend?