How to send out the Azure OpenAI response in real-time streaming through an HTTP response in Python?

Eric HK 15 Reputation points
2023-07-21T02:33:21.61+00:00


Background:

  1. The goal of using Azure OpenAI is to build an internal assistant chatbot that answers user inquiries based on file input.
  2. The backend code that calls the Azure OpenAI gpt-35-16k model is hosted in a Function App, and it uses LangChain for file text embedding together with the LlamaIndex indexing library. The Function App runs on the production Premium v3 P1V3 pricing plan.
  3. Since it is hosted in a Function App, the backend and frontend communicate over HTTPS.

Example Code:

import json
import logging

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:

    logging.info('Python HTTP trigger function processed a request.')

    # Read the question from the query string first, then fall back to the JSON body
    question = req.params.get('question')
    if not question:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            question = req_body.get('question')

    if not question:
        return func.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a question in the query string or in the request body for a personalized response.",
             status_code=200
        )

    prompt = """
    # my prompt here
    """

    # query_engine is the engine to send the query to Azure OpenAI with reference to the file text indexing
    llamaIndexresponse = query_engine.query(prompt + question)

    # sample output json
    result = {
        "response": str(llamaIndexresponse.response).strip()
    }

    return func.HttpResponse(json.dumps(result), mimetype="application/json")
  1. The frontend calls the backend hosted in the Function App with a RESTful POST request and then renders the response in the UI.
  2. The ask is how to stream the API response to the UI in real time, which involves two layers of streaming:

a) The Azure OpenAI streaming (query engine response streaming)

b) The Function App backend streaming (HTTP streaming of outbound responses from the Function App to the frontend)
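For layer (a) on its own: with stream=True, the 2023-era Chat Completions API returns chunks whose choices[0].delta carries incremental text. A minimal sketch of the chunk handling, with the network call replaced by hand-built chunks (collect_deltas and the fake chunk values are illustrative, not part of any SDK):

```python
def collect_deltas(chunks):
    """Yield the incremental text carried by streaming chat-completion
    chunks; non-content deltas (e.g. the initial role chunk) are skipped."""
    for chunk in chunks:
        delta = chunk['choices'][0]['delta']
        if 'content' in delta:
            yield delta['content']

# Hand-built chunks in the shape the streaming API emits, standing in
# for a real openai.ChatCompletion.create(..., stream=True) call:
fake_chunks = [
    {'choices': [{'delta': {'role': 'assistant'}}]},
    {'choices': [{'delta': {'content': 'Hi'}}]},
    {'choices': [{'delta': {'content': ' there'}}]},
]

text = ''.join(collect_deltas(fake_chunks))
```

Layer (b) then forwards each of these deltas to the frontend instead of waiting for the joined string.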


1 answer

  1. YutongTie-MSFT 51,936 Reputation points
    2023-07-24T19:18:10.7433333+00:00

    Hello Eric

    Thanks for reaching out to us. One way you may consider is SSE (server-sent events).

    To send out the Azure OpenAI response in real-time streaming through an HTTP response in Python, you can use Server-Sent Events (SSE) to stream the response from the backend to the frontend. SSE is a technology that allows a server to push updates to the client in real time.

    Reference document you may need - https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events
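Since the backend goes through a LlamaIndex query engine, layer (a) maps to LlamaIndex's own streaming mode: building the engine with index.as_query_engine(streaming=True) makes query() return a streaming response exposing a response_gen generator of text deltas (this is the 2023-era llama_index API; newer versions differ). A sketch with a stub object in place of the real engine so the SSE framing logic is runnable on its own:

```python
import json

class FakeStreamingResponse:
    """Stub with the same shape as LlamaIndex's StreamingResponse:
    response_gen yields text deltas as the model produces them."""
    def __init__(self, tokens):
        self.response_gen = iter(tokens)

def sse_frames(streaming_response):
    # Layer (b): wrap every text delta in an SSE 'message' frame,
    # i.e. an 'event:' line, a 'data:' line, and a terminating blank line
    for token in streaming_response.response_gen:
        yield 'event: message\n'
        yield 'data: {}\n\n'.format(json.dumps({'response': token}))

frames = list(sse_frames(FakeStreamingResponse(['Hello', ' world'])))
```

With the real engine, FakeStreamingResponse would be replaced by the object returned from query_engine.query(...).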

    Here's an example of how you can modify your code to use SSE. Please be aware that this is only sample code and you will need to adapt it to your requirements.

    import json
    import logging

    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info('Python HTTP trigger function processed a request.')
        req_json = req.get_json()
        question = req_json['question']

        prompt = """
        # my prompt here
        """

        # query_engine is built for streaming (e.g. index.as_query_engine(streaming=True)),
        # so the response exposes a response_gen generator of text deltas
        llamaIndexresponse = query_engine.query(prompt + question)

        # SSE response headers
        headers = {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive'
        }

        # Build one SSE 'message' frame per text delta
        events = ['event: message\n',
                  'data: {}\n\n'.format(json.dumps({'response': 'Processing...'}))]
        for token in llamaIndexresponse.response_gen:
            events.append('event: message\n')
            events.append('data: {}\n\n'.format(json.dumps({'response': token})))

        # Note: func.HttpResponse is sent only once the body is complete; the
        # classic Functions Python worker buffers the response, so for true
        # token-by-token delivery the frames must be flushed by a host that
        # supports HTTP response streaming.
        return func.HttpResponse(''.join(events), headers=headers)


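On the frontend a browser would normally consume this stream with EventSource. For testing the endpoint from Python, the same frames can be parsed by hand; a minimal sketch (parse_sse is an illustrative helper, not a library function, and the sample stream is hand-built):

```python
import json

def parse_sse(stream_text):
    """Very small SSE parser: split the stream on the blank line that
    terminates each frame and decode the JSON payload of 'data:' lines."""
    messages = []
    for frame in stream_text.split('\n\n'):
        for line in frame.split('\n'):
            if line.startswith('data: '):
                messages.append(json.loads(line[len('data: '):]))
    return messages

# Hand-built sample in the format the function above emits:
sample = (
    'event: message\n'
    'data: {"response": "Processing..."}\n\n'
    'event: message\n'
    'data: {"response": "Hello"}\n\n'
)

msgs = parse_sse(sample)
```

Against a live endpoint, the same logic could be fed line by line from, for example, requests with stream=True and iter_lines().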
    Please have a try and let me know how it works. I am happy to discuss further if it does not work for you.

    Regards,

    Yutong

    -Please kindly accept the answer and vote 'Yes' if you feel you are getting help to support the community and help more people, thanks a lot.

