How to send out the Azure OpenAI response in real-time streaming through an HTTP response in Python?

Eric HK 15 Reputation points
2023-07-21T02:33:21.61+00:00


Background:

  1. The goal of using Azure OpenAI is to build an internal assistant chatbot that answers user inquiries based on file input.
  2. The backend code that calls the Azure OpenAI gpt-35-16k model is hosted in a Function App, and it uses LangChain for file text embedding together with the LlamaIndex indexing library. The Function App runs on the production Premium v3 P1V3 pricing plan.
  3. Since it is hosted in a Function App, the backend and frontend communicate over HTTPS.

Example Code:

import json
import logging

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:

    logging.info('Python HTTP trigger function processed a request.')

    # Read the question from the query string first, then fall back to the JSON body
    question = req.params.get('question')
    if not question:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            question = req_body.get('question')

    if not question:
        return func.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a question in the query string or in the request body for a personalized response.",
             status_code=200
        )

    prompt = """
    # my prompt here
    """

    # query_engine is the engine to send the query to Azure OpenAI with reference to the file text indexing
    llamaIndexresponse = query_engine.query(prompt + question)

    # sample output json
    result = {
        "response": str(llamaIndexresponse.response).strip()
    }

    return func.HttpResponse(json.dumps(result), mimetype="application/json")
  1. The frontend calls the backend hosted in the Function App with a RESTful POST request and then renders the response in the UI.
  2. The ask is how to stream the API response to the UI in real time, which involves two layers of streaming:

a) The Azure OpenAI streaming (query engine response streaming)

b) The Function App backend streaming (HTTP streaming of outbound responses from the Function App to the frontend)
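For layer (a) on its own: with stream=True, the 2023-era Chat Completions API returns chunks whose choices[0].delta carries incremental text. A minimal sketch of the chunk handling, with the network call replaced by hand-built chunks (collect_deltas and the fake chunk values are illustrative, not part of any SDK):

```python
def collect_deltas(chunks):
    """Yield the incremental text carried by streaming chat-completion
    chunks; non-content deltas (e.g. the initial role chunk) are skipped."""
    for chunk in chunks:
        delta = chunk['choices'][0]['delta']
        if 'content' in delta:
            yield delta['content']

# Hand-built chunks in the shape the streaming API emits, standing in
# for a real openai.ChatCompletion.create(..., stream=True) call:
fake_chunks = [
    {'choices': [{'delta': {'role': 'assistant'}}]},
    {'choices': [{'delta': {'content': 'Hi'}}]},
    {'choices': [{'delta': {'content': ' there'}}]},
]

text = ''.join(collect_deltas(fake_chunks))
```

Layer (b) then forwards each of these deltas to the frontend instead of waiting for the joined string.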


1 answer

  1. YutongTie-MSFT 51,936 Reputation points
    2023-07-24T19:18:10.7433333+00:00

    Hello Eric

    Thanks for reaching out to us. One way you may consider is SSE (server-sent events).

    To send out the Azure OpenAI response in real-time streaming through an HTTP response in Python, you can use Server-Sent Events (SSE) to stream the response from the backend to the frontend. SSE is a technology that allows a server to push updates to the client in real time.

    Reference document you may need - https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events
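Since the backend goes through a LlamaIndex query engine, layer (a) maps to LlamaIndex's own streaming mode: building the engine with index.as_query_engine(streaming=True) makes query() return a streaming response exposing a response_gen generator of text deltas (this is the 2023-era llama_index API; newer versions differ). A sketch with a stub object in place of the real engine so the SSE framing logic is runnable on its own:

```python
import json

class FakeStreamingResponse:
    """Stub with the same shape as LlamaIndex's StreamingResponse:
    response_gen yields text deltas as the model produces them."""
    def __init__(self, tokens):
        self.response_gen = iter(tokens)

def sse_frames(streaming_response):
    # Layer (b): wrap every text delta in an SSE 'message' frame,
    # i.e. an 'event:' line, a 'data:' line, and a terminating blank line
    for token in streaming_response.response_gen:
        yield 'event: message\n'
        yield 'data: {}\n\n'.format(json.dumps({'response': token}))

frames = list(sse_frames(FakeStreamingResponse(['Hello', ' world'])))
```

With the real engine, FakeStreamingResponse would be replaced by the object returned from query_engine.query(...).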

    Here's an example of how you can modify your code to use SSE. Please be aware that this is only sample code and you will need to adapt it to your requirements.

    import json
    import logging

    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info('Python HTTP trigger function processed a request.')
        req_json = req.get_json()
        question = req_json['question']

        prompt = """
        # my prompt here
        """

        # query_engine is built for streaming (e.g. index.as_query_engine(streaming=True)),
        # so the response exposes a response_gen generator of text deltas
        llamaIndexresponse = query_engine.query(prompt + question)

        # SSE response headers
        headers = {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive'
        }

        # Build one SSE 'message' frame per text delta
        events = ['event: message\n',
                  'data: {}\n\n'.format(json.dumps({'response': 'Processing...'}))]
        for token in llamaIndexresponse.response_gen:
            events.append('event: message\n')
            events.append('data: {}\n\n'.format(json.dumps({'response': token})))

        # Note: func.HttpResponse is sent only once the body is complete; the
        # classic Functions Python worker buffers the response, so for true
        # token-by-token delivery the frames must be flushed by a host that
        # supports HTTP response streaming.
        return func.HttpResponse(''.join(events), headers=headers)


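On the frontend a browser would normally consume this stream with EventSource. For testing the endpoint from Python, the same frames can be parsed by hand; a minimal sketch (parse_sse is an illustrative helper, not a library function, and the sample stream is hand-built):

```python
import json

def parse_sse(stream_text):
    """Very small SSE parser: split the stream on the blank line that
    terminates each frame and decode the JSON payload of 'data:' lines."""
    messages = []
    for frame in stream_text.split('\n\n'):
        for line in frame.split('\n'):
            if line.startswith('data: '):
                messages.append(json.loads(line[len('data: '):]))
    return messages

# Hand-built sample in the format the function above emits:
sample = (
    'event: message\n'
    'data: {"response": "Processing..."}\n\n'
    'event: message\n'
    'data: {"response": "Hello"}\n\n'
)

msgs = parse_sse(sample)
```

Against a live endpoint, the same logic could be fed line by line from, for example, requests with stream=True and iter_lines().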
    Please have a try and let me know how it works. I am happy to discuss further if it does not work for you.

    Regards,

    Yutong

    -Please kindly accept the answer and vote 'Yes' if you feel you are getting help to support the community and help more people, thanks a lot.

