Azure OpenAI Chat Completion in Function App Takes 16 Seconds - How to Optimize?

Harinath J 265 Reputation points
2025-03-04T04:10:23.2433333+00:00

Hi Azure Community,

I am using the Azure OpenAI Chat Completion API to generate a carousel in my Azure Function App. However, when I hit the API endpoint, it takes around 16 seconds to respond.

Is there any issue with my code or prompt that could be causing this delay? I would appreciate any suggestions on how to speed up the process while maintaining good response quality. Thanks in advance for your help!
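
To isolate where the 16 seconds go, it helps to time the chat completion call itself and compare it against the total function duration, since Function App overhead such as cold starts and any downstream calls adds its own latency. Below is a minimal timing sketch; the environment variable names, API version, and deployment name are placeholders, not my actual configuration.

    import os
    import time
    from openai import AzureOpenAI  # openai >= 1.x

    # Placeholder client setup; substitute your own endpoint, key, API version, and deployment.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    start = time.perf_counter()
    response = client.chat.completions.create(
        model="my-gpt4o-deployment",  # placeholder: your Azure OpenAI deployment name
        messages=[
            {"role": "system", "content": systprompt},  # the prompt string built below
            {"role": "user", "content": "Generate the Hero card carousel."},
        ],
    )
    print(f"Chat completion call took {time.perf_counter() - start:.1f}s")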

Here is my prompt:


   systprompt = f"""
    You are a multilingual assistant designed to process property details and convert them into a structured JSON object formatted specifically for Hero cards in the Microsoft Bot Framework.
    Convert the following property details into a structured JSON object containing a list of Hero cards:
    ### Requirements:
    1. TITLE FORMAT:
       - Always output titles in English only
       - If multiple unit types exist (e.g., "Residensi ZIG - Type A & Type A1"), split into separate cards
       - If title contains "Service Apartment", remove that text
       - If title includes "- Studio", remove the studio part    
    2. TEXT FORMAT:
       - Keep the detected language of the input
       - Use structured, bulleted format including:
         * 🏠 Unit Type (Optional)
         * 🛏️ Bedroom details (Optional)
         * 🏢 Amenities (Optional)
       - Leave sections empty if information isn't available (don't use "N/A")
    3. IMAGE:
       - Include image URL if available; otherwise, use property name or unit type
    4. BUTTON:
       - Title: Static as "More Details" or "View Pricing" (for financial content)
       - Value: Dynamic based on property name/type:
         * With unit type: "Explain more details about [Property Name - Type X]"
         * Without unit type: "Explain more details about [Property Name]"
    ### Formatting Rules:
    - Highlight bedroom details with emojis
    - Titles must always be in English, other content in input language
    - Keep responses concise (≤500 characters per card)
    
    ### Property Details:
    {text}
    
    The response must be a valid JSON object with the following structure:
    {{
        "cards": [
            {{
                "title": "Property Name - Type X",
                "text": "Structured bullet points with property details",
                "image": "Image URL or property identifier",
                "button": {{
                    "title": "More Details",
                    "type": "imBack",
                    "value": "Explain more details about Property Name - Type X"
                }}
            }}
        ]
    }}
    """


Accepted answer
  1. Diego Martos 85 Reputation points Microsoft Employee
    2025-03-04T13:23:38.2166667+00:00

    Hello Harinath,

    This is a very interesting topic and a common issue after POCs.

    It can be a combination of several things: the use of an LLM vs. an SLM, the complexity of the reference and source data used to produce RAG answers, and many others.

    Knowing the source of the data, where it resides, and how complex it is to assess can help explain why a response takes 16 seconds.

    Also, pre-processing the data (e.g., using Azure AI Search) to index and vectorize larger data sets increases the overall cost, but it improves both response time and accuracy.
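
    As an illustration of that pre-processing pattern, here is a minimal sketch; the index name, field name, and query below are hypothetical, not taken from the question. Retrieving only the top few pre-indexed chunks keeps the prompt short, which in turn shortens generation time.

        import os
        from azure.core.credentials import AzureKeyCredential
        from azure.search.documents import SearchClient  # pip install azure-search-documents

        # Hypothetical index and field names, for illustration only.
        search_client = SearchClient(
            endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
            index_name="properties-index",
            credential=AzureKeyCredential(os.environ["AZURE_SEARCH_API_KEY"]),
        )

        # Fetch only the top 3 matching chunks instead of sending the full data set to the model.
        results = search_client.search(search_text="Residensi ZIG unit types", top=3)
        text = "\n".join(doc["content"] for doc in results)  # fills the {text} slot in the prompt above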

    Are you using a static data source held in storage, or are you using web scraping or a list of external data sources? If so, what is the size of that data, what format is it in, and can you share more insights? What is the underlying data that the LLM needs to analyze?

    If you could provide a generic, non-sensitive diagram showing the high-level architecture design, it could point to the areas where you have a time-consuming process.


0 additional answers
