Hello Harinath,
This is a very interesting topic and a common issue after POCs.
Latency can come from a combination of several things: the use of an LLM vs. an SLM, the complexity of the reference data, the source of the data used to ground the RAG answers, and many others.
Knowing the source of the data, where it resides, and how complex the retrieval and evaluation of that data is can potentially explain why a response takes 16 seconds.
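As a first step, it is worth timing each stage of the pipeline to see where those 16 seconds actually go. Below is a minimal sketch with stand-in functions; the stage names and sleep durations are placeholders, so swap in your real retrieval and generation calls:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run one pipeline stage and print how long it took."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Stand-ins for the real pipeline stages -- replace with your own calls.
def retrieve(question):
    time.sleep(0.5)  # simulates a search/vector query
    return ["doc snippet 1", "doc snippet 2"]

def generate(question, docs):
    time.sleep(1.0)  # simulates the LLM completion call
    return "answer grounded in: " + ", ".join(docs)

question = "Why is my RAG response slow?"
docs = timed("retrieval", retrieve, question)
answer = timed("generation", generate, question, docs)
print(answer)
```

Once you know whether retrieval or generation dominates, you know whether to focus on the data side or the model side.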
Also, pre-processing the data (e.g., using Azure AI Search) to index and vectorize larger data sets increases the overall cost, but improves both response time and accuracy.
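For illustration, here is a rough sketch of querying a pre-built Azure AI Search vector index at answer time, so the expensive indexing and vectorization work is paid once up front rather than per request. The endpoint, key, index name, and field names (contentVector, title, content) are assumptions; adjust them to your own index schema:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Assumed endpoint, key, and index name -- replace with your own.
client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="docs-index",
    credential=AzureKeyCredential("<your-key>"),
)

# `query_embedding` would come from your embedding model;
# "contentVector" is an assumed vector field name in the index.
query_embedding = [0.0] * 1536  # placeholder vector
vector_query = VectorizedQuery(
    vector=query_embedding,
    k_nearest_neighbors=5,
    fields="contentVector",
)

results = client.search(
    search_text=None,             # pure vector query; add text for hybrid search
    vector_queries=[vector_query],
    select=["title", "content"],  # assumed field names
)
for doc in results:
    print(doc["title"])
```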
Are you using a static data source located in storage, or are you using web scraping or a list of external data sources? If so, what is the size of this data, what is its format, and any other insights you can share? What is the base data that you need the LLM to analyze?
If you could provide a generic, non-sensitive diagram showing the high-level architecture design, it could help point to the areas where a time-consuming process occurs.