Hi Harinath J
Hope the pointers from Gowtham CP on reducing the computational overhead of Azure OpenAI helped.
I wanted to add a few points on top of them.
Azure OpenAI
- We can opt for a Global deployment (higher availability compared to a Standard deployment) or Provisioned Throughput Units (dedicated capacity with lower latency) - see Deployment types.
- Use structured outputs, or explicitly instruct the model in the system prompt to stay under a certain word limit (the smaller the output, the faster the response).
- Use asynchronous Azure OpenAI operations along with streaming (see the sketch after this list).
- Send simpler queries instead of lengthy, complex ones, so you can build context gradually and generate the answer with minimal time spent.
- We can also tune temperature along with max_tokens (be cautious about hallucinations in the answer, though).
- Opt for a multi-zone deployment and load balance across the deployments to make them resilient to regional outages and slowness.
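To illustrate the async + streaming point above, here is a minimal sketch using the openai Python SDK's AsyncAzureOpenAI client. The endpoint, key, API version, deployment name, and the word limit in the system prompt are placeholders you would adjust for your own resource:

```python
import asyncio
from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-02-01",                                   # placeholder
)

async def ask(question: str) -> str:
    # stream=True yields tokens as they are generated, so the caller sees
    # the first words long before the full answer is finished.
    stream = await client.chat.completions.create(
        model="<your-deployment-name>",  # placeholder deployment name
        messages=[
            {"role": "system", "content": "Answer in at most 100 words."},
            {"role": "user", "content": question},
        ],
        max_tokens=200,    # cap the output size: smaller output, faster response
        temperature=0.2,   # lower temperature keeps answers focused and short
        stream=True,
    )
    parts = []
    async for chunk in stream:
        # some streamed chunks carry no content, so guard before reading
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

print(asyncio.run(ask("Summarize our return policy.")))
```

Streaming does not make the complete answer arrive sooner, but it sharply cuts the time to first token, which is what users usually perceive as latency.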
Azure AI Search
- We can also optimize on the Azure AI Search side by changing the chunk size used for indexing.
- We can also upgrade the AI Search tier for higher availability and higher index limits.
- Optimize your indexing operation - for example, incremental indexing or AI enrichment skillsets - and choose whichever fits your workload.
- Reduce Top_K and Top_N to return fewer results (a small example follows this list).
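For the Top_K/Top_N point, here is a minimal sketch with the azure-search-documents SDK; the endpoint, index name, key, and field names are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",  # placeholder
    index_name="<your-index>",                                    # placeholder
    credential=AzureKeyCredential("<your-query-key>"),            # placeholder
)

# top=3 returns only the 3 best-scoring documents instead of the service
# default of 50, which shrinks both the search latency and the amount of
# context you later send to the model.
results = search_client.search(
    search_text="how do I reset my password",
    top=3,
    select=["id", "content"],  # fetch only the fields you actually need
)
for doc in results:
    print(doc["id"], doc["content"][:100])
```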
Python tools
- Normalize the outputs (for example, by rounding) and reduce redundant computation.
- Enable caching, or purge memory that piles up after passing the values to the next node (sketched below).
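As a small illustration of the caching and normalization points - the function and buffer sizes here are made up for the example:

```python
import functools

@functools.lru_cache(maxsize=1024)
def score(query: str) -> float:
    # ...imagine an expensive computation here...
    raw = len(query) * 0.12345
    return round(raw, 4)  # normalize: round before passing downstream

score("hello")  # computed once
score("hello")  # served from the cache, no recomputation

# Free large intermediates once the next node has received the values.
big_buffer = [0] * 10_000_000
# ...pass results downstream...
del big_buffer
```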
Storage side
- Please use the hot tier for faster data fetching (see the snippet after this list).
- Enable a CDN to reduce latency when serving the training data.
- GRS or GZRS storage provides more resiliency against disasters and outages.
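If you want to set the tier programmatically, here is a minimal sketch with the azure-storage-blob SDK; the connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")  # placeholder
blob = service.get_blob_client(container="training-data", blob="dataset.parquet")

# Move the blob to the Hot tier so reads are served with the lowest latency.
blob.set_standard_blob_tier("Hot")
print(blob.get_blob_properties().blob_tier)  # should report "Hot"
```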
Overall, we should try to reduce the computational overhead while improving the performance of the operation and the underlying resources.
Hope it helps.
Please don’t forget to Accept Answer and mark Yes for "Was this answer helpful" wherever the information provided helps you; this can benefit other community members.
Thank You.