
Flex Consumption Plan - Response times long even with always on option

WOERSDOERFER Tobias 25 Reputation points
2025-09-16T11:54:04.82+00:00

Dear Microsoft,

We need a function with one instance that is always ready for the customer to test; for this case we need a response in under 1 second. From time to time we will also get ~1000 parallel requests to the function. For those, startup time is not so important, but they must be processed in parallel.
The Flex Consumption plan looked perfect for this, but we are running into problems with the always on instance.

We first tried it with one always on instance, but saw that from time to time it scales up to 2 (with a startup delay), even if we are just sending one request after another with a gap of 10 seconds. Not nice, so we tried it with 2 instances. The scale-up in this scenario was less frequent but still there (roughly 1 in 10 requests).

In a real-life scenario where we send requests directly one after another, it looks like this with 2 always on instances (ignore that the traces are doubled, that's a bug on our side):
User's image

As you can see, the first two requests are fast, then the next 5 requests are slow (new instances are created), and then the rest is a little bit random. As you can see from the timestamps, we always send the requests one after another; there are never two requests sent in parallel.
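For anyone who wants to reproduce the sequential test described above, a minimal Python sketch could look like the following. It is not the script used for the traces in the screenshot. The endpoint URL is a hypothetical placeholder, the function names are made up for illustration, and the 1 second threshold simply mirrors the response-time requirement stated in the question.

```python
import time
import urllib.request
from typing import Callable, List


def http_get(url: str, timeout: float = 30.0) -> int:
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # drain the body so the timing covers the full response
        return response.status


def probe_sequentially(
    send: Callable[[], object],
    count: int,
    gap_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Call `send` `count` times, strictly one after another.

    Returns the duration of each call in seconds.
    """
    durations = []
    for i in range(count):
        start = clock()
        send()
        durations.append(clock() - start)
        if i < count - 1:
            sleep(gap_seconds)  # pause between requests; they never overlap
    return durations


def slow_requests(durations: List[float], threshold_seconds: float = 1.0) -> List[int]:
    """Return the indexes of requests that took longer than the threshold."""
    return [i for i, d in enumerate(durations) if d > threshold_seconds]


def main(url: str) -> None:
    """Run 20 sequential requests with a 10 second gap and report slow ones."""
    timings = probe_sequentially(lambda: http_get(url), count=20, gap_seconds=10)
    for i, t in enumerate(timings):
        print(f"request {i}: {t:.3f}s")
    print("slow requests:", slow_requests(timings))
```

Calling `main()` with the function URL runs the test with a 10 second gap. Setting `gap_seconds=0` would approximate the back-to-back scenario. Because the requests are issued from a single loop, any slow response cannot be caused by overlapping requests from this client.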

I would expect that the always on instances are used all the time and we never have a startup time of a new instance in this scenario.

Are the instances always killed after they are used? Even the always on instances?

Is there a way to configure the Flex Consumption always on instance so that it is used whenever it is available? I feel bad paying for an always on instance when it looks like it is not always on.

We used the Consumption plan before: we just pinged the function once, and for the next ~5 minutes we got an immediate answer. Why is this not possible in Flex Consumption?

Thank you in advance

Azure Functions

An Azure service that provides an event-driven serverless compute platform.


Answer accepted by question author

TP 158.1K Reputation points Volunteer Moderator
2025-09-16T12:28:53.7766667+00:00

Hi,

What do you have set for Instance memory and HTTP concurrency? You can check this on the Scale and concurrency blade of your Function app in the portal.

Try switching HTTP concurrency to manual with a higher number and then repeat your tests. For example, if you have 512 MB, try setting concurrency to, say, 10, and then test. Four is the default for 512 MB, so 10 is just a guess to start with. You need to test to find the best number for your workload.
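As a rough illustration of what the concurrency setting means for the ~1000 parallel requests mentioned in the question, here is a small back-of-the-envelope sketch. It assumes each instance takes requests up to its configured HTTP concurrency before another instance is needed; actual platform scaling may behave differently. The function name is made up for illustration.

```python
import math


def instances_needed(parallel_requests: int, per_instance_concurrency: int) -> int:
    """Estimate how many instances a burst needs.

    Assumption: every instance handles exactly `per_instance_concurrency`
    requests at once.
    """
    if per_instance_concurrency < 1:
        raise ValueError("per_instance_concurrency must be at least 1")
    return math.ceil(parallel_requests / per_instance_concurrency)


# Values from this thread: 1000 parallel requests, default 4 vs. suggested 10.
for concurrency in (4, 10):
    print(concurrency, instances_needed(1000, concurrency))
# prints:
# 4 250
# 10 100
```

Under that assumption, a higher concurrency value means fewer instances have to start for the same burst, at the cost of more requests sharing one instance's memory and CPU.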

Please click Accept Answer and upvote if the above was helpful.

Thanks.

-TP

1 person found this answer helpful.

1 additional answer

  1. Pashikanti Kumar 1,725 Reputation points Microsoft External Staff Moderator
    2025-10-07T18:23:06.8133333+00:00

    Hi WOERSDOERFER Tobias,

    Thank you for posting your question in the Microsoft Q&A forum.

    If the requests arrive ~10 seconds apart rather than as continuous high load, the platform may scale in some instances to reduce costs, which means the next request can trigger a scale-out again.
