Flex Consumption Plan - Response times long even with always on option

Question


WOERSDOERFER Tobias 25

Dear Microsoft,

We need a function with one instance that is always ready for the customer to test against. For these test requests we need a response in under 1 second. From time to time we will also get ~1000 parallel requests to the function. For those, startup time is not so important, but they must be processed in parallel.
The Flex Consumption plan looked perfect for this, but we are running into problems with the always on instance (the "always ready" instance, in Flex Consumption terms).

We first tried one always on instance but saw that from time to time the app scales out to 2 instances (with a startup delay), even though we are just sending one request after another with a gap of 10 seconds. Not nice, so we tried 2 instances. Scale-out was less frequent in this scenario but still happened (about 1 in 10 times).

In a real-life scenario, where we send requests directly one after another, it looks like this with 2 always on instances (ignore that the traces are doubled, that's a bug on our side):
[Image: request traces with timestamps and response times]

As you can see, the first two requests are fast, the next 5 requests are slow (new instances are created), and the rest are somewhat random. As you can see from the timestamps, we always send the requests one after another; there are never two requests sent in parallel.
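For anyone who wants to reproduce this kind of test, the sequential pattern described above can be sketched roughly as follows. This is a minimal sketch, not the script used for the traces in the picture: the function names, the URL passed in, and the 1-second threshold for "slow" are assumptions made for illustration. It only uses the Python standard library.

```python
import time
import urllib.request
from typing import Callable, List


def timed_get(url: str, timeout: float = 30.0) -> float:
    """Send one GET request and return the elapsed wall-clock seconds.

    urllib raises an exception for HTTP error statuses, so a failed
    request aborts the run instead of being counted as a fast response.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # wait for the full body before stopping the clock
    return time.perf_counter() - start


def run_sequential(
    url: str,
    count: int,
    gap_seconds: float = 10.0,
    send: Callable[[str], float] = timed_get,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Send `count` requests strictly one after another.

    The next request is only sent after the previous one has returned
    and `gap_seconds` have passed, so no two requests overlap.
    """
    latencies = []
    for i in range(count):
        latencies.append(send(url))
        if i < count - 1:
            sleep(gap_seconds)
    return latencies


def slow_requests(latencies: List[float], threshold_seconds: float = 1.0) -> List[int]:
    """Return the 0-based indexes of requests slower than the threshold."""
    return [i for i, seconds in enumerate(latencies) if seconds > threshold_seconds]
```

Calling `run_sequential("<your function URL>", 20)` and passing the result to `slow_requests` gives the positions of the requests that missed the 1-second target, which makes runs with different settings easier to compare than reading traces by eye.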

I would expect that the always on instances are used all the time and we never have a startup time of a new instance in this scenario.

Are the instances always killed after they are used? Even the always on instances?

Is there a way to configure the Flex Consumption always on instance so that it is always used when it is available? I feel bad paying for an always on instance when it looks like it is not always on.

We used the Consumption plan before and just pinged the function once, and then we got direct answers for the next ~5 minutes. Why is this not possible in Flex Consumption?

Thank you in advance


Answer accepted by question author

TP 158.1K Volunteer Moderator

Hi,

What do you have set for Instance memory and HTTP concurrency? You can check this on the Scale and concurrency blade of your Function app in the portal.

Try switching HTTP concurrency to manual with a higher number and then repeat your tests. For example, if you have 512 MB, try setting concurrency to, say, 10, and then test. Four is the default for 512 MB, so 10 is just a guess as a starting point. You need to test to find the best number for your workload.
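A rough way to picture why the concurrency number matters is sketched below. The model is an assumption made for illustration, not documented scale controller behavior: each instance is taken to accept up to its HTTP concurrency in simultaneous requests, and anything beyond that needs another instance. The function name is hypothetical.

```python
import math


def instances_needed(in_flight_requests: int, per_instance_concurrency: int) -> int:
    """Estimate how many instances a burst needs under the simple model above.

    Assumption: one instance handles at most `per_instance_concurrency`
    requests at the same time. Real scaling decisions may differ.
    """
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must not be negative")
    if per_instance_concurrency < 1:
        raise ValueError("per_instance_concurrency must be at least 1")
    return math.ceil(in_flight_requests / per_instance_concurrency)


# Two overlapping requests already exceed one instance at concurrency 1,
# but fit on a single instance at concurrency 10.
print(instances_needed(2, 1))      # 2
print(instances_needed(2, 10))     # 1

# A burst of about 1000 parallel requests at concurrency 10.
print(instances_needed(1000, 10))  # 100
```

Under this model a higher concurrency lets the instances that already exist absorb small overlaps without a cold start, at the cost of more requests sharing the memory and CPU of one instance. That trade-off is why the number has to be found by testing.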

Please click Accept Answer and upvote if the above was helpful.

Thanks.

-TP

WOERSDOERFER Tobias 25 Reputation points

2025-09-16T13:10:56.28+00:00

Hi TP,

We are using a Python function with 2048 MB and were using the default settings, so a concurrency of 1. Because it is stated so explicitly in https://learn.microsoft.com/en-us/azure/azure-functions/functions-concurrency, we thought that this is how it should be for a Python function.
We have now tested two apps, one with 2 instances and concurrency 10 and one with 1 instance and concurrency 5, and both are working great. We will keep investigating to find the best configuration for us. Thank you very much!

P.S.:
In contrast to the picture in the question:

Answer 1

Pashikanti Kumar 1,725 Microsoft External Staff Moderator

Hi WOERSDOERFER Tobias,

Thank you for posting your question in the Microsoft Q&A forum.

If the requests arrive about 10 seconds apart rather than as a continuous high load, the system may scale in some of the instances to reduce costs, which means the next request can trigger a scale-out again.
