Help understanding occasional very slow HTTP requests on Linux Azure Function Consumption Plan

Question

Hi,

TL;DR

I’m seeing unpredictable, occasional slow outbound HTTP requests within an invocation of an Azure Function on a Linux Consumption plan (runtime v3.0.15417.0). I can replicate this in Au-East and US-Central. The problem doesn't occur on Windows Consumption plans.

I'm looking for an explanation why this might be occurring.

Function
My function makes 10 outgoing calls to http://www.google.com, sleeping for 0.2 seconds between calls. All HTTP calls are synchronous. (I know I can optimise this, but the problem originally appeared in a Python Consumption plan function using CosmosClient, which does not support async I/O.)
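As a rough illustration, the loop described above might look like the sketch below. This is not the original function's code: the names `fetch` and `timed_calls` are hypothetical, and `urllib.request` is assumed as the HTTP client only so the example stays within the standard library.

```python
import time
import urllib.request
from typing import Callable, List

URL = "http://www.google.com"  # target mentioned in the question


def fetch(url: str) -> int:
    """Blocking GET (hypothetical helper); returns the HTTP status code."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.status


def timed_calls(url: str = URL, count: int = 10, pause: float = 0.2,
                do_fetch: Callable[[str], int] = fetch) -> List[float]:
    """Make `count` synchronous calls, pausing between them, and time each one."""
    durations = []
    for i in range(count):
        start = time.perf_counter()
        do_fetch(url)
        durations.append(time.perf_counter() - start)
        if i < count - 1:
            time.sleep(pause)  # 0.2 s between calls, as in the question
    return durations
```

Logging each entry of the returned list inside the function body is one way to see which of the 10 requests in an invocation was the slow one.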

Calling the function
I have an external script that calls the function 20 times sequentially from my local machine.
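A minimal sketch of such a harness is shown below, assuming the function is exposed through an HTTP trigger. `FUNCTION_URL` is a placeholder, and `invoke`, `run_sequentially` and `summarise` are hypothetical names, not taken from the original script.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, List

# Placeholder only: substitute the real HTTP trigger URL.
FUNCTION_URL = "https://<your-app>.azurewebsites.net/api/<your-function>"


def invoke(url: str) -> None:
    """Blocking call to the function endpoint; the body is read and discarded."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        resp.read()


def run_sequentially(url: str, runs: int = 20,
                     call: Callable[[str], None] = invoke) -> List[float]:
    """Call the function `runs` times, one after another, timing each call."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call(url)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations: List[float]) -> Dict[str, float]:
    """Reduce the timings to min, median and max so outliers stand out."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Calling `summarise(run_sequentially(FUNCTION_URL))` against both a Linux and a Windows plan would give comparable numbers for the two cases.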

Expected Results
I'd expect to see similar execution times for the invocations.

Actual Results
I get wildly unpredictable results. It's extremely easy to reproduce on any Linux Consumption plan.
Windows Consumption plans produce steadier, more predictable response times.

Metrics

75% of the runs are about 4.5 seconds. But often, midway through an invocation, one of the requests might take up to 8 seconds. My hunch is it is being forced to wait to get a socket by the underlying runtime, but I have no way to prove this.

There are no cold starts involved. I can see from my log statements that it may be the 3rd, 4th, 5th, 6th or 7th request within a function invocation that often takes many seconds to respond.

There is no concurrency involved. My test harness calls the function sequentially.

I can recreate similar results using a variety of URIs. I originally saw the problem using CosmosDB.

Any explanations on what might be happening would be great.

Thanks,

Graeme

Answer

Hello anonymous user, we see performance issues with Python functions (in the case of simultaneous calls as well), so we suggest maximizing FUNCTIONS_WORKER_PROCESS_COUNT.

This behavior is expected due to the single-threaded architecture of Python.
In scenarios such as yours, you are using blocking synchronous HTTP calls or I/O-bound calls, which block the entire event loop.

How to handle such scenarios is documented in our Python Functions developer reference: https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference-python#scaling-and-concurrency . See especially the Async part.

Here are the two methods to handle this:

Use async calls.
Add more language worker processes per host. This can be done with the application setting FUNCTIONS_WORKER_PROCESS_COUNT, up to a maximum value of 10. For the CPU-bound workload you are simulating with any loops, we recommend setting FUNCTIONS_WORKER_PROCESS_COUNT to a higher number to parallelize the work given to a single instance (docs here).
Please note that a new language worker is spawned every 10 seconds until they are warmed up.
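For the async option, one possible pattern when the client library only offers blocking calls is to hand each call to a thread pool so the event loop is not held up. The sketch below is an assumption about how that could be done with the standard library, not the method the linked documentation prescribes. `blocking_call` is a hypothetical stand-in for a synchronous client call.

```python
import asyncio
import time
from typing import Callable, List


def blocking_call(delay: float = 0.05) -> str:
    """Hypothetical stand-in for a synchronous HTTP or database client call."""
    time.sleep(delay)
    return "ok"


async def run_without_blocking(calls: int,
                               work: Callable[[], str] = blocking_call) -> List[str]:
    """Run blocking calls one after another without stalling the event loop."""
    loop = asyncio.get_running_loop()
    results = []
    for _ in range(calls):
        # run_in_executor moves the blocking call onto the default thread pool;
        # awaiting it yields control, so other coroutines can run meanwhile.
        results.append(await loop.run_in_executor(None, work))
    return results
```

Inside an `async def` function the coroutine would be awaited directly. From a plain script it can be driven with `asyncio.run(run_without_blocking(10))`.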

Here is a GitHub issue that discusses this in detail: https://github.com/Azure/azure-functions-python-worker/issues/236

Please let me know if this helps. If it does, please 'Accept as answer' and ‘Up-vote’ so that it can help others in the community looking for help on similar topics.
