How do I get Azure Functions to scale up properly with Queue triggers?

Question

I have an Azure Functions app that has several queue triggers along with a number of HTTP triggers, timer triggers, and Durable Functions.

I am currently running into an issue where a very large number of messages has built up in one queue. When I started digging into why, I noticed that Azure Functions was not scaling out the instances in response to the high number of messages in the queue.

I submitted a support request and was told "When there are several functions, all the functions will vote to remove or add workers based on the workload. For the function app named *****, when there were large number of messages, the queue trigger function voted to add workers, however, the other two functions voted to remove workers due to the idle status. In this situation, the backend didn't add more workers for the increasing queue messages." and "For Consumption plan, adding workers not only depends on the work balance but also is related to other factors."

As a workaround I was told "if this by-design operation has negative effects on your production, we highly recommend to separate the queue triggered function which is for handling the large number of queue messages to another function app. In this way, the backend can allocate more workers based on the queue length."
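To make the support explanation concrete, here is a toy model of the "voting" it describes. It is only a sketch: the simple majority rule, the function names, and the vote values are my own assumptions for illustration, not the scale controller's actual algorithm, which the support reply does not spell out.

```python
from collections import Counter


def decide(votes: dict[str, str]) -> str:
    """Toy majority rule over per-function votes ('add' or 'remove').

    Assumption: a plain majority wins. The real scale controller's
    aggregation logic is not described in the support reply.
    """
    tally = Counter(votes.values())
    if tally["add"] > tally["remove"]:
        return "add"
    if tally["remove"] > tally["add"]:
        return "remove"
    return "hold"


# Hypothetical app: one busy queue function sharing an app with two idle functions.
shared_app = {"queue_worker": "add", "http_api": "remove", "nightly_timer": "remove"}

# The suggested workaround: the busy queue function alone in its own app.
dedicated_app = {"queue_worker": "add"}

print(decide(shared_app))
print(decide(dedicated_app))
```

Under this assumed rule, the idle functions outvote the busy queue function in the shared app, while the dedicated app has nothing to outvote it, which is the effect the workaround relies on.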

This does not seem consistent with what the docs state: "Azure Functions uses a component called the scale controller to monitor the rate of events and determine whether to scale out or scale in. The scale controller uses heuristics for each trigger type. For example, when you're using an Azure Queue storage trigger, it scales based on the queue length and the age of the oldest queue message." (https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale)

While I understand that the scale controller might need to be more "intelligent" than to consider queue length alone, I find it problematic that a design that is supposed to auto-scale won't scale out when only one function out of many needs more resources.

The "workaround" of putting the function with high load into its own Function App has its own drawbacks and limitations, and it takes what is supposed to be automatic and makes it manual. One of the main selling points of Azure Functions is the auto-scaling.

Please tell me there is a better answer.

Accepted Answer

These are the final responses I got from Microsoft. Apparently, I had different expectations of how a slot would work and a different interpretation of what the docs said.

The operation about not scaling out when the queue length exceeds the limits should be expected. As our official document mentioned, for consumption plan, the slot scales as the function app scales.
It means that even when the staging slot votes to add workers, the backend won’t allocate new works due to the Production Slot will decide on the number of workers.
Reference: https://learn.microsoft.com/en-us/azure/azure-functions/functions-deployment-slots#scaling

And a follow-up:

Hope below explanations are helpful.

Production and other slots were running as independent instances of the Function App meaning that they would scale independently as if each instance was separate Function App.
Based on our practical experience, the staging and production slots are running on the same app service plan by default. Of course, we can change the app service plan for the staging slots, however, this feature is not supported for Consumption Plan.
In this situation, these slots are sharing the resources of one App Service Plan. For example, if there're 4 instances hosting the production slot, the staging slots will still run on these 4 instances and they will scale together.

“All slots scale to the same number of workers as the production slot.”
Yes, this point has been verified by our product team.
For Consumption plan, the slot scales as the function app scales. For App Service plans, the app scales to a fixed number of workers. Slots run on the same number of workers as the app plan.

I swapped over to the Production slot and the Function App started to scale.
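For anyone who hits the same thing, the rule quoted above ("All slots scale to the same number of workers as the production slot") can be pictured with a small sketch. The function and slot names here are hypothetical, and this only models the quoted statement, not the platform's internals.

```python
def workers_for_slots(production_workers: int, slot_names: list[str]) -> dict[str, int]:
    """Give every slot the production slot's worker count.

    Models only the quoted rule: demand on a non-production slot
    (for example a deep queue on staging) is not an input at all.
    """
    return {name: production_workers for name in slot_names}


# Hypothetical case: staging has a large queue backlog, but production is idle on one worker.
allocation = workers_for_slots(1, ["production", "staging"])
print(allocation)
```

In this picture the backlog on staging never enters the calculation, which matches what I saw: scaling only started once the queue-triggered code was running in the production slot.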
