These are the final responses I got from Microsoft. Apparently, I had different expectations of how a slot would work, and I had interpreted the docs differently.
The staging slot not scaling out when the queue length exceeds the limits is expected. As our official documentation mentions, on the Consumption plan the slot scales as the function app scales.
This means that even when the staging slot votes to add workers, the backend won't allocate new workers, because the production slot decides the number of workers.
Reference: https://learn.microsoft.com/en-us/azure/azure-functions/functions-deployment-slots#scaling
And a follow-up:
I hope the explanations below are helpful.
- Production and other slots were running as independent instances of the Function App, meaning that they would scale independently, as if each instance were a separate Function App.
Based on our practical experience, the staging and production slots run on the same App Service plan by default. You can change the App Service plan for a staging slot; however, this isn't supported on the Consumption plan.
In this situation, the slots share the resources of a single App Service plan. For example, if 4 instances are hosting the production slot, the staging slots also run on those same 4 instances, and they scale together.
- “All slots scale to the same number of workers as the production slot.”
Yes, this point has been verified by our product team.
For Consumption plan, the slot scales as the function app scales. For App Service plans, the app scales to a fixed number of workers. Slots run on the same number of workers as the app plan.
After I swapped over to the production slot, the Function App started to scale.