Does App Service perform connection draining when scaling in?

Question

Say I have a web app deployed in App Service that exposes a REST API for my users, and that certain REST requests are fairly long-running (but not necessarily CPU-intensive; i.e., there might be a period of waiting on some remote resource before they can complete).

I'm trying to figure out how intelligent Azure is going to be when its horizontal auto-scaling logic chooses to "scale in" my App Service. Ideally, I'd hope that it performs some sort of connection draining; i.e., the instance(s) it plans to remove are taken out of the load-balancer rotation so that no new requests get delivered to them, and then they are given some time to finish servicing any in-flight requests.

Is this a realistic expectation? Try as I might, I can't seem to find any information about this in the Azure documentation, nor have I managed to find any external confirmation (or even discussion!) of how Azure actually handles scale-in for App Services.

So my question is this: can I expect "nice" behavior here where connection draining occurs (and can I even, say, configure how long Azure should give an instance to finish servicing existing requests before it kills the instance)? Or does Azure just immediately kill App Service instances once it's decided that a scale-in is necessary?

Accepted Answer

Hi @Brenden K ,

App Service scale-in logic evaluates the configured rules over a 5-minute sliding window. If no scale-in rule is triggered within that window, the instance remains. If a rule has been triggered, the affected instances begin shutting down.

I don't know whether your Web API exposes other custom metrics that reflect the work being done on your dependent resources, but the best practices guide is a good place to start. If you find that none of the rules fit your workflow, you may want to consider applying the Pipes and Filters pattern to your application design. Azure Functions may also be a more suitable option, since Durable Functions lets you maintain stateful data.

EDIT: App Service will follow IIS default timeouts. If a scale-in operation is requested, the instance being spun down will process any "in flight" requests for 90 seconds before being terminated. During this time, any new incoming requests won't be sent to that server.
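
If your requests can run close to that limit, it may help to track in-flight work inside the app so it can finish cleanly within the window. Below is a minimal sketch in Python, not an App Service API: `DrainTracker` and its methods are hypothetical names, and the 90-second default simply mirrors the figure above. How your app learns that shutdown has started (for example, a termination signal or a framework shutdown hook) depends on your hosting stack and is left as an assumption.

```python
import threading


class DrainTracker:
    """Hypothetical helper: counts in-flight requests and supports a timed drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self):
        """Register a new request. Returns False once draining has started."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one request as finished and wake any waiting drain() call."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def drain(self, timeout_seconds=90.0):
        """Refuse new requests, then wait for in-flight ones to finish.

        Returns True if everything finished in time, False on timeout.
        The 90-second default is taken from the answer above (assumption).
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout_seconds
            )
```

A request handler would call `try_begin()` on entry (returning an error such as 503 if it gets `False`) and `end()` in a `finally` block, while the shutdown hook calls `drain()` and logs a warning if it returns `False`.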

Regards,
Ryan
