Scaling Application Gateway v2 and WAF v2

Application Gateway and WAF can be configured to scale in two modes:

  • Autoscaling - With autoscaling enabled, the Application Gateway and WAF v2 SKUs scale out or in based on application traffic requirements. This mode offers better elasticity to your application and eliminates the need to guess the application gateway size or instance count. This mode also allows you to save cost by not requiring the gateway to run at peak-provisioned capacity for expected maximum traffic load. You must specify a minimum and optionally maximum instance count. Minimum capacity ensures that Application Gateway and WAF v2 don't fall below the minimum instance count specified, even without traffic. Each instance is roughly equivalent to 10 more reserved Capacity Units. Zero signifies no reserved capacity and is purely autoscaling in nature. You can also optionally specify a maximum instance count, which ensures that the Application Gateway doesn't scale beyond the specified number of instances. You are only billed for the amount of traffic served by the Gateway. The instance counts can range from 0 to 125. The default value for maximum instance count is 10 if not specified.


If the maximum instance count is updated to a value less than the current instance count, the new setting will not take immediate effect. The newly updated maximum will only be enforced after a scale-in operation brings the current count below newly updated maximum count. If the scale-in operation does not occur because the autoscaling scale in thresholds are not met, the new maximum setting will not be applied.

  • Manual - You can also choose Manual mode where the gateway doesn't autoscale. In this mode, if there's more traffic than what Application Gateway or WAF can handle, it could result in traffic loss. With manual mode, specifying instance count is mandatory. Instance count can vary from 1 to 125 instances.


These scaling modes don’t apply for Application Gateway Basic. Application Gateway Basic automatically scales up to an estimated 200 connections per second, based on an RSA 2048-bit key TLS certificate.

Autoscaling and High Availability

Azure Application Gateways are always deployed in a highly available fashion. The service is made up of multiple instances that are created as configured if autoscaling is disabled, or required by the application load if autoscaling is enabled. From the user's perspective, you don't necessarily have visibility into the individual instances, but just into the Application Gateway service as a whole. If a certain instance has a problem and stops being functional, Azure Application Gateway transparently creates a new instance.

Even if you configure autoscaling with zero minimum instances the service is still highly available, which is always included with the fixed price.

However, creating a new instance can take around six or seven minutes. If you don't want to have this downtime, you can configure a minimum instance count of two, ideally with Availability Zone support. This way you have at least two instances in your Azure Application Gateway under normal circumstances. So if one of them had a problem the other tries to handle the traffic while a new instance is being created. An Azure Application Gateway instance can support around 10 Capacity Units. Depending on how much traffic you typically have, you might want to configure your minimum instance autoscaling setting to a value higher than two.

For scale-in events, Application Gateway drains existing connections for 5 minutes on the instance that is subject for removal. After 5 minutes, existing connections are closed and the instance removed. Any new connections during or after the 5 minute scale-in time is established to other existing instances on the same gateway.

Next steps