Autoscaling and App Service Environment v1
Important
This article is about App Service Environment v1. App Service Environment v1 and v2 are retired as of 31 August 2024. There's a new version of App Service Environment that is easier to use and runs on more powerful infrastructure. To learn more about the new version, start with the Introduction to the App Service Environment. If you're currently using App Service Environment v1, follow the steps in the App Service Environment migration guidance to migrate to the new version.
As of 31 August 2024, Service Level Agreement (SLA) and Service Credits no longer apply for App Service Environment v1 and v2 workloads that continue to be in production since they are retired products. Decommissioning of the App Service Environment v1 and v2 hardware has begun, and this may affect the availability and performance of your apps and data.
You must complete migration to App Service Environment v3 immediately or your apps and resources may be deleted. We will attempt to auto-migrate any remaining App Service Environment v1 and v2 on a best-effort basis using the in-place migration feature, but Microsoft makes no claim or guarantees about application availability after auto-migration. You may need to perform manual configuration to complete the migration and to optimize your App Service plan SKU choice to meet your needs. If auto-migration isn't feasible, your resources and associated app data will be deleted. We strongly urge you to act now to avoid either of these extreme scenarios.
If you need additional time, we can offer a one-time 30-day grace period for you to complete your migration. For more information and to request this grace period, review the grace period overview, and then go to the Azure portal and visit the Migration blade for each of your App Service Environments.
For the most up-to-date information on the App Service Environment v1/v2 retirement, see the App Service Environment v1 and v2 retirement update.
Azure App Service environments support autoscaling. You can autoscale individual worker pools based on metrics or on a schedule.
Autoscaling optimizes your resource utilization by automatically growing and shrinking an App Service environment to fit your budget and/or load profile.
Configure worker pool autoscale
You can access the autoscale functionality from the Settings tab of the worker pool.
From there, the interface should be fairly familiar since it is the same experience that you see when you scale an App Service plan.
You can also configure an autoscale profile.
Autoscale profiles are useful for setting limits on your scale. You can get a consistent performance experience by setting a lower bound on the instance count, and a predictable spend cap by setting an upper bound.
After you define a profile, you can add autoscale rules to scale up or down the number of instances in the worker pool within the bounds defined by the profile. Autoscale rules are based on metrics.
Any worker pool or front-end metrics can be used to define autoscale rules. These metrics are the same metrics you can monitor in the resource blade graphs or set alerts for.
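The relationship between a profile's bounds and its rules can be sketched as a simple clamp. This is a simplified Python model, not the Azure autoscale engine: `ScaleProfile` and `apply_rule_action` are hypothetical names, and the sketch assumes that a rule's proposed instance count is limited to the profile's lower and upper bounds. The example values reuse the 5 to 20 instance range from the scenario later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleProfile:
    """Hypothetical model of an autoscale profile's instance-count bounds."""

    minimum: int  # lower bound: keeps performance consistent
    maximum: int  # upper bound: caps spend

    def __post_init__(self) -> None:
        if not 0 <= self.minimum <= self.maximum:
            raise ValueError("expected 0 <= minimum <= maximum")


def apply_rule_action(profile: ScaleProfile, current: int, change: int) -> int:
    """Apply a rule's increase (+) or decrease (-) and clamp the result to the profile bounds."""
    proposed = current + change
    return max(profile.minimum, min(profile.maximum, proposed))


if __name__ == "__main__":
    weekday = ScaleProfile(minimum=5, maximum=20)
    print(apply_rule_action(weekday, current=19, change=2))  # 20: capped at the upper bound
    print(apply_rule_action(weekday, current=5, change=-1))  # 5: held at the lower bound
```

Rules decide the direction and step size; the profile decides how far those steps can go.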
Autoscale example
Autoscale of an App Service environment can best be illustrated by walking through a scenario.
This article explains the considerations to keep in mind when you set up autoscale. It walks you through the interactions that come into play when you autoscale App Service plans that are hosted in App Service Environment.
Scenario introduction
Frank is a sysadmin for an enterprise who has migrated a portion of the workloads that they manage to an App Service environment.
The App Service environment is configured for manual scaling as follows:
- Front ends: 3
- Worker pool 1: 10
- Worker pool 2: 5
- Worker pool 3: 5
Worker pool 1 is used for production workloads, while worker pool 2 and worker pool 3 are used for quality assurance (QA) and development workloads.
The App Service plans for QA and dev are configured for manual scaling. The production App Service plan is set to autoscale to handle variations in load and traffic.
Frank is very familiar with the application. They know that the peak hours for load are between 9:00 AM and 6:00 PM because this is a line-of-business (LOB) application that employees use while they are in the office. Usage drops after that when users are done for that day. Outside peak hours, there is still some load because users can access the app remotely by using their mobile devices or home PCs. The production App Service plan is already configured to autoscale based on CPU usage with the following rules:
Autoscale profile – Weekdays – App Service plan | Autoscale profile – Weekends – App Service plan |
---|---|
Name: Weekday profile | Name: Weekend profile |
Scale by: Schedule and performance rules | Scale by: Schedule and performance rules |
Profile: Weekdays | Profile: Weekend |
Type: Recurrence | Type: Recurrence |
Target range: 5 to 20 instances | Target range: 3 to 10 instances |
Days: Monday, Tuesday, Wednesday, Thursday, Friday | Days: Saturday, Sunday |
Start time: 9:00 AM | Start time: 9:00 AM |
Time zone: UTC-08 | Time zone: UTC-08 |
Autoscale rule (Scale Up) | Autoscale rule (Scale Up) |
Resource: Production (App Service Environment) | Resource: Production (App Service Environment) |
Metric: CPU % | Metric: CPU % |
Operation: Greater than 60% | Operation: Greater than 80% |
Duration: 5 Minutes | Duration: 10 Minutes |
Time aggregation: Average | Time aggregation: Average |
Action: Increase count by 2 | Action: Increase count by 1 |
Cool down (minutes): 15 | Cool down (minutes): 20 |
Autoscale rule (Scale Down) | Autoscale rule (Scale Down) |
Resource: Production (App Service Environment) | Resource: Production (App Service Environment) |
Metric: CPU % | Metric: CPU % |
Operation: Less than 30% | Operation: Less than 20% |
Duration: 10 minutes | Duration: 15 minutes |
Time aggregation: Average | Time aggregation: Average |
Action: Decrease count by 1 | Action: Decrease count by 1 |
Cool down (minutes): 20 | Cool down (minutes): 10 |
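Each rule in the table combines a metric threshold, an averaging window (Duration), a step size (Action), and a cool-down. The Python sketch below models a single evaluation of one such rule under simplifying assumptions: metric samples arrive once per minute, the rule fires when the average of the last `duration_min` samples crosses the threshold, and the rule can't fire again until its cool-down has elapsed. `ScaleRule` and `evaluate` are hypothetical names; the real Azure autoscale engine has additional behavior that this sketch does not model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical model of one autoscale rule from the table above."""

    greater_than: bool  # True for "Greater than", False for "Less than"
    threshold: float    # e.g. 60 for "CPU % greater than 60%"
    duration_min: int   # averaging window, as a count of one-minute samples
    change: int         # +2 for "Increase count by 2", -1 for "Decrease count by 1"
    cooldown_min: int   # minutes to wait after this rule fires


def evaluate(rule: ScaleRule, samples: Sequence[float], minutes_since_last_action: Optional[int]) -> int:
    """Return the instance-count change the rule requests now (0 if it does not fire)."""
    if minutes_since_last_action is not None and minutes_since_last_action < rule.cooldown_min:
        return 0  # still cooling down from the previous action
    if len(samples) < rule.duration_min:
        return 0  # not enough data to fill the averaging window
    average = mean(samples[-rule.duration_min:])  # "Time aggregation: Average"
    crossed = average > rule.threshold if rule.greater_than else average < rule.threshold
    return rule.change if crossed else 0


if __name__ == "__main__":
    # Weekday scale-up rule: CPU % > 60, 5 minutes, increase by 2, cool down 15 minutes.
    weekday_up = ScaleRule(greater_than=True, threshold=60, duration_min=5, change=2, cooldown_min=15)
    busy = [55, 62, 65, 70, 68, 66]
    print(evaluate(weekday_up, busy, minutes_since_last_action=None))  # 2
    print(evaluate(weekday_up, busy, minutes_since_last_action=10))    # 0 (cooling down)
```

The cool-down check is what limits how many times a rule can act in an hour, which is the basis of the inflation rate discussed next.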
App Service plan inflation rate
App Service plans that are configured to autoscale do so at a maximum rate per hour. This rate can be calculated from the values in the autoscale rule.
Understanding and calculating the App Service plan inflation rate is important for App Service environment autoscale because scale changes to a worker pool are not instantaneous.
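The App Service plan inflation rate is the number of instances that a rule adds (or removes) per scale action, multiplied by the maximum number of times that the rule can fire in an hour. Because a rule can't fire again until its cool-down period has elapsed, the rate is calculated as follows:
$$\text{Inflation rate (instances/hour)} = \text{Increase count} \times \frac{60}{\text{Cool down (minutes)}}$$
Based on the Autoscale – Scale Up rule for the Weekday profile of the production App Service plan:
$$2 \times \frac{60}{15} = 8 \text{ instances/hour}$$
In the case of the Autoscale – Scale Up rule for the Weekend profile of the production App Service plan, the formula would resolve to:
$$1 \times \frac{60}{20} = 3 \text{ instances/hour}$$
This value can also be calculated for scale-down operations by using the decrease count in place of the increase count.
Based on the Autoscale – Scale Down rule for the Weekday profile of the production App Service plan, this would look as follows:
$$1 \times \frac{60}{20} = 3 \text{ instances/hour}$$
In the case of the Autoscale – Scale Down rule for the Weekend profile of the production App Service plan, the formula would resolve to:
$$1 \times \frac{60}{10} = 6 \text{ instances/hour}$$
The production App Service plan can grow at a maximum rate of eight instances/hour during the week and three instances/hour during the weekend. It can release instances at a maximum rate of three instances/hour during the week and six instances/hour during weekends.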
If multiple App Service plans are hosted in a worker pool, calculate the total inflation rate as the sum of the inflation rates of all the App Service plans hosted in that worker pool.
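The per-rule rate and the worker-pool total can be sketched in Python as below. The sketch assumes that a rule fires at most once per cool-down period, so its rate is the count it changes multiplied by 60 divided by the cool-down in minutes; this reproduces the weekday scale-up figure of eight instances/hour (2 × 60 / 15). `inflation_rate` and `total_inflation_rate` are hypothetical helper names, and the second plan in the example is invented for illustration.

```python
from typing import Iterable, Tuple


def inflation_rate(count_per_action: int, cooldown_min: int) -> float:
    """Maximum instances/hour one rule can add or remove, assuming it fires at most once per cool-down."""
    if cooldown_min <= 0:
        raise ValueError("cool-down must be positive")
    return count_per_action * 60 / cooldown_min


def total_inflation_rate(rules: Iterable[Tuple[int, int]]) -> float:
    """Sum the rates of one rule per App Service plan hosted in the same worker pool.

    Each item is (count per action, cool-down in minutes).
    """
    return sum(inflation_rate(count, cooldown) for count, cooldown in rules)


if __name__ == "__main__":
    production_weekday_up = (2, 15)  # Increase count by 2, cool down 15 minutes
    hypothetical_plan_up = (1, 30)   # invented second plan in the same worker pool
    print(inflation_rate(*production_weekday_up))                              # 8.0
    print(total_inflation_rate([production_weekday_up, hypothetical_plan_up]))  # 10.0
```

For scale-down, pass the decrease count instead; for example, the weekend scale-down rule `(1, 10)` gives 6.0 instances/hour.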
Use the App Service plan inflation rate to define worker pool autoscale rules
Worker pools that host App Service plans that are configured to autoscale need to be allocated a buffer of capacity. The buffer allows for the autoscale operations to grow and shrink the App Service plan as needed. The minimum buffer would be the calculated Total App Service Plan Inflation Rate.
Because App Service environment scale operations take some time to apply, any change should account for further demand changes that could happen while a scale operation is in progress. To accommodate this latency, we recommend that you use the calculated Total App Service Plan Inflation Rate as the minimum number of instances that are added for each autoscale operation.
With this information, Frank can define the following autoscale profile and rules:
Autoscale profile – Weekdays | Autoscale profile – Weekends |
---|---|
Name: Weekday profile | Name: Weekend profile |
Scale by: Schedule and performance rules | Scale by: Schedule and performance rules |
Profile: Weekdays | Profile: Weekend |
Type: Recurrence | Type: Recurrence |
Target range: 13 to 25 instances | Target range: 6 to 15 instances |
Days: Monday, Tuesday, Wednesday, Thursday, Friday | Days: Saturday, Sunday |
Start time: 7:00 AM | Start time: 9:00 AM |
Time zone: UTC-08 | Time zone: UTC-08 |
Autoscale rule (Scale Up) | Autoscale rule (Scale Up) |
Resource: Worker pool 1 | Resource: Worker pool 1 |
Metric: WorkersAvailable | Metric: WorkersAvailable |
Operation: Less than 8 | Operation: Less than 3 |
Duration: 20 minutes | Duration: 30 minutes |
Time aggregation: Average | Time aggregation: Average |
Action: Increase count by 8 | Action: Increase count by 3 |
Cool down (minutes): 180 | Cool down (minutes): 180 |
Autoscale rule (Scale Down) | Autoscale rule (Scale Down) |
Resource: Worker pool 1 | Resource: Worker pool 1 |
Metric: WorkersAvailable | Metric: WorkersAvailable |
Operation: Greater than 8 | Operation: Greater than 3 |
Duration: 20 minutes | Duration: 15 minutes |
Time aggregation: Average | Time aggregation: Average |
Action: Decrease count by 2 | Action: Decrease count by 3 |
Cool down (minutes): 120 | Cool down (minutes): 120 |
The lower bound of the Target range is the minimum instance count defined in the App Service plan profile plus the buffer.
The upper bound of the Target range would be the sum of the maximum instance counts of all App Service plans hosted in the worker pool.
The Increase count for the scale-up rules should be set to at least 1X the App Service Plan Inflation Rate for scale up.
The Decrease count can be set to a value between 1/2X and 1X the App Service Plan Inflation Rate for scale down.
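The lower-bound and step-size guidelines above can be turned into a small calculation. This Python sketch is illustrative only: `scale_up_guidance` and `decrease_count_range` are hypothetical names, the inflation rates are passed in rather than computed, and fractional rates are rounded to whole instances (an assumption, because instance counts must be integers). The upper-bound guideline is not modeled here.

```python
import math
from typing import Tuple


def scale_up_guidance(plan_minimum: int, scale_up_rate: float) -> Tuple[int, int]:
    """Return (target range lower bound, minimum increase count) for a worker pool.

    The buffer is the total scale-up inflation rate, rounded up to whole instances.
    """
    buffer = math.ceil(scale_up_rate)
    return plan_minimum + buffer, buffer


def decrease_count_range(scale_down_rate: float) -> Tuple[int, int]:
    """Return the (smallest, largest) decrease count between 1/2X and 1X the scale-down rate."""
    low = max(1, math.ceil(scale_down_rate / 2))
    high = max(low, math.floor(scale_down_rate))
    return low, high


if __name__ == "__main__":
    # Weekday: plan minimum of 5 instances and a scale-up rate of 8 instances/hour.
    print(scale_up_guidance(plan_minimum=5, scale_up_rate=8))  # (13, 8)
    # Weekend: scale-down rate of 6 instances/hour.
    print(decrease_count_range(6))  # (3, 6)
```

These results line up with the weekday lower bound of 13 and increase count of 8, and with the weekend decrease count of 3, in the worker pool table above.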
Autoscale for front-end pool
Rules for front-end autoscale are simpler than for worker pools. Primarily, you should make sure that the measurement duration and the cool-down timers account for the fact that scale operations on an App Service environment are not instantaneous.
For this scenario, Frank knows that the error rate increases after front ends reach 80% CPU utilization. To add capacity before that point is reached, they set the scale-up rule to trigger at 60% CPU and increase instances as follows:
Autoscale profile – Front ends |
---|
Name: Autoscale – Front ends |
Scale by: Schedule and performance rules |
Profile: Everyday |
Type: Recurrence |
Target range: 3 to 10 instances |
Days: Everyday |
Start time: 9:00 AM |
Time zone: UTC-08 |
Autoscale rule (Scale Up) |
Resource: Front-end pool |
Metric: CPU % |
Operation: Greater than 60% |
Duration: 20 minutes |
Time aggregation: Average |
Action: Increase count by 3 |
Cool down (minutes): 120 |
Autoscale rule (Scale Down) |
Resource: Front-end pool |
Metric: CPU % |
Operation: Less than 30% |
Duration: 20 Minutes |
Time aggregation: Average |
Action: Decrease count by 3 |
Cool down (minutes): 120 |
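To see how the front-end timers interact, the Python sketch below estimates the shortest time the scale-up rule above needs to take the front-end pool from its lower bound to its upper bound. It assumes, as a simplification, that the first action fires once a full Duration window of high CPU has been observed and that each later action fires as soon as the previous cool-down ends. `minutes_to_reach` is a hypothetical helper, not part of any Azure API.

```python
import math


def minutes_to_reach(start: int, target: int, step: int, duration_min: int, cooldown_min: int) -> int:
    """Shortest time for repeated scale-up actions of `step` instances to grow from `start` to `target`."""
    if target <= start:
        return 0
    actions = math.ceil((target - start) / step)  # the last action may be capped at `target`
    # First action after one averaging window; each later action waits out the previous cool-down.
    return duration_min + (actions - 1) * cooldown_min


if __name__ == "__main__":
    # Front ends: 3 to 10 instances, Increase count by 3, Duration 20 minutes, Cool down 120 minutes.
    print(minutes_to_reach(start=3, target=10, step=3, duration_min=20, cooldown_min=120))  # 260
```

Under these assumptions, reaching 10 front ends takes about 4 hours 20 minutes: the long cool-down avoids stacking new scale operations on top of ones that are still in progress, at the cost of slower growth.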