This article describes flapping in autoscale and how to avoid it.
Flapping refers to a loop condition that causes a series of opposing scale events. Flapping happens when a scale event triggers the opposite scale event.
Autoscale evaluates a pending scale-in action to see if it would cause flapping. In cases where flapping could occur, autoscale may skip the scale action and reevaluate at the next run, or autoscale may scale by less than the specified number of resource instances. The autoscale evaluation process occurs each time the autoscale engine runs, which is every 30 to 60 seconds, depending on the resource type.
To ensure adequate resources, checking for potential flapping doesn't occur for scale-out events. Autoscale will only defer a scale-in event to avoid flapping.
For example, let's assume the following rules:
In the table below at T0, when CPU usage is at 56%, a scale-out action is triggered. The same 56% load is then spread across 2 instances, which gives an average of 28% per instance for the scale set. As 28% is less than the scale-in threshold, autoscale should scale back in. Scaling in would return the scale set to 56% CPU usage on a single instance, which triggers a scale-out action again.
Time | Instance count | CPU% | CPU% per instance | Scale event | Resulting instance count |
---|---|---|---|---|---|
T0 | 1 | 56% | 56% | Scale out | 2 |
T1 | 2 | 56% | 28% | Scale in | 1 |
T2 | 1 | 56% | 56% | Scale out | 2 |
T3 | 2 | 56% | 28% | Scale in | 1 |
If left uncontrolled, there would be an ongoing series of scale events. However, in this situation, the autoscale engine will defer the scale-in event at T1 and reevaluate during the next autoscale run. The scale-in will only happen once the average CPU usage is below 30%.
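The check described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the autoscale engine's actual implementation. It assumes the total load stays constant and spreads evenly across instances. The function name `would_flap_on_scale_in` and the 50% scale-out threshold are hypothetical; only the 30% scale-in threshold comes from the example.

```python
def would_flap_on_scale_in(avg_cpu, instances, step, scale_out_threshold):
    """Return True if scaling in by `step` would push the projected
    per-instance CPU high enough to trigger a scale-out."""
    remaining = instances - step
    if remaining < 1:
        return True  # this sketch never scales below one instance
    # Assumption: total load is constant and spreads evenly.
    projected = avg_cpu * instances / remaining
    return projected > scale_out_threshold


SCALE_OUT_ABOVE = 50.0  # hypothetical value, not stated in the text
SCALE_IN_BELOW = 30.0   # scale-in threshold from the example

avg_cpu, instances = 28.0, 2  # state at T1 in the table
wants_scale_in = avg_cpu < SCALE_IN_BELOW
defer = wants_scale_in and would_flap_on_scale_in(
    avg_cpu, instances, 1, SCALE_OUT_ABOVE
)
print(wants_scale_in, defer)  # True True
```

At T1 the scale-in rule is met, but the projected 56% on one instance would exceed the assumed scale-out threshold, so the sketch defers the scale-in.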
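Flapping is often caused by small or no margins between scaling thresholds, by scaling by more than one instance at a time, or by scaling in and out on different metrics.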
To avoid flapping, keep adequate margins between scaling thresholds.
For example, the following rules, which leave no margin between thresholds, cause flapping.
The table below shows a potential outcome of these autoscale rules:
Time | Instance count | Thread count | Thread count per instance | Scale event | Resulting instance count |
---|---|---|---|---|---|
T0 | 2 | 1250 | 625 | Scale out | 3 |
T1 | 3 | 1250 | 417 | Scale in | 2 |
In this case, it looks like autoscale isn't working since no scale event takes place. Check the Run history tab on the autoscale setting page to see if there's any flapping.
Setting an adequate margin between thresholds avoids the above scenario. For example,
If the scale-in threshold is 400 threads per instance, the total thread count across 3 instances would have to drop below 1200 before a scale-in event takes place. See the table below.
Time | Instance count | Thread count | Thread count per instance | Scale event | Resulting instance count |
---|---|---|---|---|---|
T0 | 2 | 1250 | 625 | Scale out | 3 |
T1 | 3 | 1250 | 417 | No scale event | 3 |
T2 | 3 | 1180 | 394 | Scale in | 2 |
T3 | 2 | 1180 | 590 | No scale event | 2 |
To avoid flapping when scaling in or out by more than one instance, autoscale may scale by less than the number of instances specified in the rule.
For example, the following rules can cause flapping:
The table below shows a potential outcome of these autoscale rules:
Time | Number of instances | CPU | Request count | Scale event | Resulting instances | Comments |
---|---|---|---|---|---|---|
T0 | 30 | 65% | 3000 (100 per instance) | No scale event | 30 | |
T1 | 30 | 65% | 1500 (50 per instance) | Scale in by 3 instances | 27 | Scaling in by 10 would cause an estimated CPU rise above 70%, leading to a scale-out event. |
At time T0, the app is running with 30 instances, a total request count of 3000, and a CPU usage of 65% per instance.
At T1, when the request count drops to 1500 requests, or 50 requests per instance, autoscale will try to scale in by 10 instances to 20. However, autoscale estimates that the CPU load for 20 instances will be above 70%, causing a scale-out event.
To avoid flapping, the autoscale engine estimates the CPU usage for instance counts above 20 until it finds an instance count where all metrics are within the defined thresholds:
In this situation, autoscale may scale in by 3, from 30 to 27 instances, in order to satisfy the rules, even though the rule specifies a decrease of 10. A log message is written to the activity log with a description that includes "Scale down will occur with updated instance count to avoid flapping".
If autoscale can't find a suitable number of instances, it skips the scale-in event and reevaluates during the next cycle.
Note
If the autoscale engine detects that flapping could occur as a result of scaling to the target number of instances, it also tries scaling by fewer instances, to a count between the current count and the target count. If flapping wouldn't occur at one of these counts, autoscale continues the scale operation with that count as the new target.
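The search for a reduced scale-in target can be sketched as follows. This is a rough illustration only. How the engine estimates CPU for a candidate instance count isn't described here, so the sketch assumes a simple proportional projection (constant total load spread evenly), and its results need not match the table above. The function `safe_scale_in_target` and the numbers in the usage lines are hypothetical.

```python
def safe_scale_in_target(current, step, avg_cpu, scale_out_threshold, minimum=1):
    """Return the lowest instance count between the rule's target and the
    current count whose projected CPU stays below the scale-out threshold,
    or None if no such count exists (the scale-in is then skipped)."""
    target = max(current - step, minimum)
    for candidate in range(target, current):
        # Assumption: total load is constant and spreads evenly.
        projected = avg_cpu * current / candidate
        if projected < scale_out_threshold:
            return candidate
    return None


# Hypothetical numbers: 10 instances at 50% CPU, rule scales in by 5,
# scale-out triggers above 70% CPU.
print(safe_scale_in_target(10, 5, 50.0, 70.0))  # 8
print(safe_scale_in_target(10, 5, 68.0, 70.0))  # None
```

In the first call, 5, 6, and 7 instances would all project above 70% CPU, so the sketch settles on 8 instances, scaling in by 2 rather than 5. In the second call no candidate is safe, so the scale-in would be skipped until the next evaluation.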
Find flapping in the activity log with the following query:
// Activity log, CategoryValue: Autoscale
// Lists the latest autoscale operations from the activity log with OperationNameValue == "Microsoft.Insights/AutoscaleSettings/Flapping/Action"
AzureActivity
| where CategoryValue == "Autoscale" and OperationNameValue == "Microsoft.Insights/AutoscaleSettings/Flapping/Action"
| sort by TimeGenerated desc
Below is an example of an activity log record for flapping:
{
"eventCategory": "Autoscale",
"eventName": "FlappingOccurred",
"operationId": "1111bbbb-22cc-dddd-ee33-ffffff444444",
"eventProperties":
"{"Description":"Scale down will occur with updated instance count to avoid flapping.
Resource: '/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/ed-rg-001/providers/Microsoft.Web/serverFarms/ScaleableAppServicePlan'.
Current instance count: '6',
Intended new instance count: '1'.
Actual new instance count: '4'",
"ResourceName":"/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/rg-001/providers/Microsoft.Web/serverFarms/ScaleableAppServicePlan",
"OldInstancesCount":6,
"NewInstancesCount":4,
"ActiveAutoscaleProfile":{"Name":"Auto created scale condition",
"Capacity":{"Minimum":"1","Maximum":"30","Default":"1"},
"Rules":[{"MetricTrigger":{"Name":"Requests","Namespace":"microsoft.web/sites","Resource":"/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/rg-001/providers/Microsoft.Web/sites/ScaleableWebApp1","ResourceLocation":"West Central US","TimeGrain":"PT1M","Statistic":"Average","TimeWindow":"PT1M","TimeAggregation":"Maximum","Operator":"GreaterThanOrEqual","Threshold":3.0,"Source":"/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/ed-rg-001/providers/Microsoft.Web/sites/ScaleableWebApp1","MetricType":"MDM","Dimensions":[],"DividePerInstance":true},"ScaleAction":{"Direction":"Increase","Type":"ChangeCount","Value":"10","Cooldown":"PT1M"}},{"MetricTrigger":{"Name":"Requests","Namespace":"microsoft.web/sites","Resource":"/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/rg-001/providers/Microsoft.Web/sites/ScaleableWebApp1","ResourceLocation":"West Central US","TimeGrain":"PT1M","Statistic":"Max","TimeWindow":"PT1M","TimeAggregation":"Maximum","Operator":"LessThan","Threshold":3.0,"Source":"/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/ed-rg-001/providers/Microsoft.Web/sites/ScaleableWebApp1","MetricType":"MDM","Dimensions":[],"DividePerInstance":true},"ScaleAction":{"Direction":"Decrease","Type":"ChangeCount","Value":"5","Cooldown":"PT1M"}}]}}",
"eventDataId": "dddd3333-ee44-5555-66ff-777777aaaaaa",
"eventSubmissionTimestamp": "2022-09-13T07:20:41.1589076Z",
"resource": "scaleableappserviceplan",
"resourceGroup": "RG-001",
"resourceProviderValue": "MICROSOFT.WEB",
"subscriptionId": "aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e",
"activityStatusValue": "Succeeded"
}
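If records like this one are exported for offline analysis, a short Python script could pull out the instance count changes. The sketch below is a minimal example under assumptions: the records are already loaded as dictionaries, and `eventProperties` is either a dictionary or a properly escaped JSON string. The field names are taken from the record above. The helper `flapping_changes`, the sample data, and the `OtherEvent` name are hypothetical.

```python
import json


def flapping_changes(records):
    """Return (resource, old_count, new_count) for each flapping record."""
    results = []
    for record in records:
        if record.get("eventName") != "FlappingOccurred":
            continue
        props = record.get("eventProperties", {})
        if isinstance(props, str):
            # Assumption: the properties are a JSON-encoded string.
            props = json.loads(props)
        results.append(
            (
                record.get("resource"),
                props["OldInstancesCount"],
                props["NewInstancesCount"],
            )
        )
    return results


# Hypothetical sample data shaped like the record above.
sample = [
    {
        "eventName": "FlappingOccurred",
        "resource": "scaleableappserviceplan",
        "eventProperties": json.dumps(
            {"OldInstancesCount": 6, "NewInstancesCount": 4}
        ),
    },
    {"eventName": "OtherEvent", "resource": "scaleableappserviceplan"},
]
print(flapping_changes(sample))  # [('scaleableappserviceplan', 6, 4)]
```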
To learn more about autoscale, see the following resources: