Thanks for asking the question! There are best practices for Autoscale, and it's recommended to choose the scale-out and scale-in thresholds carefully, based on your actual workload.
Before scaling in, autoscale estimates the resulting state in order to avoid "flapping", where scale-in and scale-out actions continually go back and forth. Keep this behavior in mind if you choose the same threshold for scale-out and scale-in.
It's recommended to choose an adequate margin between the scale-out and scale-in thresholds.
As an example, consider the following rule combination, which leaves such a margin.
• Increase instances by 1 count when CPU% >= 80
• Decrease instances by 1 count when CPU% <= 60
In this case:
1. Assume there are 2 instances to start with.
2. If the average CPU% across instances goes to 80, autoscale scales out, adding a third instance.
3. Now assume that over time the CPU% falls to 60.
4. Autoscale's scale-in rule estimates the final state if it were to scale in. For example, 60 x 3 (current instance count) = 180, and 180 / 2 (final number of instances after scaling in) = 90. Because 90 is above the scale-out threshold of 80, autoscale would have to scale out again immediately, so it skips the scale-in.
5. The next time autoscale checks, the CPU% has fallen further to 50. It estimates again: 50 x 3 instances = 150, and 150 / 2 instances = 75, which is below the scale-out threshold of 80, so it scales in successfully to 2 instances.
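The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration of the estimate described here, not the autoscale engine's actual code; the function names and the assumption that total load stays constant when an instance is removed are mine.

```python
def projected_cpu(avg_cpu: float, current: int, target: int) -> float:
    """Average CPU% per instance if the same total load ran on `target` instances."""
    return avg_cpu * current / target


def should_scale_in(avg_cpu: float, current: int,
                    scale_in_threshold: float = 60.0,
                    scale_out_threshold: float = 80.0,
                    step: int = 1) -> bool:
    """Hypothetical check: scale in only if it would not immediately trigger scale-out."""
    target = current - step
    if avg_cpu > scale_in_threshold or target < 1:
        return False  # scale-in rule not triggered, or nothing left to remove
    # Skip the scale-in if the projected CPU% would reach the scale-out threshold.
    return projected_cpu(avg_cpu, current, target) < scale_out_threshold


print(projected_cpu(60, 3, 2), should_scale_in(60, 3))  # 90.0 False
print(projected_cpu(50, 3, 2), should_scale_in(50, 3))  # 75.0 True
```

With the thresholds from the example (80 and 60), the first call reproduces the skipped scale-in at 60% CPU and the second the successful scale-in at 50% CPU.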
Note: If the autoscale engine detects flapping could occur as a result of scaling to the target number of instances, it will also try to scale to a different number of instances between the current count and the target count. If flapping does not occur within this range, autoscale will continue the scale operation with the new target.
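To make the note concrete, here is a rough sketch of how an intermediate target could be chosen when a rule removes more than one instance at a time. The search order (starting at the rule's target and moving up toward the current count) and the function name are assumptions for illustration; the actual engine may pick the intermediate count differently.

```python
def pick_scale_in_target(avg_cpu: float, current: int, target: int,
                         scale_out_threshold: float = 80.0) -> int:
    """Hypothetical helper: lowest instance count from `target` up to `current`
    whose projected CPU% stays below the scale-out threshold.
    Assumes target >= 1 and that total load stays constant."""
    total_load = avg_cpu * current
    for count in range(target, current):
        if total_load / count < scale_out_threshold:
            return count  # no flapping expected at this count
    return current  # every smaller count would flap, so do not scale in


# 4 instances at 50% CPU, rule wants to drop to 2:
# 200 / 2 = 100 would flap, 200 / 3 is about 66.7, so 3 is chosen.
print(pick_scale_in_target(50, 4, 2))  # 3
```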
Hope this helps. Let us know if you have any further questions or issues.