Understand autoscale settings

Autoscale settings help ensure that you have the right amount of resources running to handle the fluctuating load of your application. You can configure autoscale settings to be triggered based on metrics that indicate load or performance, or triggered at a scheduled date and time.

This article gives a detailed explanation of the autoscale settings.

Autoscale setting schema

The following example shows an autoscale setting. This autoscale setting has the following attributes:

  • A single default profile.
  • Two metric rules in this profile: one for scale-out, and one for scale-in.
    • The scale-out rule is triggered when the Virtual Machine Scale Set's average percentage CPU metric is greater than 85 percent for the past 10 minutes.
    • The scale-in rule is triggered when the Virtual Machine Scale Set's average percentage CPU metric is less than 60 percent for the past 10 minutes.

Note

A setting can have multiple profiles. To learn more, see the profiles section. A profile can also have multiple scale-out rules and scale-in rules defined. To see how they are evaluated, see the evaluation section.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Insights/autoscaleSettings",
            "apiVersion": "2015-04-01",
            "name": "VMSS1-Autoscale-607",
            "location": "eastus",
            "properties": {
                "name": "VMSS1-Autoscale-607",
                "enabled": true,
                "targetResourceUri": "/subscriptions/abc123456-987-f6e5-d43c-9a8d8e7f6541/resourceGroups/rg-vmss1/providers/Microsoft.Compute/virtualMachineScaleSets/VMSS1",
                "profiles": [
                    {
                        "name": "Auto created default scale condition",
                        "capacity": {
                            "minimum": "1",
                            "maximum": "4",
                            "default": "1"
                        },
                        "rules": [
                            {
                                "metricTrigger": {
                                    "metricName": "Percentage CPU",
                                    "metricResourceUri": "/subscriptions/abc123456-987-f6e5-d43c-9a8d8e7f6541/resourceGroups/rg-vmss1/providers/Microsoft.Compute/virtualMachineScaleSets/VMSS1",
                                    "timeGrain": "PT1M",
                                    "statistic": "Average",
                                    "timeWindow": "PT10M",
                                    "timeAggregation": "Average",
                                    "operator": "GreaterThan",
                                    "threshold": 85
                                },
                                "scaleAction": {
                                    "direction": "Increase",
                                    "type": "ChangeCount",
                                    "value": "1",
                                    "cooldown": "PT5M"
                                }
                            },
                            {
                                "metricTrigger": {
                                    "metricName": "Percentage CPU",
                                    "metricResourceUri": "/subscriptions/abc123456-987-f6e5-d43c-9a8d8e7f6541/resourceGroups/rg-vmss1/providers/Microsoft.Compute/virtualMachineScaleSets/VMSS1",
                                    "timeGrain": "PT1M",
                                    "statistic": "Average",
                                    "timeWindow": "PT10M",
                                    "timeAggregation": "Average",
                                    "operator": "LessThan",
                                    "threshold": 60
                                },
                                "scaleAction": {
                                    "direction": "Decrease",
                                    "type": "ChangeCount",
                                    "value": "1",
                                    "cooldown": "PT5M"
                                }
                            }
                        ]
                    }
                ]
            }
        }
    ]
}

The table below describes the elements in the above autoscale setting's JSON.

| Section | Element name | Portal name | Description |
|---|---|---|---|
| Setting | ID | | The autoscale setting's resource ID. Autoscale settings are an Azure Resource Manager resource. |
| Setting | name | | The autoscale setting name. |
| Setting | location | | The location of the autoscale setting. This location can be different from the location of the resource being scaled. |
| properties | targetResourceUri | | The resource ID of the resource being scaled. You can only have one autoscale setting per resource. |
| properties | profiles | Scale condition | An autoscale setting is composed of one or more profiles. Each time the autoscale engine runs, it executes one profile. |
| profiles | name | | The name of the profile. You can choose any name that helps you identify the profile. |
| profiles | capacity.maximum | Instance limits - Maximum | The maximum capacity allowed. It ensures that autoscale doesn't scale your resource above this number when executing the profile. |
| profiles | capacity.minimum | Instance limits - Minimum | The minimum capacity allowed. It ensures that autoscale doesn't scale your resource below this number when executing the profile. |
| profiles | capacity.default | Instance limits - Default | If there's a problem reading the resource metric, and the current capacity is below the default, autoscale scales out to the default. This ensures the availability of the resource. If the current capacity is already higher than the default capacity, autoscale doesn't scale in. |
| profiles | rules | Rules | Autoscale automatically scales between the maximum and minimum capacities, by using the rules in the profile. You can have multiple rules in a profile. Typically there are two rules: one to determine when to scale out, and the other to determine when to scale in. |
| rule | metricTrigger | Scale rule | Defines the metric condition of the rule. |
| metricTrigger | metricName | Metric name | The name of the metric. |
| metricTrigger | metricResourceUri | | The resource ID of the resource that emits the metric. In most cases, it is the same as the resource being scaled. In some cases, it can be different. For example, you can scale a Virtual Machine Scale Set based on the number of messages in a storage queue. |
| metricTrigger | timeGrain | Time grain (minutes) | The metric sampling duration. For example, timeGrain = “PT1M” means that the metrics should be aggregated every 1 minute, by using the aggregation method specified in the statistic element. |
| metricTrigger | statistic | Time grain statistic | The aggregation method within the timeGrain period. For example, statistic = “Average” and timeGrain = “PT1M” means that the metrics should be aggregated every 1 minute, by taking the average. This property dictates how the metric is sampled. |
| metricTrigger | timeWindow | Duration | The amount of time to look back for metrics. For example, timeWindow = “PT10M” means that every time autoscale runs, it queries metrics for the past 10 minutes. The time window allows your metrics to be normalized, and avoids reacting to transient spikes. |
| metricTrigger | timeAggregation | Time aggregation | The aggregation method used to aggregate the sampled metrics. For example, timeAggregation = “Average” aggregates the sampled metrics by taking the average. In the preceding case, autoscale takes the ten 1-minute samples and averages them. |
| rule | scaleAction | Action | The action to take when the metricTrigger of the rule is triggered. |
| scaleAction | direction | Operation | "Increase" to scale out, or "Decrease" to scale in. |
| scaleAction | value | Instance count | How much to increase or decrease the capacity of the resource. |
| scaleAction | cooldown | Cool down (minutes) | The amount of time to wait after a scale operation before scaling again. For example, if cooldown = “PT10M”, autoscale doesn't attempt to scale again for another 10 minutes. The cooldown is to allow the metrics to stabilize after the addition or removal of instances. |
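
To make the interaction of timeGrain, timeWindow, timeAggregation, operator, and threshold concrete, here is a minimal Python sketch of how a single metricTrigger could be evaluated. It is an illustration only, not the autoscale service's code: the helper name is_triggered, the lookup tables, and the sample values are all assumptions, and only the "Average" aggregation and the two operators shown are taken from the example setting.

```python
from statistics import mean

# Hypothetical lookup tables for this sketch. Only "Average", "GreaterThan",
# and "LessThan" appear in the example setting; "Minimum" and "Maximum" are
# included purely for illustration.
AGGREGATIONS = {"Average": mean, "Minimum": min, "Maximum": max}
OPERATORS = {
    "GreaterThan": lambda value, threshold: value > threshold,
    "LessThan": lambda value, threshold: value < threshold,
}


def is_triggered(samples, time_aggregation, operator, threshold):
    """Aggregate the per-timeGrain samples from the time window, then compare."""
    aggregated = AGGREGATIONS[time_aggregation](samples)
    return OPERATORS[operator](aggregated, threshold)


# Ten made-up 1-minute CPU averages: timeGrain PT1M over a PT10M timeWindow.
cpu_samples = [82, 88, 91, 86, 84, 90, 87, 89, 85, 88]

scale_out = is_triggered(cpu_samples, "Average", "GreaterThan", 85)
scale_in = is_triggered(cpu_samples, "Average", "LessThan", 60)
```

With these samples the window average is 87, so the scale-out condition (greater than 85) holds and the scale-in condition (less than 60) does not.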

Autoscale profiles

There are three types of autoscale profiles:

  • Default profile: Use the default profile if you don’t need to scale your resource based on a particular date and time, or day of the week. The default profile runs when there are no other applicable profiles for the current date and time. You can only have one default profile.
  • Fixed date profile: The fixed date profile is relevant for a single date and time. Use the fixed date profile to set scaling rules for a specific event. The profile runs only once, on the event’s date and time. For all other times, autoscale uses the default profile.
    ...
    "profiles": [
        {
            "name": "regularProfile",
            "capacity": {
                ...
            },
            "rules": [
                ...
            ]
        },
        {
            "name": "eventProfile",
            "capacity": {
                ...
            },
            "rules": [
                ...
            ],
            "fixedDate": {
                "timeZone": "Pacific Standard Time",
                "start": "2017-12-26T00:00:00",
                "end": "2017-12-26T23:59:00"
            }
        }
    ]
  • Recurrence profile: A recurrence profile is used for a day or set of days of the week. The schema for a recurring profile doesn't include an end date. The end date and time for a recurring profile is set by the start time of the following profile. When you use the portal to configure recurring profiles, the default profile is automatically updated to start at the end time that you specify for the recurring profile. For more information on configuring multiple profiles, see Autoscale with multiple profiles.

    The partial schema example below shows a recurring profile, starting at 06:00 and ending at 19:00 on Saturdays and Sundays. The default profile has been modified to start at 19:00 on Saturdays and Sundays.

    {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Insights/autoscaleSettings",
                "apiVersion": "2015-04-01",
                "name": "VMSS1-Autoscale-607",
                "location": "eastus",
                "properties": {
    
                    "name": "VMSS1-Autoscale-607",
                    "enabled": true,
                    "targetResourceUri": "/subscriptions/abc123456-987-f6e5-d43c-9a8d8e7f6541/resourceGroups/rg-vmss1/providers/Microsoft.Compute/virtualMachineScaleSets/VMSS1",
                    "profiles": [
                        {
                            "name": "Weekend profile",
                            "capacity": {
                                ...
                            },
                            "rules": [
                                ...
                            ],
                            "recurrence": {
                                "frequency": "Week",
                                "schedule": {
                                    "timeZone": "E. Europe Standard Time",
                                    "days": [
                                        "Saturday",
                                        "Sunday"
                                    ],
                                    "hours": [
                                        6
                                    ],
                                    "minutes": [
                                        0
                                    ]
                                }
                            }
                        },
                        {
                            "name": "{\"name\":\"Auto created default scale condition\",\"for\":\"Weekend profile\"}",
                            "capacity": {
                               ...
                            },
                            "recurrence": {
                                "frequency": "Week",
                                "schedule": {
                                    "timeZone": "E. Europe Standard Time",
                                    "days": [
                                        "Saturday",
                                        "Sunday"
                                    ],
                                    "hours": [
                                        19
                                    ],
                                    "minutes": [
                                        0
                                    ]
                                }
                            },
                            "rules": [
                                ...
                            ]
                        }
                    ],
                    "notifications": [],
                    "targetResourceLocation": "eastus"
                }
    
            }
        ]
    }
    

Autoscale evaluation

Autoscale settings can have multiple profiles. Each profile can have multiple rules. Each time the autoscale job runs, it begins by choosing the applicable profile for that time. Autoscale then evaluates the profile's minimum and maximum values and any metric rules, and decides whether a scale action is necessary. The autoscale job runs every 30 to 60 seconds, depending on the resource type.

Which profile will autoscale use?

Each time the autoscale service runs, the profiles are evaluated in the following order:

  1. Fixed date profiles
  2. Recurring profiles
  3. Default profile

The first suitable profile found will be used.
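
The evaluation order above can be sketched in Python as follows. This is a simplified illustration, not the service's implementation: the Profile class, the applies callbacks, and select_profile are hypothetical names, time zones are ignored, and the default profile is modeled as always applicable rather than as the recurrence entry that the portal generates.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional

# Evaluation order from the list above: fixed date, then recurring, then default.
PRIORITY = {"fixedDate": 0, "recurrence": 1, "default": 2}


@dataclass
class Profile:
    name: str
    kind: str  # "fixedDate", "recurrence", or "default"
    applies: Callable[[datetime], bool]  # hypothetical applicability check


def select_profile(profiles: List[Profile], now: datetime) -> Optional[Profile]:
    """Return the first applicable profile in priority order."""
    for profile in sorted(profiles, key=lambda p: PRIORITY[p.kind]):
        if profile.applies(now):
            return profile
    return None


def in_event(now: datetime) -> bool:
    # Fixed date window taken from the eventProfile example.
    return datetime(2017, 12, 26, 0, 0) <= now <= datetime(2017, 12, 26, 23, 59)


def in_weekend(now: datetime) -> bool:
    # Saturdays and Sundays from 06:00 until the default profile resumes at 19:00.
    return now.weekday() in (5, 6) and 6 <= now.hour < 19


profiles = [
    Profile("default", "default", lambda now: True),
    Profile("Weekend profile", "recurrence", in_weekend),
    Profile("eventProfile", "fixedDate", in_event),
]

chosen = select_profile(profiles, datetime(2017, 12, 30, 10, 0))  # a Saturday morning
```

Because the profiles are sorted by kind before they are checked, the order in which they are listed in the setting does not matter in this sketch.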

How does autoscale evaluate multiple rules?

After autoscale determines which profile to run, it evaluates the scale-out rules in the profile, that is, where direction = “Increase”. If one or more scale-out rules are triggered, autoscale calculates the new capacity determined by the scaleAction specified for each of the rules. If more than one scale-out rule is triggered, autoscale scales to the highest specified capacity to ensure service availability.

For example, assume that there are two rules: Rule 1 specifies a scale out by 3 instances, and rule 2 specifies a scale out by 5. If both rules are triggered, autoscale will scale out by 5 instances. Similarly, if one rule specifies scale out by 3 instances and another rule, scale out by 15%, the higher of the two instance counts will be used.
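
The scale-out arithmetic in this example can be sketched in Python as shown below. The function names are hypothetical, the percentage is rounded up to a whole instance as an assumption of this sketch (the article doesn't state the rounding rule), and the result is capped at the profile's maximum capacity.

```python
import math


def capacity_after_increase(current: int, action: dict) -> int:
    """Capacity that a single triggered scale-out action would produce."""
    value = int(action["value"])
    if action["type"] == "ChangeCount":
        return current + value
    if action["type"] == "PercentChangeCount":
        # Assumption: a fractional instance is rounded up.
        return current + math.ceil(current * value / 100)
    raise ValueError(f"unsupported action type: {action['type']}")


def scale_out_target(current: int, triggered_actions: list, maximum: int) -> int:
    """Pick the highest capacity among triggered rules, capped at the maximum."""
    candidates = [capacity_after_increase(current, a) for a in triggered_actions]
    return min(max(candidates), maximum)


by_3 = {"type": "ChangeCount", "value": "3"}
by_5 = {"type": "ChangeCount", "value": "5"}
by_15_percent = {"type": "PercentChangeCount", "value": "15"}

# Starting from 10 instances with a maximum of 20 (made-up numbers).
both_counts = scale_out_target(10, [by_3, by_5], maximum=20)
count_and_percent = scale_out_target(10, [by_3, by_15_percent], maximum=20)
```

Starting from 10 instances, the first call scales out by 5 (to 15). In the second call, 15 percent of 10 rounds up to 2 instances, so the rule that adds 3 instances wins (13).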

If no scale-out rules are triggered, autoscale evaluates the scale-in rules, that is, rules with direction = “Decrease”. Autoscale only scales in if all of the scale-in rules are triggered.

Autoscale calculates the new capacity determined by the scaleAction of each of those rules. To ensure service availability, autoscale scales in by as little as possible to achieve the maximum capacity specified. For example, assume two scale-in rules, one that decreases capacity by 50 percent, and one that decreases capacity by 3 instances. If the first rule results in 5 instances and the second rule results in 7, autoscale scales in to 7 instances.
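
As a rough Python sketch of this scale-in logic: scale in only when every scale-in rule is triggered, then keep the largest resulting capacity. The function names and the starting capacity of 10 are assumptions for illustration, and the percentage reduction is rounded down to a whole instance, which is also an assumption.

```python
def capacity_after_decrease(current: int, action: dict) -> int:
    """Capacity that a single scale-in action would produce."""
    value = int(action["value"])
    if action["type"] == "ChangeCount":
        return current - value
    if action["type"] == "PercentChangeCount":
        # Assumption: the reduction is rounded down to a whole instance.
        return current - (current * value) // 100
    raise ValueError(f"unsupported action type: {action['type']}")


def scale_in_target(current: int, rules: list, minimum: int) -> int:
    """rules is a list of (triggered, action) pairs for the scale-in rules."""
    if not rules or not all(triggered for triggered, _ in rules):
        return current  # at least one scale-in rule is not triggered
    candidates = [capacity_after_decrease(current, action) for _, action in rules]
    # Scale in by as little as possible, and never below the minimum.
    return max(max(candidates), minimum)


by_50_percent = {"type": "PercentChangeCount", "value": "50"}
by_3 = {"type": "ChangeCount", "value": "3"}

target = scale_in_target(10, [(True, by_50_percent), (True, by_3)], minimum=1)
unchanged = scale_in_target(10, [(True, by_50_percent), (False, by_3)], minimum=1)
```

From 10 instances, the 50 percent rule yields 5 and the 3-instance rule yields 7, so the sketch settles on 7. If only one of the two rules is triggered, the capacity stays at 10.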

Each time autoscale calculates the result of a scale-in action, it evaluates whether that action would trigger a scale-out action. The scenario where a scale action triggers the opposite scale action is known as flapping. Autoscale may defer a scale-in action to avoid flapping or may scale by a number less than what was specified in the rule. For more information on flapping, see Flapping in Autoscale.
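
One way to picture the flapping check is the following Python sketch. It rests on a simplifying assumption that the article doesn't state: the total load stays constant and is spread evenly, so the per-instance metric after scaling in is roughly the current average multiplied by current / new instance count. The function name and all numbers are hypothetical.

```python
def safe_scale_in(current: int, proposed: int, metric_avg: float,
                  scale_out_threshold: float) -> int:
    """Return the lowest capacity at or above `proposed` that would not
    immediately trigger the scale-out rule, or `current` to defer."""
    for candidate in range(max(proposed, 1), current):
        # Assumption: the same total load is shared by fewer instances.
        projected = metric_avg * current / candidate
        if projected <= scale_out_threshold:
            return candidate
    return current  # every smaller capacity would flap, so defer


# Scale-out threshold of 85 percent CPU, as in the example setting.
allowed = safe_scale_in(current=4, proposed=3, metric_avg=55, scale_out_threshold=85)
deferred = safe_scale_in(current=2, proposed=1, metric_avg=50, scale_out_threshold=85)
reduced = safe_scale_in(current=10, proposed=5, metric_avg=50, scale_out_threshold=85)
```

In the first call, going from 4 to 3 instances projects to about 73 percent CPU, so the scale-in proceeds. In the second, a single instance would project to 100 percent, so the action is deferred. In the third, scaling in to 5 instances would project to 100 percent, but 6 instances project to about 83 percent, so the sketch scales in by less than the rule specified.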

Next steps

Learn more about autoscale by referring to the following articles: