Autoscaling Guidance
Constantly monitoring performance and scaling a system to adapt to fluctuating workloads, meet capacity targets, and optimize operational costs can be a labor-intensive process, and it may not be feasible to perform these tasks manually. This is where autoscaling is useful.
What is Autoscaling?
Autoscaling is the process of dynamically allocating the resources required by an application to match performance requirements and satisfy service level agreements (SLAs). As the volume of work grows, an application may require additional resources to enable it to perform its tasks in a timely manner.
Autoscaling is often an automated process that can help to ease management overhead by reducing the need for an operator to continually monitor the performance of a system and make decisions about adding or removing resources.
Autoscaling should also be an elastic process; more resources can be provisioned as the load increases on the system, but as demand slackens resources can be de-allocated to minimize costs while still maintaining adequate performance and meeting SLAs.
Note
Autoscaling applies to all of the resources used by an application, not just the compute resources. For example, if your system uses message queues to send and receive information, it could create additional queues as it scales.
Types of Scaling
Scaling typically takes one of two forms, vertical and horizontal:
- Vertical Scaling (often referred to as scaling up) requires that you redeploy the solution using different hardware. In a cloud environment the hardware platform is typically a virtualized environment, and vertical scaling involves provisioning more powerful resources for this environment and moving the system onto these new resources. Vertical scaling is often a disruptive process that requires making the system temporarily unavailable while it is being redeployed. It may be possible to keep the original system running while the new hardware is provisioned and brought online, but there will likely be some interruption while the processing transitions from the old environment to the new one. It is uncommon to use autoscaling to implement a vertical scaling strategy.
- Horizontal Scaling (often referred to as scaling out) requires deploying the system on additional resources. The system can continue running without interruption while these resources are provisioned. When the provisioning process is complete, copies of the elements that comprise the system can be deployed on these additional resources and made available. If demand drops, the additional resources can be reclaimed after the elements using them have been shut down cleanly. Many cloud-based systems, including Microsoft Azure, support this form of autoscaling.
Implementing an Autoscaling Strategy
Implementing an autoscaling strategy typically involves the following processes and components:
- Instrumentation at the application level to capture key performance and scaling factors such as response times, queue lengths, CPU utilization, and memory usage.
- Monitoring components that can observe these performance and scaling factors.
- Decision-making logic that can evaluate the monitored scaling factors against predefined system thresholds and decide whether to scale. Time is a critical factor in these evaluations: the decision-making logic should avoid making scaling decisions too frequently, as this can cause the system to oscillate. It may be possible to semi-automate the scaling decision, with the final decision left to an operator. A sketch of this kind of logic appears after this list.
- Execution components that are responsible for carrying out tasks associated with scaling the system. These components typically use tools and scripts to perform the following tasks:
- Provision or de-provision resources.
- Reconfigure the system.
- Testing and validation of the autoscaling strategy to ensure that it functions as expected.
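The following sketch illustrates this kind of decision-making logic in C#. It is a minimal illustration rather than a production implementation: the MetricSample and IScaler types, the thresholds, and the cooldown period are all hypothetical, and a real system would obtain its samples from the monitoring components described above.

```csharp
using System;

// Hypothetical snapshot of the monitored scaling factors.
public class MetricSample
{
    public double AverageCpuPercent { get; set; }
    public int QueueLength { get; set; }
}

// Hypothetical abstraction over the execution components that
// provision or de-provision resources.
public interface IScaler
{
    void AddInstance();
    void RemoveInstance();
}

public class ScalingDecisionMaker
{
    private readonly IScaler scaler;
    private readonly TimeSpan cooldown = TimeSpan.FromMinutes(10);
    private DateTime lastScalingAction = DateTime.MinValue;

    public ScalingDecisionMaker(IScaler scaler)
    {
        this.scaler = scaler;
    }

    public void Evaluate(MetricSample sample)
    {
        // Enforce a cooldown between scaling actions so that the
        // system does not oscillate between scaling out and in.
        if (DateTime.UtcNow - this.lastScalingAction < this.cooldown)
        {
            return;
        }

        // Example thresholds; real values come from analyzing telemetry.
        if (sample.AverageCpuPercent > 80 || sample.QueueLength > 100)
        {
            this.scaler.AddInstance();
            this.lastScalingAction = DateTime.UtcNow;
        }
        else if (sample.AverageCpuPercent < 20 && sample.QueueLength == 0)
        {
            this.scaler.RemoveInstance();
            this.lastScalingAction = DateTime.UtcNow;
        }
    }
}
```

Note that the scale-out threshold (80 percent CPU) is deliberately far from the scale-in threshold (20 percent). Leaving a wide gap between the two, in addition to enforcing a cooldown period, is a simple way to reduce oscillation.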
Traditionally, many autoscaling solutions for the cloud depended on writing and configuring scripts that gathered the appropriate performance data, analyzed this data, and then added or removed resources as appropriate. It is now becoming increasingly common for cloud-based systems to provide built-in tooling to help reduce the time and effort required to implement autoscaling.
However, it is important to implement an autoscaling strategy based on the specific requirements of the application rather than being driven by the features provided by any specific toolset. Scripting is still an essential skill, and a good autoscaling solution combines the features provided by the selected toolset with customizations in the form of scripts.
Note
If you are using Azure, you can access the Azure Management API through Windows PowerShell to script many tasks associated with starting and stopping instances and provisioning services.
Considerations for Implementing Autoscaling
Autoscaling is not an instant solution. Simply adding resources to a system or running more instances of a process does not guarantee that the performance of the system will improve. Consider the following points when designing an autoscaling strategy:
- The system must be designed to be horizontally scalable. Avoid making assumptions about instance affinity; do not design solutions that require code to always run in a specific instance of a process. When scaling a cloud service or website horizontally, do not assume that a series of requests from the same source will always be routed to the same instance. For the same reason, design services to be stateless, so that a series of requests from an application does not have to be routed to the same instance of a service. When designing a service that reads messages from a queue and processes them, do not make any assumptions about which instance of the service handles a specific message, because autoscaling could start additional instances of a service as the queue length grows. The Competing Consumers pattern describes how to handle this scenario, and a minimal consumer loop is sketched after this list.
- If the solution implements a long-running task, design this task to support both scaling out and scaling in. Without due care, such a task could prevent an instance of a process from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-running task and break up the processing that it performs into smaller, discrete chunks. The Pipes and Filters pattern provides an example of how you can achieve this. Alternatively, you can implement a checkpoint mechanism that records state information about the task at regular intervals, and save this state in durable storage that can be accessed by any instance of the process running the task. In this way, if the process is shut down, the work that it was performing can be resumed from the last checkpoint by using another instance. A sketch of this checkpointing approach also appears after this list.
- If the solution comprises multiple items, such as web roles, worker roles, and other resources, it might be necessary to scale all of these items as a unit. It is important to understand the relationships between the items that comprise a solution, and identify groupings that should be scaled together (as a scale unit) to achieve a given performance metric. For example, if you know that to handle 10,000 more active users you need to add two more instances of a given web role, three more instances of a particular worker role, and an additional Service Bus queue, then this is your scale unit. Obtaining this knowledge takes time and requires careful analysis of telemetry data.
- To prevent a system from attempting to scale out excessively (and to prevent the costs associated with running many thousands of instances), consider limiting the degree of autoscaling. Consider gracefully degrading the functionality that the system provides if the required resources are currently overloaded. Keep in mind that autoscaling might not be the most appropriate mechanism to handle a sudden burst in workload. It takes time to provision and start new instances of a service or add resources to a system, and the peak may have passed by the time these additional resources have been made available. In this scenario, it may be better to throttle the service. For more information, see the Throttling pattern.
- The system should be configured to monitor the autoscaling process, and log the details of each autoscaling event (what triggered it, what resources were added or removed, and when). This information can be analyzed to help measure the effectiveness of the autoscaling strategy, and tune it if necessary. If the system hits the upper limit defined for autoscaling, it might also alert an operator. The operator could examine the system and may be able to manually start additional resources if the situation warrants them. Note that, under these circumstances, the operator may also be responsible for manually removing these resources after the workload eases.
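The following sketch shows the kind of stateless consumer loop that the Competing Consumers pattern describes. The QueueMessage and IMessageQueue types are hypothetical stand-ins for a real queue client, such as an Azure Storage queue or Service Bus queue; the essential point is that the loop holds no per-instance state, so autoscaling can add or remove instances running it at any time.

```csharp
using System;
using System.Threading;

// Hypothetical message and queue abstractions. In an Azure solution these
// would typically wrap an Azure Storage queue or a Service Bus queue.
public class QueueMessage
{
    public string Body { get; set; }
}

public interface IMessageQueue
{
    QueueMessage Receive(TimeSpan visibilityTimeout); // Null if queue is empty.
    void Complete(QueueMessage message);              // Removes the message.
}

// A stateless worker loop. Any number of identical instances can run this
// code concurrently; no message is tied to a particular instance.
public class QueueWorker
{
    private readonly IMessageQueue queue;

    public QueueWorker(IMessageQueue queue)
    {
        this.queue = queue;
    }

    public void Run(CancellationToken cancellation)
    {
        while (!cancellation.IsCancellationRequested)
        {
            QueueMessage message = this.queue.Receive(TimeSpan.FromMinutes(1));
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(5)); // No work; back off.
                continue;
            }

            ProcessMessage(message);      // Uses only the message contents
            this.queue.Complete(message); // and durable storage, not local state.
        }
    }

    private void ProcessMessage(QueueMessage message)
    {
        // Process the message here.
    }
}
```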
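The following sketch outlines the checkpoint mechanism described above. The ICheckpointStore interface and its methods are hypothetical; in an Azure solution an implementation might write the state to blob or table storage so that any instance can read it.

```csharp
using System;

// Hypothetical durable store for checkpoint state, shared by all instances.
public interface ICheckpointStore
{
    int LoadLastCompletedStep(string taskId);   // Returns 0 if no checkpoint.
    void SaveCheckpoint(string taskId, int step);
}

public class LongRunningTask
{
    private readonly ICheckpointStore store;

    public LongRunningTask(ICheckpointStore store)
    {
        this.store = store;
    }

    // Processes the work in small, discrete chunks, recording a checkpoint
    // after each one. If the hosting instance is shut down when the system
    // scales in, another instance can resume from the last checkpoint
    // instead of restarting from the beginning.
    public void Run(string taskId, int totalSteps, Func<bool> stopRequested)
    {
        int step = this.store.LoadLastCompletedStep(taskId);

        while (step < totalSteps && !stopRequested())
        {
            ProcessChunk(taskId, step);
            step++;
            this.store.SaveCheckpoint(taskId, step);
        }
    }

    private void ProcessChunk(string taskId, int step)
    {
        // Perform one discrete unit of work here.
    }
}
```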
Autoscaling in an Azure Solution
Azure provides several options for configuring autoscaling for your solutions:
- Azure Autoscaling. This feature supports the most common scaling scenarios, and you can configure a solution by using the Azure Management Portal.
- Microsoft Enterprise Library Autoscaling Application Block. This utility enables you to scale a solution based on custom rules and performance data. This approach is more flexible, but more complex, and requires you to write code to capture performance data that is specific to your solutions.
- Azure Monitoring Services Management Library. This library provides access to Azure Monitoring Services operations, including a unified API for retrieving and configuring metrics, alerts, and autoscale rules for Azure services.
The following sections summarize these approaches.
Using Azure Autoscaling
Azure Autoscaling enables you to configure scale-out and scale-in options for a solution. Using this feature you can automatically add and remove instances of Azure Cloud Services web and worker roles, Azure Websites applications, and Azure Virtual Machines. There are two approaches for configuring autoscaling in Azure:
- Configure autoscaling based on metrics such as average CPU utilization over the last hour, or the backlog of items in a message queue that the solution is processing. You configure the parameters used by Azure Autoscaling, monitor the performance of your system, and then adjust the way in which the system scales if necessary. However, keep in mind that autoscaling is not an instantaneous process; it takes time to react to a metric such as average CPU utilization exceeding (or dropping below) a specified threshold. Avoid setting finely balanced thresholds that could attempt to start and stop instances very frequently; Azure enforces this by permitting only one scaling action to occur in any five-minute period. You can increase this period if you find that the system is still overreacting.
- Configure time-based autoscaling to ensure that additional instances are available to coincide with an expected peak in usage, and scale in once the peak time has passed. This strategy enables you to ensure that you have sufficient instances already running without waiting for the system to react to the load.
You should also consider scaling other resources linked to compute instances as part of the same scale unit. For example, you could resize SQL databases or add storage accounts as the system scales. However, at the time of writing, you must either perform these operations manually or use the Microsoft Enterprise Library Autoscaling Application Block.
Note
For more information about configuring autoscaling by using the Microsoft Azure Management Portal, see How to Scale an Application on MSDN.
Implementing Custom Autoscaling by Using the Microsoft Enterprise Library Autoscaling Application Block
The Microsoft Enterprise Library Autoscaling Application Block provides a highly customizable approach to scalability, enabling you to make scaling decisions based on performance counters or other custom metrics.
You specify rules that determine how the Autoscaling Application Block reacts to the metrics. These rules can be complex, and may reference combinations of metrics. For example, you could specify that the Autoscaling Application Block should start an additional instance of a worker role if the length of a message queue is growing at a certain rate and the role has less than 10% of its memory available.
As with Azure Autoscaling, the Autoscaling Application Block also supports time-based scaling, and you can restrict the degree of autoscaling that can occur to help prevent excessive costs.
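The Autoscaling Application Block expresses rules such as this declaratively, in XML rule definitions, rather than in code. The following C# sketch is therefore only an illustration of the logic behind the example rule described above; the type, the property names, and the threshold values are all hypothetical.

```csharp
// Hypothetical metric inputs for the example rule described above.
public class WorkerRoleMetrics
{
    public double QueueGrowthPerMinute { get; set; }   // Messages per minute.
    public double AvailableMemoryPercent { get; set; }
}

public static class ReactiveRuleExample
{
    // Returns true when an additional worker role instance should be started:
    // the queue is growing quickly and the role is low on memory.
    public static bool ShouldScaleOut(WorkerRoleMetrics metrics)
    {
        const double QueueGrowthThreshold = 50;    // Illustrative value.
        const double MemoryThresholdPercent = 10;  // "Less than 10% available."

        return metrics.QueueGrowthPerMinute > QueueGrowthThreshold
            && metrics.AvailableMemoryPercent < MemoryThresholdPercent;
    }
}
```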
Note
The Autoscaling Application Block page on MSDN provides detailed information on configuring autoscaling, defining rules, and gathering performance data.
Implementing Custom Autoscaling by Using the Azure Monitoring Services Management Library
The Azure Monitoring Services Management Library, which is in preview at the time of writing, can be used to monitor and automatically scale Azure deployments. In addition to defining autoscaling rules, this library provides options for monitoring and alerting. You can download the library from the NuGet gallery.
Related Patterns and Guidance
The following patterns and guidance may also be relevant to your scenario when implementing autoscaling:
- Throttling Pattern. This pattern describes how an application can continue to function and meet service level agreements when an increase in demand places an extreme load on resources. Throttling can be used with autoscaling to prevent a system from being overwhelmed while the system scales out.
- Competing Consumers Pattern. This pattern describes how to implement a pool of service instances that can handle messages from any application instance. Autoscaling can be used to start and stop service instances to match the anticipated workload. This approach enables a system to process multiple messages concurrently to optimize throughput, improve scalability and availability, and balance the workload.
- Instrumentation and Telemetry Guidance. Instrumentation and telemetry are vital for gathering the information that can drive the autoscaling process.
More Information
- The page How to Scale an Application on MSDN.
- The Microsoft Enterprise Library Autoscaling Application Block documentation and key scenarios on MSDN.