Share via

Load-Balanced Cluster

Retired Content

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. Please see the patterns & practices guidance for the most current information.


Version 1.0.1

GotDotNet community for collaboration on this pattern

Complete List of patterns & practices


You have decided to use clustering in designing or modifying an infrastructure tier to maintain performance requirements while supporting the ability to adapt to changing demands.


How should you design a scalable infrastructure tier that accounts for changes in load while maintaining an acceptable level of performance?


When designing your scalable infrastructure tier, consider the following forces:

Individual servers have a maximum amount of load capacity for any given application. For example, if a single server provides Web pages as part of a Web-based application and the user or transaction load increases beyond the limitation of the server, the application will either fall below performance expectations or, in the worst case, become unavailable.

Individual servers have maximum physical performance limitations, including limitations to the bus speed, the amount of memory, the number of processors, and the number of peripherals that any one server can use. For example, if the server is capable of housing only four processors, you cannot add a fifth processor to enhance performance.

Certain applications have limitations on the number of CPUs that they can use.

Servers, as individual entities, are single points of failure within a solution. If only one server is responsible for delivering the functionality of a component within an application, its failure results in an application failure.

Adding servers can increase the complexity of managing and monitoring the server hardware and its associated software.


Install your service or application onto multiple servers that are configured to share the workload. This type of configuration is a load-balanced cluster. Load balancing scales the performance of server-based programs, such as a Web server, by distributing client requests across multiple servers. Load balancing technologies, commonly referred to as load balancers, receive incoming requests and redirect them to a specific host if necessary. The load-balanced hosts concurrently respond to different client requests, even multiple requests from the same client. For example, a Web browser may obtain the multiple images within a single Web page from different hosts in the cluster. This distributes the load, speeds up processing, and shortens the response time to clients.

Load balancers use different algorithms to control traffic. The goal of these algorithms is to intelligently distribute load and/or maximize the utilization of all servers within the cluster. Some examples of these algorithms include:

Round-robin. A round-robin algorithm distributes the load equally to each server, regardless of the current number of connections or the response time. Round-robin is suitable when the servers in the cluster have equal processing capabilities; otherwise, some servers may receive more requests than they can process while others are using only part of their resources.

Weighted round-robin. A weighted round-robin algorithm accounts for the different processing capabilities of each server. Administrators manually assign a performance weight to each server, and a scheduling sequence is automatically generated according to the server weight. Requests are then directed to the different servers according to a round-robin scheduling sequence.

Least-connection. A least-connection algorithm sends requests to servers in a cluster, based on which server is currently serving the fewest connections.

Load-based. A load-based algorithm sends requests to servers in a cluster, based on which server currently has the lowest load.

Additionally, some load balancers incorporate failure detection. The balancer keeps track of the server or the application running on the server and stops sending requests to a server after a failure. Figure 1 shows the basic components of load balancing.


Figure 1: Load balancing components

When the load balancer receives a request from the client, one of the servers in the group processes the request. Every server is capable of handling the request independently. If any server is unavailable due to error or maintenance, other servers can still serve requests without being affected. Thus, the overall availability of the service is much higher than if a single server were serving all the requests. Using a single physical load balancer or a single network switch in front of a set of software load-balanced servers introduces another single point failure, however. You can use redundant load balancing devices and/or switches to mitigate this risk.

Session State Management

Applications often require user interaction among the individual steps in a complete use case. Each response the user makes during the interaction affects the choices available to the user and the state of the application as it progresses toward the user's goal. The term session state is often used to describe this use-case-focused state. A portion of this session state is needed only to track progress through the task and is discarded when the use case is complete; other parts of the session state are saved in long-term storage in the database if the use case concludes successfully. For example, a customer using an online shopping cart is rarely asked for payment or shipping information until he or she has selected a checkout button, which is not enabled until there is at least one item in the shopping cart.

Distributed applications typically call software components on remote servers over a network connection. The application must track the changes in session state that occur between the individual steps to provide continuity between them. Application designers typically maintain session state in one of three basic places:

Client. Application designers store each user's session state on the user's computer.

Intermediate server. Application designers store session state on a computer that serves as an intermediary between client computers and the database servers on which the user's information is permanently stored.

Database server. Application designers store session state in the database server along with other long-term application and user data.

Only the intermediate server approach affects this pattern. Each approach and its advantages and disadvantages are described in detail in Chapter 2, "Designing for Scalability," of Designing for Scalability with Microsoft Windows DNA [Sundblad00].

A simple solution such as the one shown in Figure 1 is good enough when all the servers are stateless; that is, after a server serves a request, the state of the server is restored to the default value. There are two scenarios in which the server can be stateless. In one, the client does not need a session; that is, each request is a single unit of work, and no temporary values persist between requests. In the other scenario, known as client session management, the client itself keeps the state of a session and sends the session state information within the request so that any server can pick up the request and keep processing it.

In server session management scenarios, the server maintains the state of a user session. Server session management requires the load balancer to direct all requests from one client within the same user session to the same server instance. This mechanism is often called server affinity.

One inherent concern to session management is that if the server goes offline due to error or maintenance, the client's work could be lost and the client would have to resend all the previous requests from the lost session. In some cases, occasional session loss is not a major problem for the user. For example, in an online map search application, if the server loses an address that the user has just typed, it's not too much trouble for the user to retype the address. In other cases, however, session loss could be extremely inconvenient. For example, in an online leasing application with a stateless client, it may take the user 10 minutes to type several pages worth of information into a contract form. You certainly do not want the user spend another 10 minutes retyping all of the information if one of the servers in the load balancing group goes offline. To avoid session loss due to server failure in a load balancing group, there are two approaches: centralized state management and asynchronous session state management. Figure 2 shows centralized state management.


Figure 2: Load balancing and centralized state management

The centralized state management approach stores the session state information on a centralized server in a different tier from the application servers. Each time the application server receives a request that is part of a session, it fetches the session state from the session management server before processing the request. The session management service can be a database or another type of application that runs on a server that stores shared resources and is configured for high reliability. For more information about how to improve fault-tolerance on shared resources, see the Failover Cluster pattern.

Figure 3 shows asynchronous session state management.


Figure 3: Load balancing and asynchronous session state management

Using the asynchronous session state management approach, every server broadcasts its session state to all other servers whenever the session state is changed; therefore, every server contains the state information for all sessions, and any server can process a request that is part of a session. Session state also survives individual server failures. This solution is cheaper because no extra equipment is required but harder to configure and maintain because it involves asynchronous calls. Storing the state for all sessions on every server can also be less efficient.


The two major categories of load balancing implementations are:

Software-based load balancing. Software-based load balancing consists of special software that is installed on the servers in a load-balanced cluster. The software dispatches or accepts requests from the client to the servers, based on different algorithms. The algorithms can be a simple round-robin algorithm or a much more complicated algorithm that considers server affinity. For example, Microsoft Network Load Balancing is a load balancing software for Web farms, and Microsoft Component Load Balancing is a load balancing software for application farms.

Hardware-based load balancing. Hardware-based load balancing consists of a specialized switch or router with software to give it load balancing functionality. This solution integrates switching and load balancing into a single device, which reduces the amount of extra hardware that is required to implement load balancing. Combining the two functions, however, also makes the device more difficult to troubleshoot.


To help you better understand how to use load balancing to achieve scalability, the following discussion compares an existing non-load-balanced solution, which contains a single system (single point of failure) in the application tier, to a highly scalable solution that maintains performance and increases availability.

Non-Load-Balanced Tier

Initially, an organization might start with a solution architecture such as the one outlined in Figure 4, which might meet initial performance expectations. As the load increases, however, the application tier must adapt to the increased load to maintain acceptable performance.


Figure 4: Basic solution with a single application server

In Figure 4, the application tier contains only one application server (AppServer20), which serves client requests. If the server becomes overloaded, the solution will either fall below acceptable performance levels or become unavailable.

Load-Balanced Tier

To increase the scalability and to maintain performance, the organization might use a load balancer to extend the application tier. The following example, shown in Figure 5, adds two servers to the application tier to create a load-balanced cluster, which accesses data from the data tier and provides application access to the clients in the client tier.


Figure 5: Solution with a scalable application tier

The result is a standard load-balanced design. Either a hardware device or software that is running on the host machines assigns a virtual host name (AppServer20) and an IP address to AppServer1, AppServer2, and AppServer3. The load-balanced cluster exposes this virtual IP address and host name to the network and balances the load of the incoming requests evenly across healthy servers within the group. If AppServer1 fails, the request is simply directed to AppServer2 or AppServer3. Depending upon the technology used to provide this functionality, a certain number of additional servers can be added to the load-balanced cluster to maximize scalability and stay ahead of increasing demand.

Resulting Context

The Load-Balanced Cluster pattern results in the following benefits and liabilities:


Improved scalability. Scalable load-balanced tiers enable the system to maintain acceptable performance levels while enhancing availability.

Higher availability. Load balancing enables you to take a server offline for maintenance without loss of application availability.

Potential cost savings. Multiple low-cost servers often provide a cost savings over higher-cost multiprocessor systems.


Development complexity. A load-balanced solution can be difficult to develop if the solution must maintain state for individual transactions or users.

Doesn't account for network failure. If a server or network failure occurs during a client session, a new logon may be required to reauthenticate the client and to reestablish session state.

For more information, see the following related patterns:

Server Clustering. Server Clustering discusses the use of virtual computing resources to enhance the scalability and eliminate the single points of failure that affect availability.

Failover Cluster. Failover clusters can create redundancy in the infrastructure to increase availability.

Tiered Distribution. Tiered Distribution organizes the system infrastructure into a set of physical tiers to optimize server environments for specific operational requirements and system resource usage.


[Microsoft03] Microsoft Corporation. "Technical Overview of Windows Server 2003 Clustering Services." Available on the Microsoft Windows Server 2003 Web site at:

[Sundblad00] Sundblad, Sten and Per. Designing for Scalability with Microsoft Windows DNA. Microsoft Press, 2000.

patterns & practices Developer Center