Share via

Queue-Based Load Leveling Pattern

MessagingAvailabilityPerformance & ScalabilityDesign PatternsShow All

Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads that may otherwise cause the service to fail or the task to time out. This pattern can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.

Context and Problem

Many solutions in the cloud involve running tasks that invoke services. In this environment, if a service is subjected to intermittent heavy loads, it can cause performance or reliability issues

A service could be a component that is part of the same solution as the tasks that utilize it, or it could be a third-party service providing access to frequently used resources such as a cache or a storage service. If the same service is utilized by a number of tasks running concurrently, it can be difficult to predict the volume of requests to which the service might be subjected at any given point in time.

It is possible that a service might experience peaks in demand that cause it to become overloaded and unable to respond to requests in a timely manner. Flooding a service with a large number of concurrent requests may also result in the service failing if it is unable to handle the contention that these requests could cause.


Refactor the solution and introduce a queue between the task and the service. The task and the service run asynchronously. The task posts a message containing the data required by the service to a queue. The queue acts as a buffer, storing the message until it is retrieved by the service. The service retrieves the messages from the queue and processes them. Requests from a number of tasks, which can be generated at a highly variable rate, can be passed to the service through the same message queue. Figure 1 shows this structure.

Figure 1 - Using a queue to level the load on a service

Figure 1 - Using a queue to level the load on a service

The queue effectively decouples the tasks from the service, and the service can handle the messages at its own pace irrespective of the volume of requests from concurrent tasks. Additionally, there is no delay to a task if the service is not available at the time it posts a message to the queue.

This pattern provides the following benefits:

  • It can help to maximize availability because delays arising in services will not have an immediate and direct impact on the application, which can continue to post messages to the queue even when the service is not available or is not currently processing messages.
  • It can help to maximize scalability because both the number of queues and the number of services can be varied to meet demand.
  • It can help to control costs because the number of service instances deployed needs only to be sufficient to meet average load rather than the peak load.


Some services may implement throttling if demand reaches a threshold beyond which the system could fail. Throttling may reduce the functionality available. You might be able to implement load leveling with these services to ensure that this threshold is not reached.

Issues and Considerations

Consider the following points when deciding how to implement this pattern:

  • It is necessary to implement application logic that controls the rate at which services handle messages to avoid overwhelming the target resource. Avoid passing spikes in demand to the next stage of the system. Test the system under load to ensure that it provides the required leveling, and adjust the number of queues and the number of service instances that handle messages to achieve this.
  • Message queues are a one-way communication mechanism. If a task expects a reply from a service, it may be necessary to implement a mechanism that the service can use to send a response. For more information, see the Asynchronous Messaging Primer.
  • You must be careful if you apply autoscaling to services that are listening for requests on the queue because this may result in increased contention for any resources that these services share, and diminish the effectiveness of using the queue to level the load.

When to Use this Pattern

This pattern is ideally suited to any type of application that uses services that may be subject to overloading.

This pattern might not be suitable if the application expects a response from the service with minimal latency.


A Microsoft Azure web role stores data by using a separate storage service. If a large number of instances of the web role run concurrently, it is possible that the storage service could be overwhelmed and be unable to respond to requests quickly enough to prevent these requests from timing out or failing. Figure 2 highlights this issue.

Figure 2 - A service being overwhelmed by a large number of concurrent requests from instances of a web role

Figure 2 - A service being overwhelmed by a large number of concurrent requests from instances of a web role

To resolve this issue, you can use a queue to level the load between the web role instances and the storage service. However, the storage service is designed to accept synchronous requests and cannot be easily modified to read messages and manage throughput. Therefore, you can introduce a worker role to act as a proxy service that receives requests from the queue and forwards them to the storage service. The application logic in the worker role can control the rate at which it passes requests to the storage service to prevent the storage service from being overwhelmed. Figure 3 shows this solution.

Figure 3 - Using a queue and a worker role to level the load between instances of the web role and the service

Figure 3 - Using a queue and a worker role to level the load between instances of the web role and the service

Related Patterns and Guidance

The following patterns and guidance may also be relevant when implementing this pattern:

  • Asynchronous Messaging Primer. Message queues are an inherently asynchronous communications mechanism. It may be necessary to redesign the application logic in a task if it is adapted from communicating directly with a service to using a message queue. Similarly, it may be necessary to refactor a service to accept requests from a message queue (alternatively, it may be possible to implement a proxy service, as described in the example).
  • Competing Consumers Pattern. It may be possible to run multiple instances of a service, each of which act as a message consumer from the load-leveling queue. You can use this approach to adjust the rate at which messages are received and passed to a service.
  • Throttling Pattern. A simple way to implement throttling with a service is to use queue-based load-leveling and route all requests to a service through a message queue. The service can process requests at a rate that ensures resources required by the service are not exhausted, and to reduce the amount of contention that could occur.

More Information

For more information about choosing a messaging and queuing mechanism in Azure applications see:

Next Topic | Previous Topic | Home | Community

patterns & practices Developer Center