Chapter 4 - Tradeoffs in the QoS Enabled Network
Varying Quality of Service Guarantees
Efficiency vs. Quality of Guarantees
Sharing Network Resources - Multiple Resource Pools
Quality/Efficiency Product and Overhead
In previous sections we reviewed a number of QoS mechanisms. In the following sections, we'll see how these mechanisms can be combined to build a QoS-enabled network. In this section we'll discuss the requirements of the QoS-enabled network and the pragmatic tradeoffs that must be considered in its design.
Earlier in this whitepaper we effectively stated that network QoS provides the ability to handle application traffic such that it meets the service needs of certain applications. We also stated that, if network resources were infinite, the service needs of all applications would be trivially met. It follows that QoS is interesting to us because it enables us to meet the service needs of certain applications when resources are finite. In other words:
A QoS-enabled network should provide service guarantees appropriate for various application types while making efficient use of network resources.
Different qualities of service guarantees are appropriate for different applications. The quality of a guarantee refers to the level of commitment provided by the guarantee. This is not necessarily related either to the actual amount of resources committed or to the cost of the resources. For example, a guarantee that commits to carry 100 Kbps with a per-packet latency not to exceed 10 msec is a high quality guarantee. A guarantee that commits to carry 1 Mbps with the appearance of a lightly loaded network is a lesser quality guarantee. A guarantee that offers no commitment regarding latency bound or drop probability is a low quality guarantee.
The first two levels of guarantee described correspond to the guaranteed and controlled-load intserv services. The third corresponds to the standard best-effort service ubiquitously available today. There are other levels of guarantee that may be useful. For example, one could imagine varying degrees of better-than-best-effort (BBE) which offer to carry traffic with lower latency or at higher rates than it would be carried if it were best-effort, but make no specific quantifiable commitments. Often the terms 'quantitative QoS' and 'qualitative QoS' are used to refer to services such as guaranteed and controlled load on the one hand versus BBE on the other hand.
An appraisal of the quality of the guarantee is not a judgement regarding its value to the end-user, but rather a statement of its suitability to different applications. For example, a BBE level of guarantee may be entirely satisfactory to a web surfing application while a guaranteed service level of guarantee is required to handle interactive voice traffic. While the quality of the guaranteed service is higher, it would be excessive for a web surfing application. From a cost/performance perspective, the end user of a web surfing application would likely be more satisfied with the lower quality guarantee. Cost is a pragmatic consideration related to the efficiency with which network resources are used. If cost were not a concern, it would be desirable to support the highest quality guarantees possible.
Low quality guarantees are relatively easy to provide efficiently using simple QoS mechanisms. For example, existing best-effort corporate networks generally provide a very low level of guarantee with very few QoS mechanisms. Users may be able to web-surf fairly painlessly (assuming that the targeted web servers are not a bottleneck). The only QoS mechanism present in these networks is that the network administrator keeps an eye on the network usage level and, from time to time (as the number of users on the network grows), adds capacity to (re-provisions) the network. It may take one second for a typical web query to complete, or it may take five, depending on the time of day and the activity level of other users on the network. However, the service level perceived by the network users remains relatively satisfactory.
If web surfing were deemed critical to the jobs of the corporate network users, it might make sense for the network administrator to use simple top-down QoS configuration mechanisms to improve the service perceived by web surfing users. For example, the network administrator might identify those devices in the corporate network that tend to congest, and configure them with classifiers to recognize web surfing traffic and to direct it to high priority queues in the devices. This is essentially a top-down, diffserv approach. It would tend to improve the service level perceived by reducing the average time it takes for web queries to complete.
This is quite an efficient approach, as no resources have been added to the network or committed to web surfers. However, while it does provide a quality of service guarantee that is better than best-effort, it is still a relatively low quality of service guarantee. There are no bounds on the latency perceived by the users. Further, the latency might degrade significantly in the event that an unusually high number of users decided to web-surf simultaneously (thereby overwhelming the higher priority queues in the network devices). This condition would be especially severe if all simultaneous users resided on the same subnet and/or connected to web-servers on the same subnet. In this case, unusually high demands would be placed on a smaller set of network devices. Thus, the quality of the service guarantee would depend on the number of simultaneous web surfing users and their location in the network topology.
The network administrator might attempt to limit such degradations in quality of service by adding capacity to those network devices that tend to congest. However, much of the time, there would not be an unusually high number of users web surfing simultaneously and those that were would tend to be distributed across the network (rather than co-located on a single subnet). Thus, much of the time, the added capacity would be unused. As a result, network resources would be used inefficiently.
A simple analogy to non-network traffic engineering is helpful in illustrating the quandary faced by the network administrator. Consider the urban developer faced with the task of building a street system. The developer should probably design roads with the capacity to carry average expected traffic loads. Remote areas of the city will generally require smaller roads. Central, highly trafficked areas of the city will generally require larger roads. This approach is efficient. On occasion, a large number of drivers might flock to a remote area of the city for a specific event. As a result, the smaller road serving this part of the city will become congested. The developer could reduce the odds of such congestion by building large roads even to remote parts of the city. However, this would be inefficient since, most of the time, these roads would be relatively underutilized.
Providing high quality of service guarantees is more challenging than providing low quality of service guarantees. In the previous example, the network administrator has the option of provisioning the network for average expected load. Under extreme conditions, congestion might cause web surfing response times to increase, but the application would still be usable.
Consider instead an IP telephony application. Each IP telephony user requires from the network a guarantee to carry 64 Kbps with a maximum end-to-end latency of 100 msec; a higher latency renders the service useless. In this example, a network administrator relying on top-down QoS configuration mechanisms has no choice but to over-provision the network. (Later in this section, we will see how signaling-based QoS configuration addresses this problem.) For example, assume that out of 1000 potential users of IP telephony, there are on average 10 simultaneous users. Efficiency considerations would suggest that a device in the center of the network should be provisioned to accommodate 10 simultaneous users at a latency of 100 msec.
Assume that telephony sessions between 10 users are currently in progress (the network is at capacity). Let's see what happens when two additional users attempt to place an IP telephony call. The incremental traffic would overload the low latency service queue in the network device, thereby raising latencies above 100 msec and compromising service to all 12 IP telephony users. At this point, all resources allotted to IP telephony would be wasted since none of the 12 users would perceive satisfactory performance.
In this example, provisioning for average load dramatically compromises the quality of service guarantee that can be given to IP telephony users. The probability of compromise is simply the probability that the network is required to carry even one IP telephony session beyond the number for which it is provisioned. Generally, providing high quality guarantees in a top-down provisioned QoS network requires significant over-provisioning.
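A simple probability model gives a rough sense of the size of this effect. The sketch below assumes, purely for illustration, that each of the 1000 potential users is independently on a call with probability 0.01. That gives the average of 10 simultaneous calls used above, so demand follows a binomial distribution. The sketch computes two things:

- the chance that demand exceeds a given provisioned number of calls;
- the capacity needed to keep that chance below a chosen target.

The function names and the 0.001 target are hypothetical.

```python
from math import comb


def prob_demand_exceeds(capacity: int, users: int = 1000, p_active: float = 0.01) -> float:
    """P(more than `capacity` calls are active at once).

    Illustrative assumption: each user is independently on a call with
    probability p_active, so simultaneous demand is Binomial(users, p_active).
    """
    p_at_most = sum(
        comb(users, k) * p_active**k * (1 - p_active) ** (users - k)
        for k in range(capacity + 1)
    )
    return 1.0 - p_at_most


def capacity_for_target(target: float, users: int = 1000, p_active: float = 0.01) -> int:
    """Smallest provisioned number of calls whose overload probability is <= target."""
    capacity = 0
    while prob_demand_exceeds(capacity, users, p_active) > target:
        capacity += 1
    return capacity


if __name__ == "__main__":
    # Provisioning for the average load of 10 simultaneous calls...
    print(f"P(overload) when provisioned for 10 calls: {prob_demand_exceeds(10):.3f}")
    # ...versus provisioning for a 1-in-1000 chance of overload.
    n = capacity_for_target(1e-3)
    print(f"calls needed for P(overload) <= 0.001: {n} ({n / 10:.1f}x the average)")
```

Under these assumptions, provisioning for the average of 10 calls leaves roughly a 40% chance of overload at any given moment. Pushing that chance down to about one in a thousand requires roughly twice the average capacity. Real call arrivals are not independent coin flips, so the numbers are only indicative.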
Although the QoS mechanisms to provide a particular guarantee may vary from point to point in the network, the guarantee must be valid end-to-end. The network provider offers guarantees because the network administrator can charge for guarantees. The network administrator can charge for guarantees because the network user is willing to pay for guarantees. The network user is willing to pay for guarantees only because the experience of the network user is improved as a result of the guarantee. The experience of the network user is improved only if the quality of the connection between the user's endpoints is improved. Hence the end-to-end requirement. Certain large providers may claim that they are able to charge their peer network providers for guarantees, without concern for the end customer. However, this is not a sustainable model. Ultimately, the provider's peer or the provider's peer's peer is collecting money from the end user to pay its provider.
There is no clear dividing line between the network provisioning required to support low quality guarantees and that required to support high quality guarantees. The higher the quality of guarantees desired, the more the network must be over-provisioned for the same level of user satisfaction. Consequently, network resources are used less efficiently. In providing a QoS-enabled network, there exists a continuum of provisioning options in which the quality of guarantees available is traded off against the efficiency of network resource usage.
In the previous examples we considered only top-down provisioning of the network. In the following discussion, we see that by using a signaling approach to QoS configuration, it is possible to shift the quality of guarantee versus efficiency tradeoff in the network administrator's favor.
Consider again the IP telephony example. Let's assume that users of the IP telephony application signal an RSVP request for resources to the network before actually obtaining the resources. The device in the center of the network is aware of the capacity in its low latency queue and is able to listen to and respond to RSVP signaled requests for resources. In this case, the network device installs classifiers in response to signaling requests from the first ten IP telephony users. These classifiers are used to identify traffic entitled to the low latency queue in the device. The device would reject the RSVP requests from the eleventh and twelfth users. No classifiers would be installed for these users, and their traffic would not impact the quality of guarantees already made to the first ten users.
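The central device's admission decision can be sketched as a capacity check followed by classifier installation. This is a minimal sketch under several assumptions:

- The class and method names are hypothetical.
- The low-latency queue is provisioned for exactly ten 64 Kbps calls (640 Kbps).
- RSVP message handling is reduced to a single function call per request.

```python
from dataclasses import dataclass, field


@dataclass
class LowLatencyQueue:
    """Hypothetical admission controller for one device's low-latency queue."""

    capacity_kbps: int
    reserved_kbps: int = 0
    classifiers: set[str] = field(default_factory=set)  # flows entitled to the queue

    def handle_request(self, flow_id: str, rate_kbps: int) -> bool:
        """Admit the request only if it fits in the remaining capacity."""
        if self.reserved_kbps + rate_kbps > self.capacity_kbps:
            return False  # reject: guarantees already made are left intact
        self.reserved_kbps += rate_kbps
        self.classifiers.add(flow_id)  # install a classifier for the admitted flow
        return True

    def classify(self, flow_id: str) -> str:
        """Only flows with an installed classifier reach the low-latency queue."""
        return "low-latency" if flow_id in self.classifiers else "best-effort"


if __name__ == "__main__":
    queue = LowLatencyQueue(capacity_kbps=10 * 64)  # provisioned for 10 calls
    for i in range(1, 13):
        admitted = queue.handle_request(f"call-{i}", 64)
        print(f"call-{i}: {'admitted' if admitted else 'rejected'}")
```

Running it admits call-1 through call-10 and rejects call-11 and call-12. Traffic from the rejected calls is classified as best-effort, so it never enters the low-latency queue.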
In this example, the network is able to offer very high quality guarantees to some limited number of simultaneous users. It refuses guarantees beyond this number in order to preserve the quality of the guarantees that are offered to sessions already in progress. This is achieved without any over-provisioning. In this sense, the network in this example is optimal. However, it is also somewhat unrealistic. It assumes a single device in the center of the network through which all traffic passes. In reality, network topologies are far more complex. Providing optimal efficiency while maintaining high quality guarantees would require that every network device participate in signaling, that these devices be able to strictly enforce the allocation of resources to one conversation versus another, that applications be able to precisely quantify their resource requirements and so on. In general, this is not the case. And so, while the support of signaling in the network can shift the quality of guarantee versus efficiency tradeoff in the network administrator's favor, it cannot, in a real network, simultaneously offer high quality of guarantees and optimal efficiency.
We have shown that signaling can improve the tradeoff between quality of guarantee and efficiency of network resource usage. However, this comes at a cost. Signaling itself requires network resources. Any form of signaling generates additional network traffic. RSVP signaling, due to its soft state, does so continually (albeit at low volumes). In addition, in order for the signaling to be useful, it is necessary for network devices to intercept signaling messages and to process them. This consumes processing resources in the network devices. When analyzing the benefits of signaling it is necessary to consider these effects.
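Simple arithmetic gives an estimate of the bandwidth cost of soft state. RFC 2205 suggests a default refresh period of 30 seconds, and each reservation is refreshed by Path and Resv messages. The 150-byte message size below is an illustrative assumption rather than a measured value. The sketch ignores retransmissions, jitter in the refresh timer, and the processing cost of each message.

```python
REFRESH_PERIOD_S = 30.0    # default refresh period R suggested by RFC 2205
MESSAGE_BYTES = 150        # assumed average Path/Resv size (illustrative only)
MESSAGES_PER_REFRESH = 2   # one Path and one Resv per session per refresh period


def refresh_load_bps(sessions: int) -> float:
    """Average refresh traffic (bits/s) crossing one link for `sessions` reservations."""
    bits_per_refresh = MESSAGES_PER_REFRESH * MESSAGE_BYTES * 8
    return sessions * bits_per_refresh / REFRESH_PERIOD_S


def refresh_messages_per_s(sessions: int) -> float:
    """Average number of refresh messages a router on the path processes per second."""
    return sessions * MESSAGES_PER_REFRESH / REFRESH_PERIOD_S


if __name__ == "__main__":
    per_session = refresh_load_bps(1)
    print(f"refresh load per session: {per_session:.0f} bit/s "
          f"({100 * per_session / 64_000:.3f}% of a 64 Kbps call)")
    print(f"refresh messages/s for 1000 sessions: {refresh_messages_per_s(1000):.1f}")
```

With these assumptions, each session adds 80 bits/s of refresh traffic per link, about 0.125% of a 64 Kbps call. However, a router carrying 1000 such sessions must also process roughly 67 refresh messages per second.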
There are ways to exploit the benefits of signaling while reducing its inherent impact on network resources. These include aggregation of signaling messages and reduction in the density of signaling nodes.
Aggregation of Signaling Messages
In the case of standard RSVP signaling, messages are generated for each conversation in progress. Some parts of the network frequently carry a large number of conversations. There, it is possible to replace per-conversation signaling with aggregate signaling messages that reserve resources for the conversations as a group. Consider, for example, a transit network interconnecting two corporate subnetworks. Per-conversation RSVP requests between the subnetworks might be aggregated at the boundaries between the subnetworks and the transit network. The per-conversation signaling messages would still be carried end-to-end, but would not be processed within the transit network. Instead, aggregate signaling messages would be exchanged between the edges of the transit network and would reserve enough resources in the transit network to support the number of simultaneous end-to-end conversations. The aggregate reservation would be adjusted from time to time in response to demand.
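The edge behavior described above can be sketched as a counter of active end-to-end calls plus an aggregate reservation that is resized in blocks. All names and parameters here are hypothetical choices made for illustration:

- The aggregate is sized in blocks of ten 64 Kbps calls.
- One spare block is kept before the reservation shrinks.
- Only resize exchanges are counted, not refreshes.
- The transit network is assumed always to grant a requested increase.

```python
import math

CALL_KBPS = 64      # per-call reservation
BLOCK_CALLS = 10    # hypothetical: resize the aggregate in blocks of 10 calls


class EdgeAggregator:
    """Hypothetical transit-edge device holding one aggregate reservation."""

    def __init__(self) -> None:
        self.active_calls = 0      # end-to-end conversations crossing the transit network
        self.blocks = 0            # aggregate reservation, in blocks of BLOCK_CALLS calls
        self.transit_updates = 0   # aggregate signaling exchanges inside the transit network

    @property
    def aggregate_kbps(self) -> int:
        return self.blocks * BLOCK_CALLS * CALL_KBPS

    def _resize(self) -> None:
        needed = math.ceil(self.active_calls / BLOCK_CALLS)
        if needed > self.blocks:
            # Grow at once so that new calls are not refused.
            self.blocks = needed
            self.transit_updates += 1
        elif needed + 1 < self.blocks:
            # Shrink lazily, keeping one spare block to damp churn.
            self.blocks = needed + 1
            self.transit_updates += 1

    def call_setup(self) -> None:
        # The per-conversation request crosses the transit network unprocessed;
        # only the aggregate may need adjusting.
        self.active_calls += 1
        self._resize()

    def call_teardown(self) -> None:
        self.active_calls -= 1
        self._resize()


if __name__ == "__main__":
    edge = EdgeAggregator()
    for _ in range(25):
        edge.call_setup()
    print(f"peak aggregate: {edge.aggregate_kbps} Kbps")
    for _ in range(25):
        edge.call_teardown()
    print(f"per-call events at the edge: 50, aggregate updates in transit: {edge.transit_updates}")
```

Setting up 25 calls and then tearing them all down produces 50 per-conversation events at the edge. Only 5 aggregate updates occur inside the transit network. The spare block trades some reserved but idle capacity for fewer signaling exchanges.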
Signaling Density
In theory, optimal efficiency is attained when every device in the network participates in signaling and admission control. However, this is costly in terms of signaling processing overhead, signaling latency, and so forth. As an alternative, the network administrator may configure only certain key devices to participate in signaling and admission control. A relatively sparse configuration of signaling and admission control devices reduces the costs associated with signaling overhead but also compromises the benefits of signaling in terms of the quality of guarantees which can be offered or the efficiency with which network resources can be used. To see why this is the case, it is necessary to understand the awareness of traffic patterns that is implicit in RSVP signaling and is key to admission control.
Signaling and Awareness of Traffic Patterns
Consider the network illustrated in the following diagram:
For the example, assume the following:
All routers participate in RSVP signaling.
One QoS session requiring 64 Kbps is initiated between host A and host B.
Another session requiring 64 Kbps is initiated between host A and host D.
In this case, one RSVP request for 64 Kbps would reach the three routers in the data path between host A and host B. Another RSVP request for 64 Kbps would reach the three routers between host A and host D. The routers would admit these resource requests because they would not over-commit any of the links [1]. If instead hosts B and C each attempted to simultaneously initiate a 64 Kbps QoS session to host A, the router serving these hosts would prevent one or the other of these sessions from being established.
RSVP signaling enables an awareness of traffic patterns. Because resource requests arrive at each device that would be impacted by admission of the request, it is possible to refuse requests that would result in the over-commitment of resources. Two simultaneous requests for 64 Kbps could be admitted if one were along the right branch of the network and the other along the left branch of the network. However, if both were along the same branch of the network, one of the requests would not be admitted.
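Because the diagram is not reproduced here, the sketch below assumes a simple tree topology:

- Hosts B and C sit behind a 64 Kbps branch link to the topmost router.
- Hosts D and E sit behind a second 64 Kbps branch link.
- The link toward host A is assumed to be 128 Kbps, so that it can carry both sessions in the example.

Link names, the 128 Kbps figure, and the class name are hypothetical. Under dense signaling, each router on a session's route checks its own link before admitting. Reservations are treated as occupying a link regardless of direction.

```python
# Assumed topology (hypothetical names): capacity in Kbps of each link.
LINK_CAPACITY_KBPS = {
    "top-to-A": 128,     # assumed large enough for both sessions in the example
    "top-to-left": 64,   # branch shared by hosts B and C
    "top-to-right": 64,  # branch shared by hosts D and E
}

# Links crossed by a session between host A and each other host.
ROUTE = {
    "B": ["top-to-A", "top-to-left"],
    "C": ["top-to-A", "top-to-left"],
    "D": ["top-to-A", "top-to-right"],
    "E": ["top-to-A", "top-to-right"],
}


class DenseSignaling:
    """Every router on the route sees the request and checks its own link."""

    def __init__(self) -> None:
        self.reserved_kbps = {link: 0 for link in LINK_CAPACITY_KBPS}

    def request(self, host: str, rate_kbps: int = 64) -> bool:
        links = ROUTE[host]
        # Reject if any link on the route would be over-committed.
        if any(self.reserved_kbps[link] + rate_kbps > LINK_CAPACITY_KBPS[link]
               for link in links):
            return False
        for link in links:
            self.reserved_kbps[link] += rate_kbps
        return True


if __name__ == "__main__":
    net = DenseSignaling()
    print("A-B:", net.request("B"), "A-D:", net.request("D"))  # different branches
    net = DenseSignaling()
    print("B-A:", net.request("B"), "C-A:", net.request("C"))  # same branch
```

The first pair is admitted. In the second pair, the request from host C is refused because the B/C branch link is already fully reserved.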
Now assume that the network administrator reduces the density of signaling-enabled network devices by disabling the processing of QoS signaling messages in the lower three routers (serving hosts B, C, D and E). Only the topmost router participates in signaling, becoming, in effect, the admission control agent for itself as well as the remaining routers in the network. In this case, requests for resources up to 128 Kbps would be admitted regardless of the location of the participating hosts. Service guarantees would be low quality guarantees, as it would be possible for traffic from one host to compromise service for a session granted to another host on the same branch.
The quality of guarantees could be maintained if the topmost router were configured to limit admission of resource requests to 64 Kbps. However, this would result in inefficient use of network resources as only one conversation could be supported at a time, when in fact two could be supported if their traffic were distributed appropriately. Alternatively, all 64 Kbps links in the network could be increased to 128 Kbps links to avoid over-commitment of resource requests, but the increased capacity would be used only in the event that hosts B and C (or D and E) required resources simultaneously. If this were not the case, such over-provisioning would also be inefficient.
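Sparse signaling can be sketched under the same hypothetical tree topology. The topmost router makes a single admission check against a configured total budget. Because that router cannot tell which branch a session will load, the sketch also reports after the fact which branch links would actually be over-committed. Names and budgets are illustrative assumptions.

```python
BRANCH_CAPACITY_KBPS = 64
BRANCH_OF = {"B": "left", "C": "left", "D": "right", "E": "right"}  # assumed topology


class TopRouterOnly:
    """Sparse signaling: one router admits against a total budget, blind to routes."""

    def __init__(self, budget_kbps: int) -> None:
        self.budget_kbps = budget_kbps
        self.admitted: list[tuple[str, int]] = []

    def request(self, host: str, rate_kbps: int = 64) -> bool:
        in_use = sum(rate for _, rate in self.admitted)
        if in_use + rate_kbps > self.budget_kbps:
            return False
        self.admitted.append((host, rate_kbps))
        return True

    def overcommitted_branches(self) -> list[str]:
        """Branches whose admitted load exceeds capacity (invisible to the top router)."""
        load = {"left": 0, "right": 0}
        for host, rate in self.admitted:
            load[BRANCH_OF[host]] += rate
        return [branch for branch, kbps in load.items() if kbps > BRANCH_CAPACITY_KBPS]


if __name__ == "__main__":
    loose = TopRouterOnly(budget_kbps=128)
    loose.request("B")
    loose.request("C")
    print("128 Kbps budget, B and C admitted; over-committed:", loose.overcommitted_branches())

    strict = TopRouterOnly(budget_kbps=64)
    print("64 Kbps budget, B then D:", strict.request("B"), strict.request("D"))
```

The two budgets fail in opposite ways:

- With a 128 Kbps budget, both B and C are admitted and the left branch is over-committed, giving low quality guarantees.
- With a 64 Kbps budget, the guarantee holds, but the request from D is refused even though the right branch is idle, giving low efficiency.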
We see that, in general, by reducing the density of signaling-enabled devices, we reduce the value of signaling in terms of the tradeoff between quality of guarantees and efficiency of network resource usage. This is because the network administrator has imperfect knowledge of network traffic patterns. Suppose the network administrator knew with certainty, in the above example, that hosts B and C (or hosts D and E) never required low latency resources simultaneously. They could then be offered high quality guarantees without signaling and without incurring the inefficiencies of over-provisioning. In smaller networks, it is very difficult for the network administrator to predict traffic patterns. In larger networks, it tends to be easier to do so. Thus, reductions in the density of signaling-aware devices tend to compromise efficiency less in large networks than in small networks.
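Statistical multiplexing is one standard explanation of why prediction tends to be easier in larger networks. The combined demand of many independent users fluctuates less, relative to its average, than the demand of a few. The sketch below illustrates this under an assumed binomial model in which each user is independently active with the same probability. Real users are rarely that independent, so the sketch shows the trend rather than a planning rule.

```python
from math import sqrt


def relative_spread(users: int, p_active: float) -> float:
    """Standard deviation of simultaneous demand divided by its mean.

    Assumes `users` independent users, each active with probability `p_active`,
    so demand is Binomial(users, p_active).
    """
    mean = users * p_active
    std_dev = sqrt(users * p_active * (1 - p_active))
    return std_dev / mean


if __name__ == "__main__":
    for users in (100, 10_000):
        print(f"{users:>6} users: demand varies by about "
              f"{100 * relative_spread(users, 0.1):.0f}% of its mean")
```

A hundredfold increase in users shrinks the relative spread tenfold, from about 30% to about 3% of the mean in this example. A given provisioning margin therefore covers a larger population's fluctuations more reliably.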
Other Benefits of Signaling
There are other benefits of signaling which are unrelated to the tradeoff between quality of guarantees and efficiency of network resource usage. These include the end-to-end integration of QoS on disparate network media as well as the provision of classification and policy information to network devices. These benefits will be discussed later in the paper.
The QoS-enabled network must provide both low and high quality guarantees. High quality guarantees are typically made practical via the use of signaling, admission control, and strict policing along specific routes. In order to maintain the quality of these guarantees, it is important to prevent traffic that makes use of lower quality guarantees from stealing resources committed to higher quality guarantees. However, traffic using lower quality guarantees is not policed as strictly as traffic using higher quality guarantees. Specifically, it tends not to be policed based on its route through the network. As a result, it may appear at various locations in the network in volumes above those anticipated. To prevent such unexpected traffic from compromising higher quality guarantees, it is necessary to assign this traffic lower priority in its use of network resources at specific devices. This does not mean that applications requiring lower quality guarantees are deemed to be lower priority by the network administrator. In fact, typically, the percentage of available resources at any node that is allocated to high quality guarantees is only a very small fraction of the total resources available, with the majority remaining available for lower quality guarantees. It does mean, however, that under congestion conditions, traffic requiring lower quality guarantees will be deferred in favor of traffic requiring higher quality guarantees up to some limit.
In effect, there are several resource pools in the diffserv network. These are used by traffic requiring different quality guarantees. Traffic is separated by:
Aggregating it according to the service level to which it is entitled.
Policing traffic requiring higher quality guarantees such that it does not starve traffic using lower quality guarantees.
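The policing step in the second item is commonly implemented with a token bucket. The sketch below is a minimal assumed implementation, not a description of any particular device. Tokens accumulate at the reserved rate up to a burst limit, and a packet conforms only if enough tokens are available. Non-conforming packets would then be dropped or handled at a lower service level, so the high-priority queue carries no more than its reserved share. The rate, burst size, and packet pattern in the usage example are illustrative.

```python
class TokenBucketPolicer:
    """Checks whether packets stay within rate_bps with bursts up to bucket_bytes."""

    def __init__(self, rate_bps: float, bucket_bytes: float) -> None:
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = bucket_bytes
        self.tokens = bucket_bytes  # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds

    def conforms(self, now: float, packet_bytes: int) -> bool:
        # Refill tokens for the time elapsed since the last packet, up to the burst limit.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate_bytes_per_s)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # caller drops or demotes the packet


if __name__ == "__main__":
    # A flow reserved at 64 Kbps but sending 160-byte packets every 10 ms (128 Kbps).
    policer = TokenBucketPolicer(rate_bps=64_000, bucket_bytes=320)
    passed = sum(policer.conforms(i * 0.01, 160) for i in range(100))
    print(f"{passed} of 100 packets conform; the rest are dropped or demoted")
```

A flow sending at its reserved rate passes untouched. A flow sending at twice that rate has roughly half its packets marked non-conforming.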
We can identify four general resource pools by the traffic for which they are used:
Quantifiable traffic requiring high quality guarantees - This type of traffic requires a specifically quantifiable amount of resources. These resources are typically allocated as a result of RSVP signaling, which quantifies the amount of resources required by the traffic flow. The highest priority queues are reserved for this traffic. This traffic is subjected to strict admission control and route-dependent policing. Examples of this type of traffic include IP telephony traffic and other interactive multimedia traffic.
Non-quantifiable persistent traffic requiring high quality guarantees - This type of traffic requires resources that cannot be specifically quantified. However, it tends to be persistent in the sense that it consumes resources along a known route for some reasonable duration. Resources are allocated to this class of traffic as a result of RSVP signaling that does not specifically quantify the resources required by the traffic flow. This signaling informs the network of the application sourcing the traffic as well as the route taken through the network. The information facilitates prediction of traffic patterns, enabling reasonable quality guarantees. However, since resource requirements are not strictly specified, resource consumption cannot be strictly policed and the traffic is forced to use queues that are of lower priority than those available for quantifiable traffic. Examples of this type of traffic include traffic of client-server, session oriented, mission critical applications such as SAP and PeopleSoft.
Non-quantifiable, non-persistent traffic requiring low or medium quality guarantees - This type of traffic is relatively unpredictable. Its resource requirements cannot be quantified, and its route through the network is fleeting and subject to frequent changes. The overhead of signaling cannot be justified, as it would provide little information to assist the network administrator in managing the resources allocated to this traffic. Because the impact of this traffic is so unpredictable, it is forced to use queues that are of lower priority than those used by signaled traffic. As a result, only low quality guarantees can be offered to such traffic. An example of this type of traffic is web surfing.
Best-effort traffic - This is all the remaining traffic, which is not quantifiable, not persistent, and does not need any quality of service guarantees. The network administrator must assure that there are resources available in the network for such traffic but need provide no specific quality of service for it. This traffic uses default FIFO queues and receives those resources that are 'left-over' after the requirements of higher priority traffic have been satisfied.
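Serving these pools in priority order can be sketched as a strict-priority scheduler with one queue per pool. This is a simplified, hypothetical model: the pool names are labels for the four categories above, and real devices often combine priority queuing with weighted scheduling. Strict priority on its own would let the top queue starve the others. This is why policing traffic that requires higher quality guarantees matters.

```python
from collections import deque
from enum import IntEnum
from typing import Any, Optional


class Pool(IntEnum):
    # Definition order is service order: lower value is served first.
    QUANTIFIABLE = 0     # e.g. RSVP-signaled IP telephony
    PERSISTENT = 1       # e.g. signaled SAP/PeopleSoft sessions
    NON_PERSISTENT = 2   # e.g. web surfing
    BEST_EFFORT = 3      # default FIFO, gets what is left over


class StrictPriorityScheduler:
    """One FIFO per pool; always serve the highest-priority non-empty queue."""

    def __init__(self) -> None:
        self.queues = {pool: deque() for pool in Pool}

    def enqueue(self, pool: Pool, packet: Any) -> None:
        self.queues[pool].append(packet)

    def dequeue(self) -> Optional[tuple[Pool, Any]]:
        for pool in Pool:  # iterates in definition (priority) order
            if self.queues[pool]:
                return pool, self.queues[pool].popleft()
        return None  # all queues empty


if __name__ == "__main__":
    scheduler = StrictPriorityScheduler()
    scheduler.enqueue(Pool.BEST_EFFORT, "mail")
    scheduler.enqueue(Pool.NON_PERSISTENT, "web page")
    scheduler.enqueue(Pool.QUANTIFIABLE, "voice frame")
    scheduler.enqueue(Pool.PERSISTENT, "SAP transaction")
    while (item := scheduler.dequeue()) is not None:
        pool, packet = item
        print(pool.name, packet)
```

Packets leave in pool order regardless of arrival order: the voice frame, then the SAP transaction, the web page, and finally the mail.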
The QoS network administrator is faced with the task of provisioning admission control limits for each of these classes of traffic. By doing so, the administrator is effectively dividing the network resources into the resource pools mentioned at the start of this section.
We can summarize this section by recognizing the tradeoffs inherent in designing a QoS enabled network. Recall that the goal of QoS enabling a network is to provide the various qualities of guarantee required by the customer's applications, while maintaining efficient use of network resources. We can measure the quality of a QoS network by the product of the quality of guarantees it offers and the efficiency of resource usage. We will refer to this metric as the quality/efficiency product of the network.
A third factor to consider in the design of a QoS network is overhead. Overhead refers to the processing and storage overhead in network elements that is directly attributable to the QoS mechanisms themselves (whether for traffic handling or for signaling processing) [2]. All QoS mechanisms impose an overhead on the network, increasing its cost. The cost of any QoS mechanism in terms of its overhead must be weighed against the potential improvement in the quality/efficiency product. In general, the greater the overhead that the network administrator is willing to tolerate, the higher the quality/efficiency product that can be attained.
Note that this tradeoff between overhead and quality/efficiency product is a local decision, which may vary from one part of a network to another. For example, it may be quite acceptable to over-provision certain LAN segments, accepting that the only way to obtain quality guarantees through these parts of the network is to use them inefficiently (low quality/efficiency product). This approach requires no QoS overhead in these LAN segments. On the other hand, it may be prohibitively expensive to over-provision certain WAN segments. QoS mechanisms would be employed in these parts of the network with the goal of attaining a higher quality/efficiency product. Thus, any debate as to the value of one or another QoS mechanism should be considered in these terms.
The following table illustrates variations of the general QoS mechanisms we have discussed so far and their impact in terms of overhead vs. quality/efficiency product:
Mechanism | Overhead | Quality/Efficiency Product
FIFO traffic handling | None | Low
Aggregate traffic handling | Low | Medium
Per-flow traffic handling | High | High
Top-down provisioning | Low | Low
Aggregate signaling | Medium | Medium
Per-flow signaling | High | High
Sparse signaling | Medium | Medium
Dense signaling | High | High
Note that in general, a single part of the network may be designed with a variety of tradeoff points to accommodate differing traffic types. For example, while the WAN part of the network may use per-flow signaling and traffic handling to provide a high quality/efficiency product for IP telephony traffic, it may handle traffic from less demanding applications on a FIFO basis with no signaling. Thus, the network administrator divides the WAN subnet into multiple resource pools (as described earlier in this section) appropriate for the types of traffic it will carry.
Note that we use the term overhead in reference to the work required from the network to provide QoS. Such overhead is not to be confused with what is commonly called management overhead. We will refer to the latter as management burden here, in order to avoid confusion with overhead. These are different concepts. For example, extensive use of signaling may significantly reduce management burden (as compared with top-down provisioning). However, it does result in higher overhead. A classic example of incurring additional overhead in the interest of reducing management burden is the use of address resolution protocols (such as ARP) versus statically configured (MAC address) tables.
[1] In practice, routers would not be configured to allow all available resources to be reserved for a particular conversation. However, for simplicity's sake, we assume in this case that the entire link resources can be reserved.
[2] At first glance it might appear that overhead is captured in the efficiency metric. However, overhead is defined to be the cost of resources dedicated to the QoS mechanisms themselves, while efficiency relates to the raw network resources, namely bandwidth and buffer space.