Next Generation TCP/IP Architecture

Applies To: Windows Server 2008

Next Generation TCP/IP Protocols and Networking Components Architecture

Windows Server 2008 and Windows Vista include many changes and enhancements to the following protocols and core networking components:

  • Next Generation TCP/IP stack

  • IPv6 enhancements

  • Policy-based Quality of Service (QoS) for enterprise networks

Next Generation TCP/IP stack

Windows Server 2008 and Windows Vista include a new implementation of the TCP/IP protocol stack known as the Next Generation TCP/IP stack. The Next Generation TCP/IP stack is a complete redesign of TCP/IP functionality for both Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6) that meets the connectivity and performance needs of today's varied networking environments and technologies.

The following features are new or enhanced:

  • Receive Window Auto-Tuning

  • Compound TCP

  • Enhancements for high-loss environments

  • Neighbor Unreachability Detection for IPv4

  • Changes in dead gateway detection

  • Changes in PMTU black hole router detection

  • Network Diagnostics Framework support

  • Windows Filtering Platform

  • Explicit Congestion Notification

Receive Window Auto-Tuning

The TCP receive window size is the number of bytes in a memory buffer on a receiving host that is used to store incoming data on a TCP connection. To determine the optimal maximum receive window size for a connection based on current network conditions, the Next Generation TCP/IP stack supports Receive Window Auto-Tuning. Receive Window Auto-Tuning determines the optimal receive window size per connection by measuring the bandwidth-delay product (the bandwidth multiplied by the latency of the connection) and the rate at which the application retrieves data. It then adjusts the maximum receive window size automatically on a regular basis.

Because a larger receive window lets the sender keep more data in flight, throughput between TCP peers improves and more of the available network bandwidth is used during data transfer. If all applications are optimized to receive TCP data, overall network utilization can increase substantially.
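
As a rough illustration of the quantity that auto-tuning targets, the sketch below computes the bandwidth-delay product of a path and the smallest TCP window scale shift needed to advertise a window that large. Window scaling (RFC 1323) is background assumed here rather than described in this article, and the calculation is a simplification, not the Windows auto-tuning algorithm, which also accounts for the application retrieval rate.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit TCP window field
MAX_WINDOW_SHIFT = 14         # largest shift allowed by TCP window scaling


def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that can be in flight on the path."""
    return int(bandwidth_bps * rtt_seconds / 8)


def window_scale_for(window_bytes: int) -> int:
    """Smallest window scale shift whose maximum window covers window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes and shift < MAX_WINDOW_SHIFT:
        shift += 1
    return shift


if __name__ == "__main__":
    # A 1 Gbit/s link with a 50 ms round-trip time.
    bdp = bdp_bytes(1_000_000_000, 0.050)
    print(bdp, window_scale_for(bdp))  # 6250000 7
```

A fixed 64 KB receive window would cover only about 1 percent of this path's 6.25 MB bandwidth-delay product, which is why sizing the window per connection matters on fast, long-latency links.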

Compound TCP

Whereas Receive Window Auto-Tuning optimizes receiver-side throughput, Compound TCP (CTCP) in the Next Generation TCP/IP stack optimizes sender-side throughput. By working together, they can increase link utilization and produce substantial performance gains for large bandwidth-delay product connections.

CTCP is used for TCP connections with a large receive window size and a large bandwidth-delay product (the bandwidth of a connection multiplied by its delay). It aggressively increases the amount of data sent at a time, yet ensures that its behavior does not negatively impact other TCP connections.

For example, in testing performed internally at Microsoft, backup times for large files were reduced by almost half for a 1 gigabit-per-second connection with a 50 millisecond round-trip time (RTT). Connections with a larger bandwidth-delay product can have even better performance.
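
CTCP combines a loss-based congestion window with a delay-based window. The sketch below follows the general scheme from published descriptions of CTCP: a Reno-style window (cwnd) plus a delay-based window (dwnd) that grows while an estimate of queued packets stays small and shrinks once queues build. The constants are illustrative assumptions, not the values used in Windows, and windows are counted in whole packets.

```python
from dataclasses import dataclass

# Illustrative constants (assumed; not the values used in Windows).
ALPHA = 0.125  # scales the delay-based increase
BETA = 0.5     # multiplicative decrease applied on loss
K = 0.75       # exponent: larger windows grow dwnd faster
GAMMA = 30.0   # queued-packet threshold that signals early congestion


@dataclass
class CompoundWindow:
    cwnd: float = 10.0              # loss-based (Reno-style) window, packets
    dwnd: float = 0.0               # delay-based window, packets
    base_rtt: float = float("inf")  # smallest RTT seen, seconds

    @property
    def win(self) -> float:
        """Sending window: sum of the loss-based and delay-based parts."""
        return self.cwnd + self.dwnd

    def on_round_trip(self, rtt: float) -> None:
        """Update once per loss-free round trip with the measured RTT."""
        self.base_rtt = min(self.base_rtt, rtt)
        win = self.win
        # Estimated packets sitting in router queues (expected minus actual rate).
        diff = (win / self.base_rtt - win / rtt) * self.base_rtt
        if diff < GAMMA:
            # Little queueing: grow the delay-based window aggressively.
            self.dwnd += max(ALPHA * win**K - 1.0, 0.0)
        else:
            # Queues are building: give bandwidth back to competing flows.
            self.dwnd = max(self.dwnd - diff, 0.0)
        self.cwnd += 1.0  # standard additive increase of one packet per RTT

    def on_loss(self) -> None:
        """Halve the total window, as standard TCP does on packet loss."""
        win = self.win
        self.cwnd *= 1 - BETA
        self.dwnd = max(win * (1 - BETA) - self.cwnd, 0.0)
```

With a stable RTT, the total window grows much faster than the one packet per round trip that cwnd alone would add; once the queueing estimate exceeds GAMMA, dwnd shrinks, which is how this scheme limits its impact on other TCP connections.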

Enhancements for high-loss environments

The Next Generation TCP/IP stack supports the following Request for Comments (RFCs) to optimize throughput in high-loss environments:

  • RFC 2582: The NewReno Modification to TCP's Fast Recovery Algorithm

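    When multiple segments in a window of data are lost and the sender receives a partial acknowledgement (one that acknowledges some, but not all, of the data that was outstanding when the loss was detected), the NewReno algorithm improves throughput by having the sender retransmit the next lost segment immediately and remain in fast recovery, rather than leaving fast recovery and waiting for further duplicate acknowledgements or a retransmission timeout.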

  • RFC 2883: An Extension to the Selective Acknowledgement (SACK) Option for TCP

    SACK, defined in RFC 2018, allows a receiver to indicate up to four noncontiguous blocks of received data. RFC 2883 defines an additional use of the SACK TCP option to acknowledge duplicate packets. This allows the receiver of the TCP segment containing the SACK option to determine when it has retransmitted a segment unnecessarily and adjust its behavior to prevent future unnecessary retransmissions. Reducing the number of unnecessary retransmissions improves overall throughput.

  • RFC 3517: A Conservative Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP

    Whereas Windows Server 2003 and Windows XP use SACK information only to determine which TCP segments have not arrived at the destination, RFC 3517 defines a method of using SACK information to perform loss recovery when duplicate acknowledgements have been received. This method replaces the fast recovery algorithm when SACK is enabled on a connection. The Next Generation TCP/IP stack tracks SACK information on a per-connection basis and monitors incoming acknowledgements and duplicate acknowledgements to recover more quickly when segments are not received at the destination.

  • RFC 4138: Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious Retransmission Timeouts with TCP and the Stream Control Transmission Protocol (SCTP)

    The Forward-Retransmission Timeout (F-RTO) algorithm prevents unnecessary retransmission of TCP segments. Unnecessary retransmissions can occur when a sudden or temporary increase in the round-trip time (RTT) causes the retransmission timer to expire even though the original segments were not lost. In environments with such RTT increases, for example when a wireless client roams from one wireless access point (AP) to another, F-RTO prevents unnecessary retransmission of segments and returns to the normal sending rate more quickly.
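
For RFC 2582, the key change is how a sender treats a partial acknowledgement during fast recovery. The sketch below counts whole segments rather than bytes and omits timers and the transmission of new data; the class and method names are hypothetical.

```python
from typing import Optional


class NewRenoRecovery:
    """Fast-recovery state for one connection, counting in whole segments."""

    def __init__(self, cwnd: int):
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.in_recovery = False
        self.recover = 0  # highest segment sent when recovery started

    def on_triple_dupack(self, snd_una: int, highest_sent: int) -> int:
        """Enter fast recovery and return the segment to retransmit."""
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh + 3  # inflate for the three duplicate ACKs
        self.recover = highest_sent
        self.in_recovery = True
        return snd_una

    def on_ack(self, ack: int, newly_acked: int) -> Optional[int]:
        """Handle a cumulative ACK (ack = next segment the receiver expects).

        Returns a segment to retransmit immediately, or None.
        """
        if not self.in_recovery:
            return None
        if ack > self.recover:
            # Full acknowledgement: all data outstanding at the loss is covered.
            self.cwnd = self.ssthresh
            self.in_recovery = False
            return None
        # Partial acknowledgement: another segment from the same window was
        # lost. Retransmit it now and stay in fast recovery, rather than
        # exiting and waiting for more duplicate ACKs or a timeout.
        self.cwnd = max(self.cwnd - newly_acked + 1, 1)
        return ack
```

With segments 3, 5, and 7 lost from a window of segments 1 through 10, this sender retransmits 3, then 5 and 7 as each partial acknowledgement arrives, and leaves fast recovery only when the acknowledgement for segment 11 arrives.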
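
For RFC 2883, the data sender has to recognize when the first SACK block reports duplicate data (a D-SACK). A minimal check, assuming sequence numbers that do not wrap and blocks given as (left edge, right edge) pairs with an exclusive right edge, might look like this:

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (left edge, right edge); right edge is exclusive


def is_dsack(cumulative_ack: int, blocks: List[Block]) -> bool:
    """Return True if the first SACK block reports duplicate data (RFC 2883)."""
    if not blocks:
        return False
    left, right = blocks[0]
    # Case 1: the block starts below the cumulative ACK, so it reports data
    # the receiver had already acknowledged.
    if left < cumulative_ack:
        return True
    # Case 2: the block lies inside the second block, i.e. out-of-order data
    # the receiver already held arrived a second time.
    if len(blocks) > 1:
        left2, right2 = blocks[1]
        if left2 <= left and right <= right2:
            return True
    return False
```

A sender that sees D-SACKs for segments it retransmitted can conclude that those retransmissions were unnecessary, for example because segments were reordered rather than lost.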
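
For RFC 3517, recovery decisions rest on two quantities derived from the SACK scoreboard: whether a segment is considered lost, and an estimate of how many segments are still in the network ("pipe"). The sketch counts whole segments, which simplifies the RFC's byte-based definitions; the function names echo the RFC's IsLost and pipe, but the code is an illustration, not the Windows implementation.

```python
from typing import AbstractSet

DUP_THRESH = 3  # duplicate acknowledgement threshold used by RFC 3517


def is_lost(seg: int, sacked: AbstractSet[int]) -> bool:
    """A segment is deemed lost once DUP_THRESH later segments are SACKed."""
    return sum(1 for s in sacked if s > seg) >= DUP_THRESH


def pipe(snd_una: int, high_data: int, sacked: AbstractSet[int],
         retransmitted: AbstractSet[int]) -> int:
    """Estimate segments in flight among snd_una .. high_data (inclusive)."""
    in_flight = 0
    for seg in range(snd_una, high_data + 1):
        if seg in sacked:
            continue  # already delivered
        if not is_lost(seg, sacked):
            in_flight += 1  # original transmission may still be in the network
        if seg in retransmitted:
            in_flight += 1  # the retransmission is in the network
    return in_flight
```

In this scheme the sender may transmit while pipe is below the congestion window, giving priority to retransmitting segments for which is_lost returns True.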
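
For RFC 4138, the basic F-RTO decision after a retransmission timeout comes down to examining the first two acknowledgements that follow it. This sketch covers only the basic (non-SACK) variant, uses segment numbers, and assumes the sender transmits up to two new segments after the first acknowledgement, as the RFC describes; the function name is hypothetical.

```python
from typing import Optional


def frto_verdict(snd_una: int, recover: int, first_ack: int,
                 second_ack: Optional[int]) -> str:
    """Classify a retransmission timeout with basic F-RTO.

    snd_una:    oldest unacknowledged segment when the timeout fired; this
                is the one segment retransmitted immediately
    recover:    next segment number that had never been sent at that time
    first_ack, second_ack: cumulative ACKs (next expected segment) that
                arrive after the timeout; second_ack is None if none arrived
    """
    # Step 2: a duplicate ACK, or an ACK covering everything outstanding,
    # gives no evidence of a spurious timeout: use conventional recovery.
    if first_ack <= snd_una or first_ack >= recover:
        return "conventional"
    # The sender now transmits up to two new segments instead of
    # retransmitting more old ones.
    # Step 3: a duplicate ACK means the new segments exposed a hole, so the
    # timeout was genuine. An ACK that advances further shows that segments
    # sent before the timeout were delivered, so the timeout was spurious.
    if second_ack is None or second_ack <= first_ack:
        return "conventional"
    return "spurious"
```

When the verdict is "spurious", the sender can continue sending new data instead of retransmitting segments that were merely delayed, which is the behavior described above for roaming wireless clients.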

Neighbor Unreachability Detection for IPv4

Neighbor Unreachability Detection is a feature of IPv6 in which a node tracks whether a neighboring node is reachable, providing better error detection and recovery when nodes suddenly become unavailable. The Next Generation TCP/IP stack also supports Neighbor Unreachability Detection for IPv4 traffic by tracking the reachable state of IPv4 nodes in the IPv4 route cache. IPv4 Neighbor Unreachability Detection determines reachability through an exchange of unicast Address Resolution Protocol (ARP) Request and ARP Reply messages or by relying on upper layer protocols such as TCP.
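
The reachability states that Neighbor Unreachability Detection tracks are defined for IPv6 in RFC 4861. The sketch below models one neighbor cache entry with those states, plus a terminal UNREACHABLE state added for illustration. The timer values are RFC 4861 defaults and are assumptions here (Windows may use different values for IPv4), and a "probe" stands for a unicast ARP Request in the IPv4 case.

```python
from enum import Enum, auto

# RFC 4861 default timers (assumed; Windows may choose other values).
REACHABLE_TIME = 30.0         # seconds a confirmation stays fresh
DELAY_FIRST_PROBE_TIME = 5.0  # grace period for upper-layer confirmation
RETRANS_TIMER = 1.0           # seconds between unicast probes
MAX_UNICAST_PROBES = 3


class State(Enum):
    REACHABLE = auto()
    STALE = auto()
    DELAY = auto()
    PROBE = auto()
    UNREACHABLE = auto()


class NeighborEntry:
    def __init__(self, now: float):
        self.state = State.REACHABLE
        self.since = now
        self.probes = 0

    def confirm(self, now: float) -> None:
        """Reachability confirmed by a unicast reply or upper-layer progress."""
        self.state, self.since, self.probes = State.REACHABLE, now, 0

    def on_send(self, now: float) -> None:
        """A packet is being sent to this neighbor."""
        self.tick(now)
        if self.state is State.STALE:
            # Give upper layers (for example, TCP ACKs) a chance to confirm first.
            self.state, self.since = State.DELAY, now

    def tick(self, now: float) -> bool:
        """Advance timers; return True if a unicast probe should be sent now."""
        elapsed = now - self.since
        if self.state is State.REACHABLE and elapsed >= REACHABLE_TIME:
            self.state, self.since = State.STALE, now
        elif self.state is State.DELAY and elapsed >= DELAY_FIRST_PROBE_TIME:
            self.state, self.since, self.probes = State.PROBE, now, 1
            return True
        elif self.state is State.PROBE and elapsed >= RETRANS_TIMER:
            if self.probes >= MAX_UNICAST_PROBES:
                self.state = State.UNREACHABLE
            else:
                self.probes += 1
                self.since = now
                return True
        return False
```

Relying on upper-layer confirmation during the DELAY state is what lets an active TCP connection keep a neighbor marked reachable without any ARP traffic at all.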

Changes in dead gateway detection

Dead gateway detection in TCP/IP for Windows Server 2003 and Windows XP provides a failover function, but not a failback function in which a dead gateway is tried again to determine whether it has become available. The Next Generation TCP/IP stack provides failback for dead gateways by periodically attempting to send TCP traffic through the previously detected dead gateway. If that TCP traffic succeeds, the Next Generation TCP/IP stack switches the default gateway back to the previously detected dead gateway. Failing back to the primary default gateway can provide faster throughput, because traffic is again sent through the primary default gateway on the subnet.
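
A minimal model of this failover and failback behavior follows. It assumes gateways listed in preference order, a hypothetical retry interval, and a hypothetical callback that reports whether TCP traffic through a given gateway succeeds; the actual Windows timers and probing logic are not described here.

```python
from typing import Callable, List


class GatewaySelector:
    """Failover to a backup gateway, with periodic failback to the primary."""

    def __init__(self, gateways: List[str], retry_interval: float):
        self.gateways = gateways              # preference order; index 0 is primary
        self.retry_interval = retry_interval  # hypothetical, in seconds
        self.current = 0
        self.last_attempt = 0.0

    @property
    def active(self) -> str:
        return self.gateways[self.current]

    def report_dead(self, now: float) -> None:
        """Failover: TCP traffic through the active gateway is failing."""
        self.current = (self.current + 1) % len(self.gateways)
        self.last_attempt = now

    def maybe_failback(self, now: float,
                       tcp_works_via: Callable[[str], bool]) -> str:
        """Periodically retry the primary gateway and switch back if it works."""
        if self.current != 0 and now - self.last_attempt >= self.retry_interval:
            self.last_attempt = now
            if tcp_works_via(self.gateways[0]):
                self.current = 0
        return self.active
```

Without the failback step, a host that failed over once would keep using the backup gateway indefinitely, which is the Windows Server 2003 and Windows XP behavior described above.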

Changes in PMTU black hole router detection

Path maximum transmission unit (PMTU) discovery, defined in RFC 1191, relies on the receipt of Internet Control Message Protocol (ICMP) Destination Unreachable-Fragmentation Needed and Don’t Fragment (DF) Set messages from routers containing the MTU of the next link. However, in some cases, intermediate routers silently discard packets that cannot be fragmented. These types of routers are known as black hole PMTU routers. Additionally, intermediate routers might drop ICMP messages because of firewall rules. Due to black hole PMTU routers, TCP connections can time out and terminate.

PMTU black hole router detection senses when large TCP segments are being retransmitted and automatically adjusts the PMTU for the connection, rather than relying on the receipt of ICMP error messages. In Windows Server 2003 and Windows XP, PMTU black hole router detection is disabled by default because enabling it increases the maximum number of retransmissions that are performed for a given TCP segment.

The Next Generation TCP/IP stack enables PMTU black hole router detection by default to prevent TCP connections from terminating.
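
The detection logic can be sketched as a counter on retransmissions of full-sized segments that steps the path MTU down once a threshold is reached. The retransmission threshold is a hypothetical value, the step-down uses the MTU plateau table from RFC 1191 (Windows may choose different sizes), and the 40-byte overhead assumes IPv4 and TCP headers without options.

```python
# MTU plateaus from the table in RFC 1191, Section 7, largest first.
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]
HEADER_BYTES = 40         # IPv4 + TCP headers without options
RETRANSMIT_THRESHOLD = 2  # hypothetical: retransmissions before suspecting a black hole


def next_plateau(mtu: int) -> int:
    """Largest plateau strictly below the current MTU."""
    for plateau in PLATEAUS:
        if plateau < mtu:
            return plateau
    return PLATEAUS[-1]


class BlackHoleDetector:
    def __init__(self, mtu: int):
        self.mtu = mtu
        self.retransmits = 0

    @property
    def mss(self) -> int:
        return self.mtu - HEADER_BYTES

    def on_retransmit(self, segment_len: int) -> int:
        """Record a retransmission and return the PMTU to use from now on."""
        if segment_len < self.mss:
            return self.mtu  # small segments are not evidence of a black hole
        self.retransmits += 1
        if self.retransmits >= RETRANSMIT_THRESHOLD:
            # No ICMP message arrived, so infer the limit and shrink segments.
            self.mtu = next_plateau(self.mtu)
            self.retransmits = 0
        return self.mtu

    def on_ack(self) -> None:
        """Data got through at the current size; reset the evidence."""
        self.retransmits = 0
```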

Network Diagnostics Framework support

The Network Diagnostics Framework is an extensible architecture that helps users recover from and troubleshoot problems with network connections. For TCP/IP-based communication, the Network Diagnostics Framework guides the user through a series of options to eliminate possible causes until the cause of the problem is identified or all possibilities are eliminated. The Network Diagnostics Framework can diagnose the following TCP/IP-related issues:

  • Incorrect IP address

  • Default gateway (router) is not available

  • Incorrect default gateway

  • NetBIOS over TCP/IP (NetBT) name resolution failure

  • Incorrect DNS settings

  • Local port is already being used

  • The DHCP Client service is not running

  • There is no remote listener

  • The media is disconnected

  • The local port is blocked

  • Low on memory

TCP extended statistics (ESTATS) support

The Next Generation TCP/IP stack supports the Internet Engineering Task Force (IETF) draft "TCP Extended Statistics MIB," which defines extended performance statistics for TCP. By analyzing ESTATS on a connection, it is possible to determine whether the performance bottleneck for a connection is the sending application, the receiving application, or the network. ESTATS is disabled by default and can be enabled per connection. With ESTATS, non-Microsoft independent software vendors (ISVs) can create diagnostic and network throughput analysis applications.
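
As an example of that analysis, the published version of the MIB (RFC 4898) defines counters for the time a sender spent limited by the receiver's window (tcpEStatsPerfSndLimTimeRwin), by the congestion window (tcpEStatsPerfSndLimTimeCwnd), and by the sending application itself (tcpEStatsPerfSndLimTimeSnd). The simple heuristic below, an assumption rather than the method of any Windows tool, names the dominant limit as the bottleneck; how the counters are collected is out of scope.

```python
def bottleneck(snd_lim_time_rwin: int, snd_lim_time_cwnd: int,
               snd_lim_time_snd: int) -> str:
    """Name the factor that limited the sender for the largest share of time.

    Arguments are the ESTATS sender-limit time counters, in the same units.
    """
    limits = {
        # Receive window full: slow reader or too small a receive buffer.
        "receiving application": snd_lim_time_rwin,
        # Congestion control held the sender back.
        "network": snd_lim_time_cwnd,
        # The sending application had no data ready to send.
        "sending application": snd_lim_time_snd,
    }
    return max(limits, key=limits.__getitem__)
```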

Windows Filtering Platform

Windows Filtering Platform (WFP) is a new architecture in the Next Generation TCP/IP stack that provides APIs so that non-Microsoft ISVs can filter at several layers in the TCP/IP protocol stack and throughout the operating system.

WFP also integrates and provides support for next-generation firewall features such as authenticated communication and dynamic firewall configuration based on an application's use of the Windows Sockets API. ISVs can create firewalls, antivirus software, diagnostic software, and other types of applications and services. Windows Firewall and IPsec in Windows Server 2008 and Windows Vista use the WFP API.

Explicit Congestion Notification

When a TCP segment is lost, TCP assumes that the segment was lost due to congestion at a router and performs congestion control, which dramatically lowers the TCP sender's transmission rate. With Explicit Congestion Notification (ECN) support on both TCP peers and in the routing infrastructure, routers experiencing congestion mark packets as they forward them instead of dropping them. A TCP peer that receives marked packets signals the congestion back to the sender, which lowers its transmission rate to ease congestion and prevent segment losses. Detecting congestion before packet losses occur increases the overall throughput between TCP peers. ECN is not enabled by default.
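
ECN uses the two low-order bits of the IPv4 TOS byte (the IPv6 Traffic Class), as defined in RFC 3168. The sketch shows the forwarding decision a congested, ECN-aware router makes and the sender's reaction when the receiver echoes the mark back; the function names are hypothetical, and queue management is reduced to a single "congested" flag.

```python
from typing import Optional

# ECN field values (RFC 3168): the two low-order bits of the TOS byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
ECN_MASK = 0b11


def router_forward(tos: int, congested: bool) -> Optional[int]:
    """Return the TOS byte to forward, or None if the packet is dropped."""
    if not congested:
        return tos
    if (tos & ECN_MASK) == NOT_ECT:
        return None  # sender is not ECN-capable: signal congestion by dropping
    # ECN-capable (or already marked): set Congestion Experienced, keep DSCP bits.
    return (tos & 0xFC) | CE


def sender_on_ecn_echo(cwnd: int) -> int:
    """Halve the congestion window when the receiver echoes a CE mark.

    A real stack reacts at most once per window of data.
    """
    return max(cwnd // 2, 2)
```

The sender's reaction matches its response to a lost segment, but because no segment was dropped, nothing has to be retransmitted.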

IPv6 Enhancements

The Next Generation TCP/IP stack supports the following enhancements to IPv6:

  • IPv6 enabled by default

  • Dual IP stack

  • GUI-based configuration

  • Teredo enhancements

  • Integrated IPsec support

  • Multicast Listener Discovery version 2

  • Link-Local Multicast Name Resolution

  • IPv6 over PPP

  • Random interface IDs for IPv6 addresses

  • DHCPv6 support

IPv6 enabled by default

In Windows Server 2008 and Windows Vista, IPv6 is installed and enabled by default. You can configure IPv6 settings through the properties of the Internet Protocol version 6 (TCP/IPv6) component and through commands in the Netsh interface IPv6 context.

IPv6 in Windows Server 2008 and Windows Vista cannot be uninstalled, but it can be disabled.

Dual IP stack

The Next Generation TCP/IP stack supports a dual IP layer architecture in which the IPv4 and IPv6 implementations share common transport (TCP and UDP) and framing layers. The Next Generation TCP/IP stack has both IPv4 and IPv6 enabled by default. There is no need to install a separate component to obtain IPv6 support.

GUI-based configuration

In Windows Server 2008 and Windows Vista, you can manually configure IPv6 settings by using a set of dialog boxes in the Network Connections folder, similar to how you can manually configure IPv4 settings.

Teredo enhancements

Teredo provides enhanced connectivity for IPv6-enabled applications by providing globally unique IPv6 addressing and by allowing IPv6 traffic to traverse network address translators (NATs). With Teredo, IPv6-enabled applications that require unsolicited incoming traffic and global addressing, such as peer-to-peer applications, will work over a NAT. These same types of applications, if they used IPv4 traffic, would either require manual configuration of the NAT or would not work at all without modifying the network application protocol.
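
The global addresses that Teredo provides have a fixed layout, defined in RFC 4380: the 2001::/32 prefix, the Teredo server's IPv4 address, 16 bits of flags, and the client's external (NAT-mapped) UDP port and IPv4 address, both with every bit inverted. The sketch builds such an address with Python's standard ipaddress module; the addresses in the usage lines are documentation addresses, not real Teredo servers or clients.

```python
from ipaddress import IPv4Address, IPv6Address

TEREDO_PREFIX = 0x2001_0000  # 2001::/32


def teredo_address(server: IPv4Address, flags: int,
                   mapped_port: int, mapped_ip: IPv4Address) -> IPv6Address:
    """Build a Teredo address from values a client learns from its server."""
    value = (
        (TEREDO_PREFIX << 96)
        | (int(server) << 64)
        | (flags << 48)
        | ((mapped_port ^ 0xFFFF) << 32)   # external UDP port, bits inverted
        | (int(mapped_ip) ^ 0xFFFF_FFFF)   # external IPv4 address, bits inverted
    )
    return IPv6Address(value)


if __name__ == "__main__":
    addr = teredo_address(IPv4Address("192.0.2.1"), 0x0000, 40000,
                          IPv4Address("203.0.113.7"))
    print(addr)
    print(addr.teredo)  # (server, client) as decoded by the standard library
```

Because the external address and port are embedded in the IPv6 address itself, a peer can reach the client through its NAT mapping without any manual NAT configuration.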

Teredo can now work if there is one Teredo client behind one or more symmetric network address translators (NATs). A symmetric NAT maps the same internal (private) address and port number to different external (public) addresses and ports, depending on the external destination address (for outbound traffic). This new behavior allows Teredo to work among a larger set of Internet-connected hosts.
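
The difference between a symmetric NAT and a cone NAT lies in how the translation table is keyed. In this sketch, a cone NAT reuses one external port for each internal address and port, while a symmetric NAT allocates a new external port for each destination; the class, the port pool starting at 40000, and the addresses are hypothetical.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class Nat:
    def __init__(self, public_ip: str, symmetric: bool):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._next_port = itertools.count(40000)  # hypothetical port pool
        self._mappings: Dict[tuple, Endpoint] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Return the external endpoint used for a packet from src to dst."""
        # A symmetric NAT also keys mappings on the destination, so the same
        # internal endpoint appears with different external ports.
        key = (src, dst) if self.symmetric else (src,)
        if key not in self._mappings:
            self._mappings[key] = (self.public_ip, next(self._next_port))
        return self._mappings[key]
```

Behind a symmetric NAT, the external port that the Teredo server observes is not the one other peers will see, which is why this case required the new behavior described above.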

In Windows Vista, the Teredo component is enabled but inactive by default. To make the component active, a user must either install an application that needs to use Teredo or change firewall settings to allow an application to use Teredo.

Integrated IPsec support

In Windows Server 2008 and Windows Vista, IPsec support for IPv6 traffic is the same as that for IPv4, including support for Internet Key Exchange (IKE) and data encryption. The Windows Firewall with Advanced Security and IP Security Policies snap-ins now support the configuration of IPsec policies for IPv6 traffic in the same way as for IPv4 traffic. For example, when you configure an IP filter as part of an IP filter list in the IP Security Policies snap-in, you can now specify IPv6 addresses and address prefixes in the IP Address or Subnet fields for a source or destination IP address.

Multicast Listener Discovery version 2

Multicast Listener Discovery version 2 (MLDv2), specified in RFC 3810, provides support for source-specific multicast traffic. MLDv2 is equivalent to Internet Group Management Protocol version 3 (IGMPv3) for IPv4.

Link-Local Multicast Name Resolution

Link-Local Multicast Name Resolution (LLMNR) allows IPv6 hosts on a single subnet without a Domain Name System (DNS) server to resolve each other's names. This capability is useful for single-subnet home networks and ad hoc wireless networks.
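
LLMNR, specified in RFC 4795, reuses the DNS message format but sends queries to a link-local multicast address (ff02::1:3 for IPv6, 224.0.0.252 for IPv4) on UDP port 5355. The sketch builds a query for a host's AAAA (IPv6 address) record; the host name is a placeholder, and actually sending the query is left out.

```python
import struct

LLMNR_PORT = 5355
LLMNR_GROUP_V6 = "ff02::1:3"
LLMNR_GROUP_V4 = "224.0.0.252"
QTYPE_AAAA = 28
QCLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a host name as DNS labels: length-prefixed parts, then a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def llmnr_query(txid: int, name: str, qtype: int = QTYPE_AAAA) -> bytes:
    """Build an LLMNR query: DNS-style header with one question, flags all zero."""
    # ID, flags, then question/answer/authority/additional record counts.
    header = struct.pack("!HHHHHH", txid, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question
```

A host would send these bytes over UDP to (LLMNR_GROUP_V6, LLMNR_PORT); a host on the link that owns the name answers with a unicast response.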

IPv6 over PPP

Remote access now supports IPv6 over the Point-to-Point Protocol (PPP), as defined in RFC 2472. IPv6 traffic can now be sent over PPP-based connections. For example, IPv6 over PPP support allows you to connect with an IPv6-based Internet service provider (ISP) through dial-up or PPP over Ethernet (PPPoE)-based connections that might be used for broadband Internet access.

Random interface IDs for IPv6 addresses

By default, Windows Server 2008 and Windows Vista generate random interface IDs for static autoconfigured IPv6 addresses, including public and link-local addresses. This prevents address scans of IPv6 addresses that are based on the known company IDs of network adapter manufacturers.
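
The contrast with hardware-derived identifiers is easiest to see side by side. The first function builds a modified EUI-64 interface ID from a MAC address (insert FF-FE in the middle, invert the universal/local bit), which keeps the manufacturer's company ID in the first three bytes; the second draws 64 random bits. Clearing the universal/local bit on the random ID follows the RFC 4941 convention and is an assumption, not a description of the Windows algorithm.

```python
import secrets
from ipaddress import IPv6Address, IPv6Network

LINK_LOCAL = IPv6Network("fe80::/64")
UL_BIT = 1 << 57  # universal/local bit of a 64-bit interface ID


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface ID from a MAC such as '00:1a:2b:3c:4d:5e'."""
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    eui = b[:3] + b"\xff\xfe" + b[3:]  # company ID stays in the first 3 bytes
    return int.from_bytes(eui, "big") ^ UL_BIT


def random_interface_id() -> int:
    """64 random bits, marked as locally administered (U/L bit cleared)."""
    return secrets.randbits(64) & ~UL_BIT


def with_prefix(network: IPv6Network, interface_id: int) -> IPv6Address:
    """Combine a /64 prefix with a 64-bit interface ID."""
    return IPv6Address(int(network.network_address) | interface_id)
```

An attacker who knows an adapter vendor's company ID only has to scan the 24 bits of device-specific MAC address in an EUI-64 based address, while a random interface ID leaves no such shortcut.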

DHCPv6 support

Windows Server 2008 and Windows Vista include a Dynamic Host Configuration Protocol version 6 (DHCPv6)-capable DHCP client that performs stateful address autoconfiguration with a DHCPv6 server. Windows Server 2008 includes a DHCPv6-capable DHCP Server service.
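
Stateful address configuration with DHCPv6 (RFC 3315) normally takes four messages: Solicit, Advertise, Request, and Reply. As a sketch of the first step, the code below builds a Solicit carrying a Client Identifier option (a DUID based on the link-layer address), an IA_NA option requesting a non-temporary address, and an Elapsed Time option; the MAC address, IAID, and transaction ID in the usage are placeholders. A client sends this message from UDP port 546 to the multicast address ff02::1:2 on UDP port 547.

```python
import struct

SOLICIT = 1
OPTION_CLIENTID = 1
OPTION_IA_NA = 3
OPTION_ELAPSED_TIME = 8
DUID_LL = 3      # DUID type: link-layer address
HW_ETHERNET = 1  # hardware type for Ethernet


def option(code: int, payload: bytes) -> bytes:
    """DHCPv6 option: 16-bit code, 16-bit length, then the payload."""
    return struct.pack("!HH", code, len(payload)) + payload


def solicit(txid: int, mac: bytes, iaid: int) -> bytes:
    """Build a Solicit message asking servers to offer an address."""
    duid = struct.pack("!HH", DUID_LL, HW_ETHERNET) + mac
    ia_na = struct.pack("!III", iaid, 0, 0)  # T1 = T2 = 0: no client preference
    return (
        bytes([SOLICIT]) + txid.to_bytes(3, "big")  # msg-type, transaction-id
        + option(OPTION_CLIENTID, duid)
        + option(OPTION_IA_NA, ia_na)
        + option(OPTION_ELAPSED_TIME, struct.pack("!H", 0))  # first attempt
    )
```

Servers that receive the Solicit respond with Advertise messages; the client then sends a Request to one of them and receives the assigned address in the Reply.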

Quality of Service

In Windows Server 2003 and Windows XP, Quality of Service (QoS) functionality is made available to applications through the Generic QoS (GQoS) APIs. Applications used the GQoS APIs to access prioritized delivery functions. In Windows Server 2008 and Windows Vista, there are new facilities to manage network traffic for both the enterprise and the home.

Policy-based QoS for enterprise networks

QoS policies in Windows Server 2008 and Windows Vista allow IT staff to either prioritize or manage the sending rate for outgoing network traffic. IT staff can confine the settings to specific application names, specific source and destination IP addresses, and specific source and destination TCP or UDP ports.

QoS policy settings are part of user configuration or computer configuration Group Policy settings. You configure them, and link them to Active Directory® Domain Services containers (domains, sites, and organizational units), by using the Group Policy Management Console.

To manage the use of bandwidth, you can configure a QoS policy with a throttle rate for outbound traffic. By using throttling, a QoS policy can limit the aggregate outbound network traffic to a specified rate. To specify prioritized delivery, traffic is marked with a Differentiated Services Code Point (DSCP) value. The routers or wireless access points in the network infrastructure can place DSCP-marked packets in different queues for differentiated delivery. DSCP marking and throttling can be used together to manage traffic effectively. Because throttling and priority marking take place at the network layer, applications do not need to be modified.
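
The two mechanisms can be illustrated separately. A DSCP value occupies the upper six bits of the IPv4 TOS byte (the IPv6 Traffic Class), so marking amounts to a shift; throttling is sketched here with a token bucket, a common rate-limiting technique that this article does not say Windows uses internally. The class name, rate, and burst size are hypothetical.

```python
def tos_byte(dscp: int) -> int:
    """Place a 6-bit DSCP value in the TOS/Traffic Class byte (ECN bits zero)."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must be 0-63")
    return dscp << 2


class TokenBucket:
    """Cap average throughput at `rate` bytes/s, allowing bursts of `burst` bytes."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = now

    def allow(self, size: int, now: float) -> bool:
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Refill tokens for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # the caller queues the packet and retries later
```

For example, DSCP 46 (Expedited Forwarding) becomes the TOS byte 0xB8, and a bucket with a rate of 1,000 bytes per second admits a 1,500-byte burst immediately and then roughly 1,000 bytes each second after that.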