The Cable Guy - June 2006
Microsoft Windows Server 2003 Scalable Networking Pack Overview
With the Microsoft® Windows Server® 2003 Scalable Networking Pack and the next generation of network acceleration and hardware-based offload technologies, you can cost-effectively scale your network-based applications and services by upgrading your existing servers with next-generation network adapters. The Microsoft Windows Server 2003 Scalable Networking Pack helps optimize server performance and network throughput for crucial applications such as storage, backup, Web hosting, and TCP-based media streaming. This article provides an overview of the technologies in the Scalable Networking Pack.
Introduction
The technologies provided in the Scalable Networking Pack (TCP Chimney Offload, Receive-side Scaling, and NetDMA) help optimize server performance when processing network traffic. When combined with compatible network adapter hardware, the Scalable Networking Pack helps remove existing operating system bottlenecks, such as the CPU overhead of network packet processing and the inability to use multiple processors for incoming network traffic.
By allowing existing Windows Server 2003 installations to benefit from the hardware offload capabilities found in the latest network adapters, the Scalable Networking Pack reduces the need to purchase additional servers or replace existing servers. Through the combination of the Scalable Networking Pack and a compatible network adapter, you can realize the performance and scalability gains made possible by today's faster networks.
The technologies included in the Scalable Networking Pack do not require configuration or changes to existing applications or network management tools.
TCP Chimney Offload
Managing TCP connections can involve a significant amount of processing, which includes:
Parsing the fields of the TCP header (validating the TCP checksum and processing sequence and acknowledgement numbers, TCP flags, and source and destination ports); a short sketch of the checksum check appears after this list.
Creating and sending acknowledgements for data received.
Segmentation for data sent.
Copying of data between memory locations for the receive window, the send window, and applications.
Managing timers for TCP retransmission behavior.
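To make the cost of this per-packet work concrete, the following minimal sketch in C shows the first task in the list above: validating a TCP checksum over the IPv4 pseudo-header and the segment, using the one's complement sum defined in RFC 1071. The function names are illustrative only, not Windows code; checksum offload and TCP Chimney Offload move exactly this kind of arithmetic off the host CPU and onto the network adapter.

/*
 * Minimal sketch of one per-packet task a host CPU performs without offload:
 * validating the TCP checksum of an IPv4 segment. Illustrative names only.
 */
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum over 16-bit big-endian words (RFC 1071). */
static uint32_t ones_sum(const uint8_t *data, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len  -= 2;
    }
    if (len)                         /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;
    return sum;
}

/* Validate the checksum of a TCP segment carried over IPv4.
 * seg points at the TCP header; seg_len is header plus payload length. */
int tcp_checksum_valid(uint32_t src_ip, uint32_t dst_ip,
                       const uint8_t *seg, size_t seg_len)
{
    uint8_t pseudo[12];              /* IPv4 pseudo-header */
    pseudo[0] = src_ip >> 24; pseudo[1] = src_ip >> 16;
    pseudo[2] = src_ip >> 8;  pseudo[3] = src_ip;
    pseudo[4] = dst_ip >> 24; pseudo[5] = dst_ip >> 16;
    pseudo[6] = dst_ip >> 8;  pseudo[7] = dst_ip;
    pseudo[8] = 0;            pseudo[9] = 6;           /* protocol = TCP */
    pseudo[10] = seg_len >> 8; pseudo[11] = seg_len;

    uint32_t sum = ones_sum(pseudo, sizeof(pseudo), 0);
    sum = ones_sum(seg, seg_len, sum);
    while (sum >> 16)                /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum == 0;      /* a valid segment sums to 0xFFFF */
}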
By offloading this processing to dedicated hardware, a server computer's CPU can be used for other tasks. TCP/IP in Windows Server 2003 already supports task offload of TCP checksum calculations and TCP segmentation (also known as large send offload [LSO]) to compatible network adapters. TCP Chimney Offload provides automated, stateful offload of all TCP traffic processing to specialized network adapters that implement a TCP Offload Engine (TOE).
Rather than offloading individual tasks, the TOE-capable network adapter maintains state for the significant attributes of a connection, such as the IP addresses, TCP ports, and segment sequence numbers. This allows the network adapter to perform all of the processing of the TCP traffic without involving the server's CPU. The benefit of offloading all TCP processing is most pronounced when TCP Chimney Offload is used for long-lived connections with large packet payloads, such as TCP connections for file backup and multimedia streaming.
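The following hypothetical C structure sketches the kind of per-connection state a TOE-capable adapter must track once a connection has been offloaded: addresses, ports, send and receive sequence numbers, window sizes, and retransmission timer state. The field names are invented for illustration; real TOE hardware and the NDIS 5.2 chimney interface define their own structures.

/*
 * Hypothetical per-connection state for an offloaded TCP connection.
 * Illustrative only; not a Windows or NDIS data structure.
 */
#include <stdint.h>

struct toe_connection_state {
    uint32_t local_ip;          /* IPv4 addresses of the two endpoints */
    uint32_t remote_ip;
    uint16_t local_port;        /* TCP ports */
    uint16_t remote_port;

    uint32_t snd_una;           /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;           /* next sequence number to send */
    uint32_t rcv_nxt;           /* next sequence number expected from the peer */

    uint32_t snd_wnd;           /* peer's advertised receive window */
    uint32_t rcv_wnd;           /* our advertised receive window */

    uint32_t rto_ms;            /* current retransmission timeout, in milliseconds */
    uint8_t  retransmit_count;  /* back-off state for the retransmission timer */
};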
By moving these TCP processing tasks to a TOE-enabled network adapter, the server's CPU is freed for other application tasks, such as supporting more user sessions or processing incoming requests faster.
TCP Chimney Offload Design
TCP Chimney Offload is integrated with the Windows Server 2003 TCP/IP stack and does not require any changes to applications to support offload APIs. Applications work the same whether the TCP connections for the application have been offloaded or not. Application configuration, management, and network statistics are not affected.
For independent hardware vendors (IHVs) that want to develop TOE-capable network adapters, TCP Chimney Offload provides a high-level device interface for Network Driver Interface Specification (NDIS) miniport drivers that supports a variety of IHV implementation approaches to stateful protocol offload.
TCP Chimney Offload also accommodates the intermediate driver solutions that IHVs commonly provide, such as teaming several network adapters into a single virtual network adapter (for better fault tolerance or load balancing) and support for multiple virtual LANs. IHVs with network adapters that support TCP Chimney Offload must update their intermediate drivers for NDIS 5.2 and TCP Chimney Offload.
To ensure that TCP Chimney Offload will not reduce the capabilities of existing and future Microsoft Windows® network stacks, TCP Chimney Offload will not offload a connection if the network adapter does not support a needed processing capability, such as Internet Protocol security (IPsec) cryptographic processing.
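This decision can be pictured as a simple capability check. The sketch below is illustrative only and not the Windows implementation: it keeps a connection on the host stack whenever the connection needs a feature, such as IPsec cryptography, that the adapter cannot provide.

/*
 * Illustrative capability check before offloading a connection.
 * All names are invented for this sketch.
 */
#include <stdbool.h>

struct nic_offload_caps {
    bool tcp_chimney;       /* adapter implements a TCP Offload Engine */
    bool ipsec_crypto;      /* adapter can perform IPsec cryptographic work */
};

struct connection_needs {
    bool uses_ipsec;        /* connection is protected by IPsec */
};

/* Offload only when every capability the connection requires is present. */
bool can_offload_connection(const struct nic_offload_caps *caps,
                            const struct connection_needs *needs)
{
    if (!caps->tcp_chimney)
        return false;                            /* no TOE at all */
    if (needs->uses_ipsec && !caps->ipsec_crypto)
        return false;                            /* keep IPsec traffic on the host */
    return true;
}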
The following figure shows the architecture and processing paths for TCP Chimney Offload.
Applications: Existing applications run over either the TCP/IP stack (Tcpip.sys) or the TOE-capable network adapter through the TCP chimney.
Switch: Controls whether data transfer is through Tcpip.sys or the TOE-capable network adapter (a small code sketch of this path selection follows the legend).
TCP chimney: Logical channel through which state is added or monitored (by Tcpip.sys) and data is exchanged.
State update interfaces: Interfaces through which protocols within Tcpip.sys can set or obtain the state of TCP connections.
Data transfer interfaces: Interfaces through which the switch and the TCP chimney can exchange data.
NDIS miniport driver: NDIS driver for a TOE-capable network adapter.
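As a rough illustration of the switch in the figure, the following C sketch routes a send request either down the chimney to the adapter or through the normal host-stack path, depending on whether the connection has been offloaded. All of the type and function names are invented for this sketch; they are not part of the Windows TCP/IP stack or NDIS. Applications call the same function either way.

/*
 * Invented illustration of the "switch" between the host stack path and the
 * TCP chimney path. The two send functions are stubs.
 */
#include <stddef.h>

enum tx_path { PATH_HOST_STACK, PATH_TOE_CHIMNEY };

struct connection {
    enum tx_path path;      /* set when the connection is or is not offloaded */
};

/* Stub for the normal Tcpip.sys processing path. */
static int host_stack_send(struct connection *c, const void *buf, size_t len)
{
    (void)c; (void)buf;
    return (int)len;        /* pretend the data was queued for transmission */
}

/* Stub for the chimney path; a TOE-capable adapter would take over here. */
static int chimney_send(struct connection *c, const void *buf, size_t len)
{
    (void)c; (void)buf;
    return (int)len;
}

/* The "switch": route the request down whichever path owns the connection. */
int tcp_send(struct connection *c, const void *buf, size_t len)
{
    if (c->path == PATH_TOE_CHIMNEY)
        return chimney_send(c, buf, len);
    return host_stack_send(c, buf, len);
}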
For more information, see the Scalable Networking: Network Protocol Offload - Introducing TCP Chimney white paper.
Receive-side Scaling
Because of the architecture of NDIS 5.1 miniport drivers, a network adapter in a multiprocessor (or multicore) computer running Windows Server 2003 is associated with a single processor (or core). NDIS 5.1 allows a single deferred procedure call (DPC) to execute at any given time for each network adapter. One or more network packets received on a particular network adapter trigger an interrupt to the host processor and eventually cause a DPC to execute on one of the system processors, typically the processor that was interrupted. The network stack processes all the received packets in the context of this DPC.
Many scenarios, such as large file transmissions, require the TCP/IP stack to perform significant work in the context of receive DPC processing. In these scenarios, a lack of multiprocessor support in NDIS 5.1 for packet receive processing results in limited scalability. In addition, current Intel Pentium 4 and IA64-based systems route all interrupts from a single device to one specific processor, which results in limited scalability.
The single processor must handle all the traffic received by the network adapter, regardless of whether other processors are available. For high-volume servers such as Internet-facing Web servers or enterprise file servers, this architecture limits the amount of incoming traffic and the number of connections that the processor associated with the network adapter can service. If that processor cannot handle the incoming traffic fast enough, the network adapter discards the traffic, resulting in retransmissions and reduced performance.
With the Scalable Networking Pack, a network adapter is no longer associated with a single processor. Instead, the processing for incoming traffic is distributed among the processors on the computer. This new feature, known as Receive-side Scaling, allows much more traffic to be received by a network adapter on a high-volume server. NDIS 5.2 and Receive-side Scaling enable multiple DPCs on different processors for each network adapter, while preserving in-order delivery of messages on a per-stream basis. Receive-side Scaling also supports dynamically balancing inbound network processing across multiple processors.
To gain the full performance benefits of parallel processing of received packets, it is essential to preserve in-order delivery. If packets for a single connection were processed on different CPUs, newer packets could be processed before older ones. Because TCP acknowledgement generation and processing are highly optimized for in-order processing, performance would degrade without Receive-side Scaling's support for in-order delivery of TCP segments.
Receive-side Scaling enables in-order packet delivery by ensuring that only one processor processes packets for a single TCP connection. This Receive-side Scaling feature requires that the network adapter examine each packet header and then use a hashing function to compute a signature for the packet. The hash result is then used as an index into a table. Because this table contains the specific CPU that is to run the associated DPC, and the host protocol stack can change the contents of the table at any time, the TCP/IP stack can dynamically balance the processing load on each CPU.
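The following simplified C sketch illustrates that steering mechanism. Real RSS-capable adapters compute a Toeplitz hash over the IP addresses and TCP ports using a key supplied by the host stack; a generic mixing hash and an illustrative table size stand in for that here.

/*
 * Simplified sketch of RSS packet steering: hash the connection's 4-tuple,
 * then use the hash to index an indirection table whose entries name the
 * CPU that will run the receive DPC. Illustrative names and sizes only.
 */
#include <stdint.h>

#define RSS_TABLE_SIZE 128                      /* indirection-table entries */

static uint8_t indirection_table[RSS_TABLE_SIZE];   /* entry = target CPU;
                                                        filled and rewritten
                                                        by the host stack */

/* Hash the TCP/IP 4-tuple so all packets of one connection map to the same
 * value, and therefore to the same CPU, preserving in-order processing. */
static uint32_t rss_hash(uint32_t src_ip, uint32_t dst_ip,
                         uint16_t src_port, uint16_t dst_port)
{
    uint32_t h = src_ip ^ dst_ip ^ ((uint32_t)src_port << 16 | dst_port);
    h ^= h >> 16;                   /* cheap stand-in for Toeplitz mixing */
    h *= 0x45d9f3b;
    h ^= h >> 16;
    return h;
}

/* Pick the CPU whose DPC will process this packet. Because the host stack
 * can rewrite indirection_table at any time, it can rebalance load. */
unsigned int rss_select_cpu(uint32_t src_ip, uint32_t dst_ip,
                            uint16_t src_port, uint16_t dst_port)
{
    uint32_t hash = rss_hash(src_ip, dst_ip, src_port, dst_port);
    return indirection_table[hash % RSS_TABLE_SIZE];
}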
With Receive-side Scaling, a multiprocessor computer can now handle more incoming traffic without having to add servers. To take advantage of this new feature, you must install compatible network adapters that can use the new architecture provided with the Scalable Networking Pack. Receive-side Scaling-capable network adapters are available from many network adapter vendors.
The Scalable Networking Pack monitors network adapters for Receive-side Scaling capabilities. If a network adapter supports Receive-side Scaling, the Scalable Networking Pack uses this capability across all TCP connections, including connections that are offloaded through TCP Chimney Offload.
For more information, see Scalable Networking with RSS.
NetDMA
The Windows Server 2003 Scalable Networking Pack includes NetDMA, which, on servers equipped with a supported DMA architecture such as Intel I/O Acceleration Technology, offloads memory-to-memory data transfers from the CPU to a dedicated DMA engine. NetDMA minimizes the amount of processing that a CPU must do to move packet contents between memory buffers.
Without NetDMA and its associated hardware, the CPU is extensively involved in moving network data from network adapter receive buffers into application buffers. NetDMA largely frees the CPU from handling the mundane task of copying data between memory locations so that it can be used for other tasks.
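The contrast can be sketched as follows in C. The descriptor structure and dma_engine_submit function are invented placeholders, not part of the NetDMA interface; the stub simply shows where a hardware DMA engine would take over the copy that the CPU otherwise performs.

/*
 * Conceptual sketch of the copy that NetDMA offloads. Invented names only.
 */
#include <string.h>
#include <stddef.h>

struct dma_descriptor {
    const void *src;        /* network adapter receive buffer */
    void       *dst;        /* application buffer */
    size_t      len;
};

/* Stub: a real memory-to-memory DMA engine would perform this transfer in
 * hardware and signal completion asynchronously, without using CPU cycles. */
static void dma_engine_submit(const struct dma_descriptor *desc)
{
    memcpy(desc->dst, desc->src, desc->len);
}

/* Host-only path: the CPU itself spends cycles copying every byte. */
void copy_without_netdma(void *app_buf, const void *nic_buf, size_t len)
{
    memcpy(app_buf, nic_buf, len);
}

/* NetDMA-style path: the CPU only describes the copy and moves on. */
void copy_with_netdma(void *app_buf, const void *nic_buf, size_t len)
{
    struct dma_descriptor desc = { nic_buf, app_buf, len };
    dma_engine_submit(&desc);
}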
The Windows Server 2003 Scalable Networking Pack includes the NetDMA interface, which manages interactions with the DMA engine and coordinates DMA transfers. The Scalable Networking Pack invokes NetDMA when it detects supporting hardware. If the Scalable Networking Pack detects that the hardware supports both NetDMA and TCP Chimney Offload, TCP Chimney Offload is used and NetDMA is not used for the offloaded connections.
For More Information
For more information about the Windows Server 2003 Scalable Networking Pack, consult the following resources:
For a list of all The Cable Guy articles, click here.