How SQL Server Supports NUMA
The following key changes were introduced in SQL Server 2005 to take advantage of non-uniform memory access (NUMA) architecture.
Grouping of Common CPUs
SQL Server groups schedulers to map to the grouping of CPUs, based on the hardware NUMA boundary exposed by Windows. For example, a 16-way box may have 4 NUMA nodes, each node having 4 CPUs. This allows for greater memory locality for that group of schedulers when tasks are processed on the node. With SQL Server you can further subdivide the CPUs associated with a hardware NUMA node into multiple CPU nodes; this is known as soft-NUMA. Typically, you would subdivide CPUs to partition the work across CPU nodes. For more information about soft-NUMA, see Understanding Non-uniform Memory Access.
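One way to see this grouping, assuming SQL Server 2005 or later and a login with VIEW SERVER STATE permission, is to query the sys.dm_os_schedulers dynamic management view; its parent_node_id column shows which node (hardware or soft-NUMA) owns each scheduler.

-- Sketch: list visible schedulers grouped by the NUMA node that owns them.
SELECT
    parent_node_id,        -- NUMA node (hardware or soft-NUMA) that owns the scheduler
    scheduler_id,          -- scheduler identifier
    cpu_id,                -- CPU the scheduler is mapped to
    status                 -- for example, VISIBLE ONLINE
FROM sys.dm_os_schedulers
WHERE status LIKE 'VISIBLE%'   -- exclude hidden internal schedulers
ORDER BY parent_node_id, scheduler_id;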
When a thread running on a specific hardware NUMA node allocates memory, the memory manager of SQL Server tries to allocate memory from the memory associated with that NUMA node for locality of reference. Similarly, buffer pool pages are distributed across hardware NUMA nodes. It is more efficient for a thread to access a buffer page that is allocated in local memory than one allocated in foreign memory. For more information, see Growing and Shrinking the Buffer Pool Under NUMA.
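As a rough illustration, and assuming SQL Server 2005 (the memory clerk columns were renamed in later versions), the buffer pool allocation per memory node can be approximated by grouping the buffer pool memory clerk in sys.dm_os_memory_clerks by memory_node_id:

-- Approximate buffer pool memory per NUMA memory node (SQL Server 2005 column names).
SELECT
    memory_node_id,
    SUM(single_pages_kb + multi_pages_kb + awe_allocated_kb) AS allocated_kb
FROM sys.dm_os_memory_clerks
WHERE type = 'MEMORYCLERK_SQLBUFFERPOOL'
GROUP BY memory_node_id
ORDER BY memory_node_id;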
Each NUMA node (hardware NUMA or soft-NUMA) has an associated I/O completion port that is used to handle network I/O. This helps distribute the network I/O handling across multiple ports. When a client connection is made to SQL Server, it is bound to one of the nodes. All batch requests from this client will be processed on that node.
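The resulting distribution of client connections across nodes can be checked, for example, with the node_affinity column of sys.dm_exec_connections (assuming SQL Server 2005 or later and VIEW SERVER STATE permission):

-- Count current client connections per NUMA node binding.
SELECT
    node_affinity,             -- node the connection is bound to
    COUNT(*) AS connection_count
FROM sys.dm_exec_connections
GROUP BY node_affinity
ORDER BY node_affinity;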
Each time the instance of SQL Server is started in a NUMA environment, the SQL error log contains informational messages describing the NUMA configuration.
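One way to find those messages without opening the log file is the xp_readerrorlog procedure; note that it is undocumented and its parameters can vary between builds, so this is only a sketch:

-- Search the current SQL Server error log for node configuration messages.
-- xp_readerrorlog is undocumented; 0 = current log, 1 = SQL Server error log.
EXEC master.dbo.xp_readerrorlog 0, 1, N'node';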
How SQL Server Maps Soft-NUMA Nodes to Hardware NUMA Nodes
Soft-NUMA is defined once for all SQL Server instances on the computer, so multiple instances of the Database Engine all see the same soft-NUMA nodes. Each instance of the Database Engine then uses the affinity mask option to select the appropriate CPUs, and works with whatever soft-NUMA nodes are associated with those CPUs.
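For example, an instance could be restricted to the first four CPUs (and therefore to whatever soft-NUMA nodes cover them) by setting the affinity mask option with sp_configure; the mask value 15 (binary 1111) is only illustrative for that hypothetical layout:

-- Illustrative only: bind this instance to CPUs 0-3 (bitmask 0x0F = 15).
-- affinity mask is an advanced option; use a mask that matches your own CPU layout.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'affinity mask', 15;
RECONFIGURE;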
When starting, Windows allocates memory for the operating system from hardware NODE 0. Accordingly, hardware NODE 0 has less local memory available for other applications than the other nodes. This problem is accentuated when there is a large system file cache. When SQL Server starts on a computer with more than one NUMA node, it tries to start on a NUMA node other than NODE 0 so that its global structures can be allocated on the local memory. To configure soft-NUMA, see How to: Configure SQL Server to Use Soft-NUMA.
How Connections Are Assigned to NUMA Nodes
Both the TCP/IP and VIA protocols can affinitize connections to one or more specific NUMA nodes. When connections are not affinitized, or when clients connect by using named pipes or shared memory, connections are distributed to NUMA nodes on a round-robin basis. Within a NUMA node, the connection is run on the least loaded scheduler on that node. Because new connections are assigned round-robin, it is possible for all CPUs within one node to be busy while another node is idle. If you have very few CPUs (for example, 2) and you see large scheduling imbalances because of long-running batches such as bulk load, you may get better performance by turning off NUMA. For more information, see How to: Map TCP/IP Ports to NUMA Nodes.
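Before turning NUMA off, it can be worth confirming that an imbalance actually exists; a rough check (again assuming VIEW SERVER STATE permission) is to compare task counts per node in sys.dm_os_schedulers:

-- Rough per-node load check: compare task counts across NUMA nodes.
SELECT
    parent_node_id,
    SUM(current_tasks_count)  AS current_tasks,
    SUM(runnable_tasks_count) AS runnable_tasks
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;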
SQL Server Version Limitations
SQL Server 2000 through Service Pack 3 does not include special support for NUMA; however, Service Pack 4 has some limited NUMA optimizations. SQL Server 2005 has many substantial improvements, and users of NUMA are strongly encouraged to upgrade to SQL Server 2005 to take full advantage of NUMA architecture.