Instant Rules of Thumb for Tuning and Sizing NT Server

Article
02/20/2014

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

By Curt Aubley

Chapter 1 from Tuning & Sizing NT Server, published by Prentice Hall PTR

When reading technical material, I hate having to weed through countless pages of information before getting to the really good stuff. That is why I have decided to put some of the practical key points in the beginning of this book. This chapter provides little backup for the recommendations, just key points and "rules of thumb" you can immediately put to use. These key points and rules, along with more advanced techniques for tuning and sizing NT Server, are explored in depth in the chapters that follow. Preview this section before looking at the other chapters. As you read subsequent chapters, these rules of thumb and other tuning and sizing concepts will become more concrete in their usage. Later, this chapter can be used to refresh your memory.

Beware: Knowing what each tuning technique does is more important than just knowing which switch to flip! To fully comprehend the advantages and uses of these rules of thumb, read through the book's chapters. Being informed not just what the tips are but on how and why they influence the behavior of NT Server will help you make more intelligent decisions and spark some alternate tuning ideas. Keep in mind that although these tips can improve your tuning and sizing efforts, they may potentially have adverse effects if implemented incorrectly. The rules of thumb selected were chosen because they are relatively safe to implement and can immediately improve the performance of your NT Server. To say that you'll have an immediate improvement of the performance of your NT Server is a strong statement, but a realistic one when you have an understanding of the workloads occurring in your NT Server solution environment.

Be sure to make two good backups of the registry and understand how to restore the registry before tuning it. (Refer to Appendix A for a refresher on backing up and restoring the registry.) This concept holds true for any tuning, actually; always understand and have a plan to reverse your efforts if needed. Then stress test or benchmark your new configuration to affirm that you have actually improved your server's performance and that it is stable. Remember that the final true test of performance of an interactive application is always the end user's performance perception.

Performance Statistics Gathering: Start the Logs
Key Points to Remember When Using Performance Monitor
Three Action Items to Complete So That All Relative Performance Counters Can Be Collected
Determining Which Applications (Processes) Are Running Under NT Server
NT Server Bottleneck Detection Strategy
Tuning Memory Resources
Tuning Disk Resources
Appropriate Use of RAID
Tuning RAID Controller Cache
Tuning NT Server's Network Subsystem
Tuning CPU Resources
Sizing and Tuning Specific Server Implementations
Microsoft Exchange
Sizing Rules of Thumb
Sizing Disk Subsystem Performance
Sizing the CPU(s)
Sizing Memory Requirements
Summary

Performance Statistics Gathering: Start the Logs

Without some sort of baseline, either from internal or external metrics, you may never know how much you have improved the performance of your NT Server or how to detect current or potential bottlenecks. The information you collect using NT Server's Performance Monitor (Perfmon) is particularly important in tuning and capacity sizing. Performance Monitor should become one of your favorite tools. Chapter 2, "Tuning Strategies and Measurement Gathering" closely investigates how, when, and why to use the Perfmon-related actions described in this section.

Key Points to Remember When Using Performance Monitor

Starting Performance Monitor (Perfmon)

Action: Start|Administrative Tools|Performance Monitor

Chart Mode

Good for looking at current activity or reviewing logs

Action: File|Chart Mode

Adding Counters to Any Mode

Performance monitor contains a set number of objects. Each object contains subsequent specific counter metrics that can be collected.

Action: Edit|Add to (chart, log, etc.)

Starting Log Mode

Action: To log performance data, start Perfmon

Then, select View|Log, Edit|Add to log. Select all objects using the mouse and shift key, then click Add|Done. To begin the logging session: Select Options|Log, enter the name of your log file (e.g., test1-perfmon-log), specify a sampling interval, and then click start. Perfmon uses about the same amount of resources to collect measurements from one object or many objects from NT Server's performance library DLLs, so collect them all except for network segment. DLL—Dynamic Link Library is an NT Server feature that enables executable routines to be stored separately with the .dll extension and to be loaded only by the program that needs them.

Viewing Active Perfmon Logs While Logging Is Occurring

If the copy of Perfmon that is actively being used for logging Perfmon data is used to view the current Perfmon data in any way, Perfmon will stop logging your data. To avoid this inconvenience, launch a second copy of Performance Monitor to view the currently active logging session or real time server data.

Three Action Items to Complete So That All Relative Performance Counters Can Be Collected

Add the Simple Network Management Protocol (SNMP) service

The SNMP service adds network interface object to performance monitor so that Perfmon collects statistics surrounding the network interface card. To add the SNMP service, select Start|Settings|Control Panel|Networks|Services|Add, then add the SNMP service. Once NT Server is rebooted, the SNMP service is activated.
Add the Network Tools and Agent

The Network Tools and Agent adds the network segment object to Perfmon and a network analysis tool to NT Server found under Start|Programs|Administrative Tools Network Monitor. This tool set allows NT Server to collect network-related information for the entire network segment, not just the network interface card of NT Server itself. To add the Network Tools and Agent, select Start|Settings|Control Panel|Networks|Services|Add, then add Network Tools and Agent. Once NT Server is rebooted, these tools are activated.

Note: Collecting data using either the Perfmon object network segment or the Network Monitor places the selected network interface card into promiscuous mode, which adds additional overhead to your server above and beyond the standard performance monitor application.
Run Diskperf

To monitor disk performance statistics, you must run "diskperf -ye" from a command prompt and reboot the server before disk statistic gathering is enabled. Options for diskperf can be obtained by typing "diskperf /?" from the NT Server command line.

Determining Which Applications (Processes) Are Running Under NT Server

When performance monitor is started in log mode, it will only collect metrics from processes that are currently running. If a job is started after the logging began, the job generating the load will not be able to be identified. An excellent technique to circumvent this Perfmon limitation is to generate an alert when resource usage levels increase above an acceptable threshold. This alert would then start a second copy of Perfmon so that the guilty process can be identified and dealt with accordingly. Chapter 2 reveals the step-by-step procedure to implement this technique.

NT Server Bottleneck Detection Strategy

Wouldn't it be nice if you could observe just one or two metrics to determine what is going on in your NT Server? Unfortunately, since the server's subsystems are all interrelated, subsequently so are the metrics to observe, even under NT Server. In Chapter 2, "Tuning Strategies and Measurement Gathering," the importance of considering the entire server performance picture when trying to locate an NT Server bottleneck is investigated. There can be more than one server resource area that is contributing to the throttling of NT Server's overall performance. Once all of the major server resource areas are evaluated, focus in on improving the performance of the resource that is farthest to the left in the performance resource chart (Figure 1–1). This strategy will provide the greatest immediate gain to your NT Server's overall performance. Once one server resource is removed as the bottleneck, others may take its place, which will subsequently influence where you focus additional tuning efforts.

General NT Server Observations

NT Server typically will run short of memory before any other system resources. Watch this resource closely.
Even when you are concerned about another potential resource bottleneck, ensure there is not a memory shortage.
Once memory shortages are ruled out, the disk and network subsystems are typically the next sources of contention.

Figure 1-1: Performance resource chart indicating a memory bottleneck.
You can achieve some of the greatest gains in NT Server performance tuning by properly tuning and sizing the memory subsystem, disk subsystem, and network subsystem; in that order.

Key Performance Metrics to Observe when Detecting Server Bottlenecks

There are a dizzying number of Perfmon objects and counters to consider when sleuthing out NT Server bottlenecks that are throttling overall performance. Listed in the following section is a condensed version of the key metrics to observe when determining if a bottleneck is forming that will throttle back your server's overall performance. All of the metrics outlined here are available when using the default NT Server Perfmon tool. A common concern I hear is that some of these counters always display zero, which is misleading. Ensure that all of the Perfmon counters are turned on as outlined in the section "Three Action Items to Complete So That All Relative Performance Counters Can Be Collected."

These rules of thumb are generic and commonly hold true. As you begin to collect measurements and develop a baseline on your particular NT Server(s), pay particularly close attention to these objects and counters. Any drastic variation from the baseline you develop and/or trends observed over a given period of time, consecutive weeks for instance, can be cause for action. For example, if the Logical Disk: % utilization on a particular disk drive has held steady at 20 percent for four weeks, then suddenly the Logical Disk: % utilization begins to increase at 5 percent each week, investigate and understand what is running on the server. Subsequently, develop an action plan to alleviate this ominous looking condition that is developing before it becomes that bottleneck that slows down your entire system.

Table 1.1 Primary Counters for NT Server memory bottleneck detection.

Object/Counter	Definition	Rule of Thumb for Bottleneck Detection
Memory: Available bytes	Available bytes displays the size of virtual memory currently on the zeroed, free, and standby lists. Zeroed and free memory is ready for use with zeroed memory cleared to zeros. Standby memory is removed from a processor working set but is still available.	NT Server will try to keep at least 4 MB available as seen via this counter. If this value is near 4 MB and pages/sec is high, and the disk drive where the Pagefile is located is busy, there is a memory shortfall. For NT Servers with their memory optimization set to maximizing throughput for file sharing reduce this value to 1 MB.
Memory: Pages/sec	Pages/sec is the number of pages read from the disk or written to the disk to resolve memory references that were not in memory at the time of reference.	This counter can be deceiving, particularly on Enterprise class NT Servers. A high paging activity is fine (>100), unless it is accompanied by a low available bytes indicator and a high % disk time on the paging file disks.
Logical Disk: % Disk Time (for dedicated pagefile.sys disk drive)	Disk time is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests.	If the % disk time is over 10 percent, the available bytes are hovering around 4 MB, and pages/sec is high, the Virtual Memory Manager is using the paging file significantly and the server is experiencing a memory bottleneck. Note the % disk time counter must point to disk drive(s) that contains paging system file(s), pagefile.sys.
Server: Pool Nonpaged Failures	The number of times allocations from nonpaged pool have failed.	If this value is >1 on a regular basis, there is not enough physical memory in the server.
Server: Pool Paged Failures	The number of times allocations from paged pool has failed.	If this value is >1 on a regular basis, there is not enough physical memory in the server or the paging file is too small.

Table 1.2 Primary counters for NT Server disk bottleneck detection.

Object: Counter	Definition	Rule of Thumb for Bottleneck Detection
Logical Disk: % Disk Time	% Disk time is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests.	If this value increases above 60–80 percent, then the response time from the disk drive may become unacceptable. This is the red flag to begin investigating the other disk counters listed below.
Logical Disk: Average Disk Queue Length	Average disk queue length is the average number of both read and writes requests that was queued for the selected logical disk during the sample interval.	If this value is greater than two for a single disk drive and the % disk time is high, the selected disk drive is becoming a bottleneck. This value is an average calculated during the Perfmon sample period. Use this counter to determine if there is a disk bottleneck and the current disk queue length counter to understand the actual workload distribution.
Logical Disk: Current Disk Queue Length	Current disk queue length is the number of requests outstanding on the disk at the time the performance data is collected.	If this value is greater than two for a single disk drive over a sustained period of time and the % disk time is high, the selected disk drive is becoming a bottleneck. This value is instantaneous. Collect granular statistics over time to ensure that there is a sustained problem, not an instantaneous workload increase.
Logical Disk: Disk Transfers/sec	Disk transfers/sec is the rate of read and write operations on the disk.	If this value rises consistently above 80 for a single physical disk drive, observe if the average disk sec/transfer counter is reporting values higher than your baseline or what you consider acceptable. If it is, then disk drive is slowing down the overall server's performance.
Logical Disk: Average Disk sec/Transfer	Average disk sec/transfer is the time in seconds of the average disk transfer.	When the Transfers/sec counter is consistently above 80 for a single disk drive, the average disk sec/transfer should be observed to determine if it is rising above your baseline. A value greater than 0.3 seconds indicates that the selected disk drive's response times is uncommonly slow.
Logical Disk: Disk Bytes/sec	Disk bytes/sec is the rate bytes are transferred to or from the disk during write or read operations.	Sum this counter's value for each disk drive attached to the same SCSI channel and compare it to 80 percent of the theoretical throughput for the SCSI technology in use. If this summed disk/bytes per second value is close to 80 percent, it is the SCSI bus itself that is becoming the disk subsystems bottleneck. Use this data and some math to review the complete disk subsystem data path.

Table 1.3 Primary counters for NT Server Network bottleneck detection.

Object/Counter	Definition	Rule of Thumb for Bottleneck Detection
Network Interface: Output Queue Length	Output queue length is the length of the output packet queue (in packets).	If this value is longer then three for sustained periods of time (longer than 15 minutes), the selected network interface instance is becoming a network bottleneck. Note that this value is not valid in SMP environments as of NT 4 SP3. Look for future service packs to fix this abnormality.
Network Interface: Bytes Total/sec	Bytes total/sec is the rate that bytes are sent and received on the selected network interface, including framing characters.	This value is directly related to the network architecture in use. If the value of bytes total/sec for the network instance is close to the maximum transfer rates of your network, and the output queue length is >3, you have a network bottleneck.
Network Interface: Current Bandwidth	Current bandwidth is an estimate of the interface's current bandwidth in bits per second.	For interfaces that do not vary in bandwidth or for those where no accurate estimation can be made, this value is the nominal bandwidth reported by NT Server. Use this information in conjunction with the bytes total/sec counter to determine the network utilization levels.
Network Segment: % Network Utilization	Percentage of network bandwidth in use on this network segment.	The network architecture in use determines the acceptable level of % utilization. For Ethernet to based network segments, if this value is consistently above the 50–70 percent range, the network segment is becoming a bottleneck and is increasing the response times to everyone using the network.
Network Interface: Packets Outbound and Received Errors	Packets outbound errors received is the number of outbound packets that could not be transmitted/processed because of network errors.	If this value is >1, the selected network interface is experiencing network problems that are causing the network to slow and potentially become a bottleneck. This problem could be emanating from any NIC or network device connected to the network segment.
Redirector: Current Commands	Current commands counts the number of requests to the redirector that are currently queued for service.	If this number is much larger than the number of network adapter cards installed in the computer, then the network(s) and/or the server(s) being accessed are seriously bottlenecked.
Server: Work Item Shortage	The number of times STATUS_DATA_NOT_ ACCEPTED was returned at receive indication time.	This indicates that NT Server has not allocated sufficient initworkitems or maxworkitems which are causing network limitations.

Table 1.4 Primary counters for NT Server CPU(s) bottleneck detection.

Object/Counter	Definition	Rule of Thumb for Bottleneck Detection
System: % Total Processor Time	% Total processor time is expressed as a percentage of the elapsed time that a processor is busy executing a nonidle thread.	A high value for this counter is not a reason to be alarmed, unless it is accompanied with a server work queue length greater than four or growing with an associated level of processor time greater then 90 percent, then the CPU is becoming a bottleneck. This rule holds true for single CPU NT Servers.
System: % Total Processor Time	% Total processor time is expressed as a percentage of the elapsed time that a processor is busy executing a nonidle thread.	A high value for this counter is not a reason to be alarmed, unless it is accompanied with an aggregate sustained server work queue length sum greater than two times the number of processors in the server and the processor time is greater then 90 percent. If this is occurring, then the CPUs are becoming a bottleneck. This rule holds true for multi CPU NT Servers.
Server Work Queues: Queue Length	Queue length is the current length of the server work queue for this CPU instance.	If the queue length is >4 or continuously growing with an associated level of total processor time >90 percent, the CPU is becoming a bottleneck. This rule holds true for single CPU NT Servers.
Processor: % Interrupt Time	% Interrupt time is expressed as a percentage of the elapsed time that the processor spent handling hardware interrupts.	This value in itself is not a true indicator of a processor bottleneck. The value of this counter is helpful in determining where to focus your tuning efforts. If this counter is greater than 20 percent and rising compared to your baseline, consider completing diagnostics on the peripheral components to ensure they are operating within acceptable parameters.
Processor: % Privileged Time	% Privileged time is expressed as a percentage of the elapsed time that the processor spent in privileged mode in nonidle threads.	If this value is greater than counter % user time, focus on tuning server resources and investigate how well the application is consuming the privileged time.
Processor: % User Time	% User time is expressed as a percentage of the elapsed time that the processor spent in user mode in nonidle threads.	If this value is greater than counter % privileged time, focus on tuning user/application processes and resources to yield a better return.
Process: % Processor Time	Process processor time is the percentage of elapsed time that all threads of the selected process used the processor to execute instructions.	Select all processes that are currently running and display them. This technique can quickly show which processes are consuming the highest level of CPU time and thus who might be the culprit for the CPU bottleneck.

Immediate Tuning Tips to Implement

Tables 1.1 through 1.4 are a great quick reference to use when wading through the almost endless counters contained in Perfmon to identify NT Server bottlenecks. The following is a structured methodology to assist the tuning process:

Always have a backup of the server, files, and registry before making any changes.
Monitor your server and develop a baseline.
Proactively monitor your server.
Determine what resource is acting like or becoming a bottleneck.
Try a single change at a time, when possible, and carefully document all change(s).
Benchmark and test your system to determine if the change(s) is helpful and stable.
Return to step 3.

This is a helpful methodology to follow when tuning your NT Server solution. Now that the key metrics to observe are presented to help in locating those potential bottlenecks and we have a methodology to follow to assist in the tuning process, let us review general tuning tactics.

Tuning Memory Resources

Selecting NT Server Memory Strategy

NT Server incorporates the facilities to tune its primary memory management strategies. (Chapter 5, "NT Server and Memory Performance," investigates the server's memory architecture and shows how NT Server utilizes memory with much more detail.)

The most common technique to control how NT Server controls its limited memory resources is found under Start|Settings|Control Panel|Network|Services. Select Server, then click Properties. This dialog box offers five options. For most multiuser environments, two options for tuning NT Server's memory strategy are of particular interest: Maximize Throughput for File Sharing and Maximize Throughput for Network Applications. The selection made here can have profound effects on how the server performs, so choose wisely.

Option 1: Maximize Throughput for File Sharing.

Select this option only if your NT Server acts as a dedicated file server or its applications exhibit very similar behaviors of a file server. For those environments, this memory strategy provides the greatest level of performance. Do not select this option if this server provides any other services besides that of a file server. If you are running any other applications on the NT Server, such as Microsoft SQL Server, or another memory-intensive application, the server might begin to page (swap information between RAM and the disk) excessively. If this occurs, it will lower the server's overall performance.

Option 2: Maximize Throughput for Network Applications.

Select this option for just about every type of server other than that of a file server. NT Server allows less RAM for the dynamic file system cache so that running applications can have access to more RAM. With this option, it is application tuning that generally becomes more important. When you configure applications such as Oracle, Sybase, SQL Server or Microsoft Exchange, you can tune them to use specified amounts of RAM for areas such as buffers for disk I/O or general database caching. Knowing what is running on your system is particularly important here: If you allocate too much memory for each application in a multiapplication environment, moderate paging can turn into thrashing (excessive paging), and you will have one slow system.

Optimizing Virtual Memory

This is a good technique to implement that can make your life a whole lot easier for a variety of tuning and sizing tasks. The paging file system can be split up among sixteen separate pagefile.sys file systems. Typically, splitting the paging file between two–four separate physical disk drives and, if possible, different SCSI channels, is the optimum layout. To accomplish this task, first select two disk drives that are to be dedicated to containing the page file and nothing else. Select Start|Settings|Control Panel|System|Performance. Select Virtual Memory and create two new paging files, one on each disk. After the new paging file systems are in place, remove the default pagefile.sys on the root disk. This will improve your virtual memory performance when NT Server must utilize virtual memory.

When you create the pagefile(s), you are asked to set the initial and maximum size of the paging file. Under Perfmon, observe the paging file: %usage peak parameter. To optimally set the pagefile, set the initial size parameter of the pagefile at % usage peak value. As a corollary to this rule of thumb, at a minimum, set the paging file initial size to twice the size of physical memory and the maximum size for the pagefile to two times physical RAM (to a maximum of 4 GB). This will to minimize spending any resource time extending the size of the paging file and waiting for the paging file extension to complete. Continue to observe your baseline. Constant use of the paging file is one of the major indications of a memory bottleneck.

Remove Unnecessary Processes from the Server

Every bit of RAM counts. Any processes that are not required to service your clients or manage your server should not be running on the system. Typically, areas to tune here are under Start|Settings|Control Panel|Services. Why run the Remote Access Service (RAS) or Spooler (SLOOLSS.EXE) if the server does not perform as a RAS server or has printers connected to it? Any background processes started, even if not active, use RAM and potentially pagefile space.

Schedule Memory Intensive Jobs During Off Peak Hours

This technique always seemed to be a bit of an easy-out strategy to alleviate memory problems, but it is effective and uses resources that may have otherwise been idle. To schedule a job during an off peak hour, use the "at.exe" command from the command prompt. Note, prior to using this command, the scheduler service must be started under Start|Settings|Control Panel|Services. If you do not care for the command line interface of the "at.exe" command that is native to NT Server, the Microsoft Windows NT Resource Kit contains a graphical-based "at.exe" command named "winat.exe."

Control NT Kernel Paging Activities

NT will page some portions of its own kernel to disk, such as the NT executive system drivers to disk when they have not recently been in use, as NT makes room in memory for other processes. This can be helpful for a system with a limited RAM supply. If your server has ample RAM (for example, at least 64 MB more RAM than is normally used by all of the process working sets on the server plus the RAM normally used by the file system cache), then change the following registry entry: HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Control \Session Manager\Memory Management\DisablePagingExecutive from the default value of 0 to 1. This change will force NT to keep the NT executive (kernel) from paging any of its executive system drivers to disk and insure that they are immediately available.

Purchase More RAM—The Last Resort

More tactics to tune around memory bottlenecks are outlined in Chapter 5, "NT Server and Memory Performance," but when all strategies to tune the memory are exhausted and you have determined that the server is paging excessively, obtain additional RAM for the server. How much RAM to add is a function of current and future system load requirements and anticipated application requirements. To avoid paging, determine how much paging your system does, then add at least that amount. For example, if paging file: % usage max is 20 percent and your paging file size is 1000 MB, consider adding at least 200 MB. Also, when adding the memory to the system, select the RAM size such that it provides the highest density while still providing the highest degree of memory interleaving for growth and performance, respectively.

Tuning Disk Resources

NT Server lets you tune how it uses its disk subsystem and its configuration. In Chapter 6, "NT Server and Disk Subsystem Performance," the server's disk subsystem architecture and how NT Server utilizes the disk subsystem is investigated with much more detail. For now, once the disk subsystem is determined to be an NT Server bottleneck, consider the following tuning options.

Evenly Distribute File System Activity

Review the Perfmon logs regularly to ensure that the disk workload is evenly distributed across the disk subsystem. If % disk time or disk transfers/sec reaches even half of the thresholds mentioned above in the detecting disk bottleneck section, consider spreading the workload on the affected logical drives to other physical drives that are in less demand. One of the most common sources of disk contention is having all applications loaded and running on the root (%SystemRoot%) NT Server disk. The root disk is commonly "c:\", which can quickly become a bottleneck. It is easy to fall victim to this phenomenon, since many applications by default wish to place their information on the root NT Server disk. A tool that is helpful in determining what processes or programs are accessing which files on your server is ntfilmon.exe. This tool is freeware and is available at https://www.ntinternals.com.

Allocate Only One Logical Drive per Physical Disk Drive

A technique that helps isolate disk performance problems and improves performance by lowering the head movement rate over the disks is to format only one logical drive per physical drive. For example, if you have three disk drives, create only three logical drives: C, D, and E.

Select the Appropriate Allocation Unit Size (ALU)

Consider matching the file system ALU to the block size of the application you are using. If SQL Server is using a 4 KB block size, when you format a file system on a new disk drive, launch Disk Administrator, create the partition, commit the partition changes, select Format, and then set the ALU to 4096 bytes. Matching the file system block sizes can improve the efficiency of the disk transfers when you use the application. For more ALU size options, use the format command from the command prompt versus the Disk Administrator tool.

Low-Level Format SCSI Disk Drives Before You Start

When using a SCSI disk drive on a new SCSI host bus adapter, always use the host bus adapters tools to low-level format the disk drive before attempting to format the drive under NT Server. The geometric translation in use varies from host bus adapter to host bus adapter. When SCSI disk drives are placed onto a new host bus adapter, check that the correct translation information is being used. Low-level formatting the disk drive with the tools that are provided by the host bus adapter vendor ensures that the translation in use is correct. This will ensure that proper functionality, reported capacity and performance levels are achieved.

Group Similar Disk Workload Characteristics

Either through reading those famous manuals that come with software products or by using Perfmon and ntfilmon.exe (https://www.Ntinternals.com), try to determine the characteristics of the disk I/O activities occurring on your server. Determine which applications exhibit sequential activities, random activities, are read intensive, or write intensive. Once you have determined the characteristics of your disk activities, group similar workloads activities on the same disk drives or disk arrays. This is a corollary rule to the Evenly Distribute File System Activity. For example, place large log files on a separate disk drive rather than the general user database area.

Once the workload characteristics are understood and evenly distributed, group disk activities across the disk subsystems utilizing the various Redundant Array of Inexpensive Disks (RAID) levels based on the performance guidelines provided in the next section. For example, to improve the performance of sequential write intensive log files, place them onto a RAID 0, 1, or 0+1 array and avoid RAID 5. For a predominantly random environment that is read intensive, RAID 0 and 5 are good selections. The more you understand your server's environment, the better tuned your NT Server solution will be.

Appropriate Use of RAID

Adding RAID technology to your NT Server's disk subsystem is becoming a more common choice for enterprise servers to increase performance, disk management, and availability. RAID is a particularly excellent choice if multiple disk drives are available and it is not possible to break up the data files across separate disk drives to balance the use of the disk subsystem. NT Server allows for the use of both software- and hardware-based RAID solutions. Use NT Server to implement RAID levels 0 or 1 without any concern of a significant performance overhead penalty for implementing. Do not use NT Server to implement a software-based RAID 5 solution unless there is a significant amount of CPU capacity available that can be allotted for calculating the parity information required for RAID 5. A good rule of thumb is to implement all RAID arrays, especially RAID 5, using a hardware-based solution. Hardware-based RAID solutions offload the RAID calculations to a CPU on the RAID adapter.

The following outlines the performance and fault tolerance tradeoffs of various RAID levels:

RAID 0: Disk Striping

RAID level 0 stripes the disk activity across two or more disk drives. This logical layout provides the advantages of better performance for read, write, random, and sequential environments. The tradeoff for using RAID 0 is that there is no fault redundancy; if you lose one drive in your array you will lose the data for the entire array. In a RAID 0 environment, increasing the number of drives in the array improves the random I/O performance.

RAID 1: Disk Mirroring

RAID level 1 mirrors the disk activity across two or more disk drives. This logical layout provides for better read performance, especially in a multiuser environment, but lower performance in a write intensive environment. This RAID level provides complete data redundancy, even if you are using only two disks. The tradeoff for this redundancy is that the capacity of a RAID 1 mirror is lowered by 50 percent. For example, if you have two 9 GB disk drives in a RAID 1 mirror, there is only 9 GB of usable storage space.

RAID 2, 3, 4 and others

These RAID levels are not commonly used in general computing environments so they are not explored in this book. If you are interested in the other RAID level possibilities, search the Internet for RAID and a myriad of choices are presented.

RAID 5: Disk Striping with Parity

RAID level 5 stripes the disk data with parity information across three or more disk drives. This logical layout provides for better read performance, especially in a multiuser environment, but significantly lowers performance in a write intensive environment. This RAID level provides fault tolerance through the use of parity information, allowing for the loss of one of the RAID 5 arrays member disk drives without the loss of any data.

The tradeoff for this redundancy is that the capacity of a RAID 5 striping with parity is lowered by a factor of 1/(size of a disk drive member). For example, if you have three 9 GB disk drives in a RAID 5 array, there is only 18 GB of usable storage space.

RAID 0+1: Disk Mirroring of Disk Striped Sets

RAID 0+1 stripes data across two or more drives then mirrors those with another stripe set for fault tolerance. This logical layout provides better overall performance than a direct implementation of RAID 1. The tradeoff for this performance improvement and high fault tolerance level is that the capacity of a RAID 0+1 mirror is lowered by 50 percent. For example, if you have two 9 GB disk drives striped and mirrored with another two 9 GB stripes, there is only 18 GB of usable storage space.

Table 1.5 shows the relative performance ratings when comparing the various RAID options using a sector/stripe size of 128 KB. Use this guide when selecting the appropriate performance level that match your server's disk I/O characteristics.

Table 1.5 RAID performance guide.

Random Read	Random Write	Sequential Read	Sequential Write
Raid Level	Relative Performance Rating
Stripe (0)	2nd	1st	2nd
Mirror (1)	4th	3rd	4th
Stripe w/parity (5)	2nd	4th	2nd
Mirrored Stripe Set (1+0)	1st	1st	1st

Tuning RAID Controller Cache

If there is a built in cache on the RAID host bus adapter and a battery backup unit, the general rule of thumb is to configure it for the write back caching turned on. The default setting for most adapter caches is write through. Having the write back cache enabled is particularly helpful in write intensive environments implemented with RAID Level 5 disk arrays where there are pauses between periods of heavy disk activity. When your environment is characterized by heavy disk write activity, followed by a lull, the write back cache takes advantage of this workload slowdown to write the cached data to disk.

Tuning NT Server Cache Usage

The most common way to control how NT Server uses the available RAM for disk caching purposes is previously outlined in the Tuning Memory Resources section. In particular, the selection of either, option 1, Maximize Throughput for File Sharing, or option 2, Maximize Throughput for Network Applications, drastically affects your server's disk I/O performance. Although it depends on your environment, adding additional RAM will improve the disk I/O performance.

If the server is functioning as a file server and option 1 is chosen, nothing more than a reboot of the server is required to take advantage of the additional RAM. If option 2 is selected, just rebooting the server will improve some disk I/O performance as more RAM is available for NT Server's dynamic file system cache management, but tuning any applications that can be adjusted internally should be revisited. For example, Microsoft Exchange allows the amount of memory it is allowed to use to be easily set via the Exchange Optimizer Program. Examples of tuning an applications disk cache usage is outlined in Chapter 5, "NT Server and Memory Performance," and Chapter 8, "Putting Theory Into Practice: Sizing and Tuning Case Studies."

Disk Adapter Device Drivers and BIOS

This is one of the easiest techniques to implement to improve disk I/O performance. Manufacturers of disk drive adapters are constantly working on removing bugs and improving the performance of their respective disk adapters. Typically, the latest drivers are available via the manufacturers' World Wide Web site. Even before installing NT Server for the first time, check to see that the latest, best performing stable Disk Adapter Device Driver is available. Make it a point to periodically check your manufacturer's web site. It is amazing how much performance you can obtain through the use of improved device drivers. This concept should also be applied to the BIOS or firmware residing on the disk adapter card itself.

SCSI Command Queuing

Some drivers for SCSI adapters have registry settings for SCSI command queuing. By increasing this value, you can improve the performance of the attached disk subsystem. When this value is increased, more SCSI commands can be in the disk device queue. This technique is particularly helpful in disk array environments. Due to the multiple disk drive nature of disk arrays, they are capable of collating multiple SCSI requests in the most efficient manner to achieve higher levels of performance. Use this technique cautiously. Test your performance before and after editing the registry values. For most large disk array (>10 disks) environments, doubling the default value for the driver improves disk performance. Contact the disk adapter vendor for assistance in finding the SCSI command queuing entry in the registry. For example, Symbios SCSI adapters whose default is 32, the SCSI command queue entry is located in the following location: HKey_Local_Machine \System \CurrentControlSet \Services \symc8xx \Parameters \Device \NumberOfRequests (REG_DWORD 32)

File System Selection

Selecting the appropriate file system is important. For this topic, I will stray away from performance for a moment. Under NT Server (NT 4 and above), there are two file system options: NTFS or FAT. If security is a consideration at all, there is only one choice under NT Server, NTFS. If FAT file systems are in use, try to limit their use to disk drives or partitions smaller then 500 GB. For smaller file system sizes that are characterized by many small files, FAT can actually outperform NTFS. FAT file systems are more apt to become fragmented in a shorter period of time and begin to degrade in overall performance for larger file system sizes. Thus, for all file systems larger than 500 GB, use NTFS.

Disabling short name generation on an NTFS partition can increase directory performance significantly if a high number of non-8.3 filenames are in use, which is becoming more common. To disable short name generation, use REGEDIT32.exe to set the registry DWORD value of 1 in the following Registry location: HKEY_LOCAL_MACHINE \SYSTEM \CurentControlSet \Control \Filesystem \NtfsDisable8dot3nameCreation. Let it be known that this will cause problems if legacy 16 bit MS-DOS awnd MS-Windows-based applications are still in use.

Another registry tunable that can improve file system performance by lowering file system overhead is located in HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Control \FileSystem \NtfsDisableLastAccessUpdate. Changing the default REG_DWORD value of this key from 0 to 1 will stop NT from updating the last access time/date stamp on directories, as directory trees are traversed.

Like any other file system, NTFS can become fragmented over time on heavily used disks. There are commercial products available to defragment disk drives which significantly improve performance of the file system. Included on the CD-ROM is a disk defragmentation tool named diskeeper. The latest version of diskeeper is also available for download from https://www.sun-belt.com.

To lessen the impact of fragmentation and take advantage of the physical characteristics of disk drives, try to keep file systems to less than 60 percent capacity. This will allow NTFS to compensate for fragmentation by taking less time to find additional free space as needed and keep a majority of the data in the file system occupying the outer, faster portion of the disk drive platters.

Group Similar Devices on the Same SCSI Channels

To maximize the performance and efficiency of your SCSI channels, group similar devices on the same SCSI bus. By placing an active CD-ROM on the same SCSI bus as a 10 Disk RAID 0 array, the CD-ROM can effectively slow down access to the faster disk array. When SCSI commands are sent to request data from the CD-ROM, which is a slower device than a disk drive, the other devices on the SCSI bus must wait until the CD-ROM (or another slower device) transfer is complete before transferring the data SCSI standard associated with the other devices. A good rule of thumb to follow is to place CD-ROMS on their own SCSI channel, place each tape unit on its own SCSI channel, and group disk drives with similar features (SCSI type, size, rpm, throughput) on their own SCSI channels. Also, avoid configuring SCSI devices using different levels of SCSI standards on the same SCSI channel, as they will be limited to running at the speed of the slowest SCSI standard on the particular SCSI channel. For example, operating Fast and Wide SCSI (20 Mbytes/sec) and Ultra Fast and Wide SCSI (40 Mbytes/sec) on the same SCSI channel limits the SCSI channel to the slower Fast and Wide SCSI speed.

Additional Disk Subsystem Hardware—The Last Resort

As with any resource that has become a bottleneck, additional resources can normally be added during the sizing of the server if you have considered future growth. Always select the fastest components that have a positive life cycle (remember VESA versus PCI years back). For the disk subsystem, this would involve selecting the fastest available disk drives and disk adapter technology available for insertion into the fastest I/O bus available. For example, if there was a need to add three additional disk drives, selecting a PCI-based SCSI Ultra Fast/Wide Disk Adapter and SCSI Ultra Fast/Wide Disk Drives rotating at 10,000 rpm would be a good choice.

Tuning NT Server's Network Subsystem

The selection of the network architecture in use directly influences the network performance relative to your server. There are many facets of network architecture that can cause a network bottleneck. These facets include client configuration, application design, the physical network, network protocol, and network devices (for example, routers, hubs, etc.). Here, we focus on improving the network performance from NT Server's perspective. More advanced techniques for network subsystems are investigated in Chapter 7, "NT Server and Network Performance."

Balance Server Network Loads

Balance your network load by distributing the more heavily used network segments between two or more Network Interface Cards (NICs). Here, a network segment is defined by the physical connection of other computer systems to the server sharing the same collision domain. Even though network segmentation can involve some new cabling (physical) and subnetting (logical addressing), it is a proven technique to optimize your server's network I/O and is relatively easy to implement. The Network Monitor tool found under Start|Programs|Administrative Tools can aid in determining how to distribute the client systems between the network segments.

Network Protocol and Redirector Binding Selection Order

Another technique that can help you optimize your NT Server's network subsystem performance is to bind only protocols and redirectors (server components other than NT Server "server service") that your network is actually using to your network adapter. Binding is a technique NT Server utilizes to establish a communications channel between the protocol driver (TCP/IP, IPX, etc.) and the NIC itself. Under Start|Settings|Control Panel|Network Protocol, check which protocols (TCP/IP, NetBEUI, etc.) are currently installed. Also, under Control Panel|Network|Services, check which redirectors (RIP for NwLink, RPC support for Banyan, etc.) are installed. Removing unnecessary protocols and redirectors lowers the amount of memory that NT Server requires for network I/O (which can then be used for other tasks) and ensures that your network is not generating any unnecessary traffic, such as unwanted broadcasts.

Network Interface Card Settings

Setting the NIC properly is an area that is commonly taken for granted. Even though various standards defining physical and logical specifications of network communications exist, some compatibility issues always show up when you are not looking. Set your server's NIC and any other network devices your server may communicate with to the best possible network speed setting available. For example, if full duplex 100 BaseTX is available and the other network devices support this setting, choose it. Only as a last resort select auto sensing.

NIC Device Drivers and BIOS Levels

This is one of the easiest techniques to implement to improve network performance and lower server network overhead due to the NIC. Manufacturers of network interface cards are constantly working on removing bugs and improving the performance of their respective network adapters. Typically, the latest drivers are available from the manufacturers' World Wide Web site. Even before installing NT Server for the first time, check to see that the latest, best performing stable Network Device Driver is available. This same concept applies to the BIOS or firmware on the NIC as well. Make it a point to periodically check your manufacturer's web site. You can be rewarded with much better performance with the use of improved device drivers. As with any new software that your NT Server will rely on day in and day out, test the new technology before deploying it into your enterprise.

Permanently Cache MAC Addresses

A MAC address is a unique address that a manufacturer burns into NICs, such as a 100BaseT Ethernet card. TCP/IP uses address resolution packets (ARP) broadcasts over the network to associate an IP address with a physical layer MAC address. To lower the number of broadcasts and time required to obtain MAC addresses, you can permanently (until the next reboot, unless a startup script is used) place the associated MAC/IP address pair into memory. Use the following command sequence from the command prompt to implement: "arp -s 137.111.141.101 01-01-01-12-s3-44 (arp -s ip address macaddress)." This feature is particularly helpful when accessing a commonly accessed networked system that uses a static IP address. This is not a suggested technique when the networked system uses dynamic IP addressing (DHCP).

NIC Selections

Select only stable PCI-based NICs for your server. Although it may be tempting to use a NIC based on EISA, ISA, or MCA because the available PCI slots are dwindling, avoid the temptation if the best performance is a consideration. The relative performance difference between an EISA and PCI-based NICs were tested separately using a file transfer intensive environment, in the same NT Server. Table 1.6 illustrates the drastic performance differences between a generic PCI and an EISA-based adapter.

Table 1.6 Comparing PCI and EISA NICs.

Network Technology (Switched Full Duplex Ethernet)	File Transfer Throughput (Mbits/sec)
PCI 100BaseTX	131.0
EISA 100baseTX	9.61

Performance levels achieved always vary based on the workload. But even with a fairly large deviation, which technology would you choose based on these test results?

Controlling Network Users Timeout Period

There are only so many resources allocated for network connections, so if a user strolls away from their desk to enjoy a sunny day, disconnect them. This frees up resources for active users. The command "net config server" lists out the server's current settings. Running the command "net config server /AUTODISCONNECT: 10" is how to set the automatic disconnect time to 10 minutes. The registry entry for this value is located at: HKEY_LOCAL_MACHINE \SYSTEM \CurentControlSet \Services \lanmanServer \Parameters \Users

Faster NICs—The Last Resort

When all of the networking components in and around your NT Server solution are running at their optimum level, but more performance is required, consider implementing different network technologies, additional NICs, or changing your network architecture.

Tuning CPU Resources

Tuning the CPU resources may initially conjure up opening your server and trying to determine what you can change, but this is not the goal in this section. NT Server uses a priority based round robin scheduling algorithm to distribute process threads among the CPUs in the server. In Chapter 4, "NT Server and CPU Performance," issues surrounding the server's CPU performance and NT Server's use of scheduling algorithms are investigated, and advanced tuning techniques are explored to tune these resources to your best advantage. Before getting to that level of tuning, there are numerous other techniques presented here that can aid in tuning around CPU bottlenecks.

Ensure That Another Server Resource Is Not Acting as a Bottleneck

The most common occurrence I have encountered when tuning around CPU bottlenecks is the removal of other server resource bottlenecks, not the CPU. If another server resource is acting as a bottleneck, it can appear that the CPU is the actual bottleneck when it is not. Memory bottlenecks, in extreme cases, disguise themselves as CPU problems. When NT Server begins to run out of memory, it can exhibit a condition called thrashing. Thrashing is an excessive contention for physical memory. When this condition occurs, every request for memory results in paging activity to the pagefile on the disk drive(s). This results in an increased amount of CPU activity associated with moving memory pages around, not completing truly productive work. Why waste precious CPU cycles on this activity?

Avoid this condition by configuring a sufficient amount of RAM to avoid thrashing. The highest performing NT Servers I have encountered are sized and tuned with one basic strategy in mind: avoid paging. Paging is considered by many to be acceptable if it occurs infrequently. The more the server pages, the more CPU cycles are wasted on memory operations and the slower the final response times provided to your end users. To improve performance if the server must page, spread NT Server's pagefile across two to four dedicated disk drives sized identically. Refer to Chapter 5, "NT Server and Memory Performance," for the specifics on how to determine if your NT Server is paging and for a step-by-step guide on spreading the pagefile across multiple disk drives.

Offload NT Server CPU Overhead

If the server is running out of CPU processing power, offload NT Server activities that are not required or that can be implemented in hardware. The areas normally associated with CPU overhead that can be removed or offloaded are wasteful hardware components, disk compression, RAID operations, and security auditing.

Search for Wasteful Hardware Components

For enterprise servers, do not be surprised if Processor: % Interrupt Time is around five to twenty percent of the total CPU workload, particularly if the network and disk I/O is very heavy. There are, however, some components that behave better than others. Attempt to obtain NICs that truly support bus mastering and disk host bus adapters that support DMA transfers versus PIO. Some disk and network adapters operate more efficiently than others, thus they require less CPU cycles to operate. Trade magazines provide numerous comparisons of these products. One good source of information is the NT Magazine web site located at: https://www.ntmag.com.

Do Not Implement Compression

Compression is nice for notebooks and some workstations, but it has little applicability for an enterprise server. Actively using compression on your server will increase your CPU overhead and slow your disk operations. If you must use compression, relegate its use to drives that are not frequently accessed and are used for archiving only.

Offload CPU Intensive Operations

Whenever possible, offload CPU operations to secondary devices. NT Server allows the native use of software-based RAID 5 implementations. RAID 5 requires constant parity computation every time data is written to a RAID 5 array. Offload this by using a RAID controller with its own CPU for RAID calculations. This frees up CPU cycles for the application instead of wasting them on every disk write activity. Chapter 6, "NT Server and Disk Subsystem Performance," reviews RAID technologies and examines performance issues in depth.

Security Auditing Equals CPU Overhead

Security is important. When increased security levels are required, NT Server auditing is activated (utilizing auditing for tuning is covered in Chapter 2) and a certain level of overhead is introduced onto the server. The amount of overhead introduced is a function of the required level of auditing. The CPU is the server resource that shoulders most of the overhead associated with increased auditing. If auditing is required, do not turn it off. However, be aware that more CPU horsepower may be required to compensate for the increased security to provide the same level of performance experienced without auditing activated.

Remove Faulty Hardware Components

When a hardware device, such as a NIC, interrupts the processor, NT Server's Interrupt Handler will execute to handle the condition, usually by signaling I/O completion and possibly issuing another pending I/O request. Observe the Perfmon counter Processor: Interrupts/sec. If this number begins to grow compared to your baseline when under normal workload, there is a good possibility that a network device has become faulty. When a device becomes faulty, it may begin generating high numbers of interrupts, which inundates the CPU. This wastes precious CPU cycles. Replacing the faulty device will alleviate this situation.

Schedule CPU Intensive Jobs During Off Peak Hours

This technique always seemed to be a bit of an easy-out strategy to alleviate CPU problems, but it is effective and uses resources that may have otherwise been idle. To schedule a job during an off peak hour, use the "at" command from the command prompt. Note, prior to using this command, the scheduler service must be started under Control Panel|Services. To get directions on how to use the at command, type "at /? | more" at the command prompt. Besides the default "at" command included with NT Server there are now friendlier applications to utilize NT Server's scheduler service. The Microsoft Windows NT Server Resource Kit includes a scheduler tool named "winat.exe," which is a friendly GUI front end to the "at" command. Some companies are beginning to introduce more flexible scheduling tools that allow actions to occur based on multiple events besides time. One such tool is Master Minder. Information on Master Minder is available from https://www.ncr.com

NT Server Service Packs

Enterprise operating systems are becoming more feature rich and complicated by the minute; subsequently, they are prone to bugs. Microsoft publishes bug fixes or patches such as service packs and hotfixes. Stay abreast of these service packs as they commonly fix various functional problems and occasionally offer performance improvements. Again, as with any new software, test it before it finds its way into a production environment. Don't get discouraged—I have never used an operating system that did not require periodic patches.

Upgrading the CPU(s)—The Last Resort

The obvious way to alleviate processor bottlenecks is to move to a faster CPU. A faster CPU is particularly helpful if you have predominantly single-threaded applications. If you have a multiuser system using multithreaded applications, you can preserve your investment (i.e., not throw out the older CPU when the new one arrives) by adding additional processors. When the Server Work Queues: Queue Length is greater than two times the number of CPUs, your server is a particularly good candidate for additional CPUs. Windows NT Server currently supports up to 32 processors, although 1, 2, 4, and 8 CPU implementations are the most common NT Server implementations.

Sizing and Tuning Specific Server Implementations

The key to any sizing or tuning effort is to understand the types of workloads involved. Once you begin to understand the workloads involved and develop a server baseline, half the battle is already won. Use the previous tuning rules when applicable, but never underestimate the power of an expert for a particular functional area. For example, to improve database server performance, contact or hire an expert for the database in use. Large performance gains are possible by tuning the database design, actual queries, and applications running on the database. Short of hiring a functional area expert, some of the general uses for an NT server are outlined here with additional rules of thumb pertaining to each specific environment. These recommendations for the following server environments are presented here to help jumpstart the performance tuning effort, not as an all-encompassing review of each environment.

NT File Server

Most file server environments are characterized by the following resource usage activity: Memory: Light to Medium, Processor: Light, Disk: Active, and Network: Active. The two key areas to focus on here are disk I/O and network I/O. To size and tune the disk I/O subsystem, estimate the amount of space required, type of usage, and workload. When sizing NT Servers, it is common to consider the use of only one RAID level throughout the entire server configuration. On the contrary, consider using more than one. For example, if 80 percent of the file server workload is for a read environment, implementing a greater than three disk RAID 5: Disk Striping with Parity array will result in better read performance than a single disk and improve the fault tolerance level of the data. If the other 20 percent of the file system's usage are a write intensive environment such as backing up critical files, consider a RAID 1 Mirror. This multiRAID level concept is implemented in Chapter 8's case study, "Microsoft Exchange—Electronic Mail Case Study Solution."

For the networking I/O, select a network adapter that provides the best performance for the network architecture in use. If the environment is Ethernet, obtain a network adapter that supports bus mastering and contains more than one channel on the card to conserve server adapter expansion slots. From a tuning perspective, create the file systems with the appropriate ALU sizes. If some of the file systems provide larger files, such as multimedia files, use a larger ALU size such as 64 Kb. If the file system supports smaller files in a read/write environment, a smaller ALU size, such as 2 Kb, will typically provide the best performance. Use the format command via the command prompt to access these additional ALU options. For example, to format a logical partition "f:" as an NTFS file system with an ALU of 8 Kb, use the command "format f: /fs:ntfs /a:8192".

For NT Server's primary memory strategy, select option 1, Maximize Throughput for File Sharing. This will allow NT Servers cache manager to favor physical memory operations for the caching of I/O operations.

Primary Domain Controller (PDC) and Backup Domain Controllers (BDC)

Most PDC environments are characterized by the following resource usage activity: Memory: Active, Processor: Light, Disk: Moderate, and Network: Moderate. The two key areas to focus on here are memory strategies and network I/O. For NT Server's primary memory strategy, select Maximize Throughput for File Sharing. This will allow NT Servers cache manager to favor physical memory operations for the caching of I/O operations. This will provide the better performance for the PDC/BDC I/O activities.

The primary tuning strategy for PDC/BDC servers centers on the Netlogon service. One of the key activities performed by the Netlogon service is keeping user account databases in sync with all of the BDC servers and the PDC server. Control the load placed onto the PDC and subsequent maintenance traffic generated on the network via the following three registry keys: PulseConcurrency, Pulse, and Randomize. These parameters are found under HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \Netlogon \Parameters. Lower the PulseConcurrency value to limit the number of simultaneous pulses the PDC will send to the BDCs, which in turn lowers the number of concurrent BDC database update requests. This lowers the server loading on PDC and lowers the peak amount of maintenance traffic over the network. However, there is always some type of tradeoff when tuning any server. In this case, the suggested tuning will increase the time required to replicate user account databases throughout the domain. Which is best if based on your requirements and environment.

Microsoft SQL Server

Most SQL server environments are characterized by the following resource usage activity: Memory: Active, Processor: Active, Disk: Active, and Network: Light. The key areas to focus on here are memory, disk, and network I/O. Wow, is it easy to summarize an application server environment by stating, "just watch the entire system," but this is common and just increases the tuning challenge! Having a sufficient amount of memory and properly distributed database activity across the available disk subsystem are the keys to achieving good database performance from an NT Server perspective. For memory strategies, select option 2, Maximize Throughput for Network Applications. This will provide SQL Server access to more memory and provide less memory for NT Server's cache manager to allocate for file system cache operations. This allows SQL Server more control over how memory is allocated.

Because of the high disk I/O levels associated with SQL Server, providing SQL Server with additional memory allows SQL Server to buffer more of its disk I/O operations. This can help to improve SQL Server's overall performance. SQL Server can be tuned using the SQL Enterprise Manager Tool and the Server Configuration option to ensure any additional memory added to the server can be utilized by SQL Server. When configuring SQL Server's memory options, a conservative rule of thumb is to leave at least 64 Mbytes of physical memory aside for NT Server. This will ensure that NT Server does not get starved for memory. If over time you determine that setting aside 64 Mbytes of RAM for NT Server specific operations is excesive, reallocate part of the reserved 64 MB of RAM to SQL Server activities. If the NT Server that SQL Server is operating on a server that is composed of multiple CPUs, change the "SMP Concurrency" to 0. This forces SQL Server to use all of the servers CPUs. If your server is dedicated to SQL Server activities, set the priority boost to 1, which allows the SQL Server to run at a higher priority. Server priority levels are reviewed in Chapter 4, "NT Server and CPU Performance."

Microsoft Exchange

Microsoft Exchange environments are characterized by the following resource usage activity: Memory: Active, Processor: Medium, Disk: Active, and Network: Medium. A simplified view of Exchange is that it is a complex application sitting on top of its own database. This database is already compartmentalized into the following areas: Private Information Store, Public Information Store, Information Store Logs, Director Service, Directory Service Logs, and Message Transfer Agent.

Characterization of the Disk I/O Subsystem

Be sure you have at least two disk drives set aside for the log files. The log files commonly become disk bottlenecks. By dedicating the logs to their own drives, it becomes easier to tune the disk layout and alleviate bottlenecks that may be formed. If a hardware RAID controller with cache is in use, set the controller cache for "write through mode" for drives containing the log files.

Exchange provides an excellent tool, Microsoft Performance Optimize, that determines (most of the time) the optimal disk layout of these compartmentalized databases and adjusts internal registry settings based on available memory, performance tests the optimizer completes on the disk I/O subsystem, and a series of questions asked when the optimizer is run. It is, however, very important to first have all of the RAM, controllers, disks, and file systems set up correctly before running the optimizer. If, after proactively monitoring your server's performance, you decide to implement hardware changes, such as adding additional memory and disks, the optimizer should be rerun to verify the system is still tuned for peak performance.

A key rule of thumb when setting up the optimizer is a selection that allows you to limit the amount of memory that Exchange can use for operation. Select this item and limit Exchange (if on a dedicated server that is running exchange only) to use all of the system memory except for 64 MB. This will ensure that the NT kernel and I/O cache manager have sufficient system resources available. Another key point is to have sufficient disk I/O resources available so that the optimizer can properly spread the various components of exchange across the disk I/O subsystem. In Chapter 8, "Putting Theory Into Practice: Sizing and Tuning Case Studies," sizing and tuning Microsoft Exchange is reviewed in depth in the Electronic Mail Server case study. Advanced memory tuning techniques for Exchange are outlined in the Memory Optimization example: Improving Microsoft Exchange Memory Usage section of Chapter 5, "NT Server and Memory Performance."

Web Servers

There are two primary considerations in Web server resource usage: the type of content that a Web server will provide and the network bandwidth needed to provide it. From a sizing point of view, the projected user demand and Web server content drives the sizing of the network connection and subsequently the other server resource components. A simplified view of a Web server characterization is that a Web server acts primarily either as a Hyper Text Transfer Protocol (http) file server or an application server.

Either one of these characterizations require optimized network I/O and disk I/O. Reference the rules of thumb for NT Server and SQL Server as they relate to the type of Web server in use. In addition to these rules of thumb, Web Servers are typically connected to the Internet, or intranets, which involve wide area networking. To improve the network performance in these environments, adjust the TCP/IP windowing size, which will allow more packets to be sent before receiving an acknowledgment. This setting can be modified through the following registry setting: HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \Tcpip \Parameters \TcpWindowSize. The setting of this value is highly dependent on the network architecture in use. For most network mediums, the window size should be a multiple of the maximum TCP segment size in use.

Specific to the Web server Internet Information Server (IIS), there are numerous strategies that can influence IIS performance. Regardless of the Web server characteristics, set NT's memory strategy to option 2: Maximize Throughput for Network Applications. This allows for more flexibility in tuning IIS and lowers the probability that any of the IIs working set will be paged to disk. As you observe the IIs web server, if there is available memory, the IIS cache size can be increased. This is set under HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \MemoryCacheSize. Similar to an Exchange or SQL Server environment, when setting the cache value for IIS, ensure that enough memory is set aside for the NT kernel and dynamic cache management. To control how long IIS objects can spend in cache, adjust the following registry key: HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \ObjectCacheTTL. For a relatively static IIs site, triple the default value. For IIS sites that are more dynamic, 1.5 times the default value is a good starting point.

More advanced tuning techniques for Microsoft Internet Information Server (IIS) are outlined in the Web Server Case Study located in Chapter 8, "Putting Theory Into Practice: Sizing and Tuning Case Studies."

Sizing Rules of Thumb

The following rules of thumb for sizing an NT Server should be used as general guidelines before the final NT solution is configured. It is possible to provide a good sizing estimate by utilizing sound, logical sizing techniques which any good IS professional can apply. Follow this structured sizing methodology when developing a new NT Server solution:

Define objective(s).
Understand requirements needed to meet your objective.
Determine loading characteristics.
Determine performance requirements.
Understand future business requirements.
Understand future server architectures.
Configure the server.
Stress test and validate the server configuration.
Proactively follow the Tuning Methodology during validations testing
Deploy the server into production.
Proactively follow the Tuning Methodology after the server is deployed.

Each of the above steps influences both the type and amount of server hardware needed to be configured to meet your needs for today and tomorrow. Each of these steps are reviewed in Chapter 3, "Capacity Sizing."

Historical information obtained from NT Servers running similar applications and experiencing similar workloads are an excellent source of sizing information. In lieu of having historical information available, review Chapter 3's Capacity Sizing Industry Standard Benchmark section. This section focuses on how to extrapolate from Industry Standard Benchmarks when initially sizing your NT Server solution.

Some of the NT Server specifics sizing concerns that I commonly encountered are presented below.

Sizing Disk Subsystems

The primary areas to consider when sizing the disk subsystems are capacity, availability, and performance. Consider each of these factors jointly when configuring your I/O subsystem.

Disk Subsystem Capacity

Capacity is the initial concern for the disk subsystem, but too often its relationship to availability and performance are not taken into consideration. Capacity is the easiest resource to calculate when using the sizing methodology. For this example, consider a database that requires 27 GB of actual (usable) disk space for all of its components (logs, dataspace, userspace, index areas, etc.). Three 9-GB disk drives are an economical choice to meet this capacity requirement.

Disk Subsystem Availability

The availability requirement indicates the data is important enough to need some type of data protection, while keeping cost of the server in the realm of reality. When selecting which RAID strategy to meet your availability needs, refer to the RAID definitions earlier in this chapter (Appropriate Use of RAID) to determine which RAID configuration provides the availability needed. RAID arrays that provide fault tolerance protection do so at a cost of disk capacity. A RAID 5 array can provide adequate data protection against the loss of a single disk drive, but at the disk capacity cost of 1/(size of one of the member disks in the RAID 5 array). For example, a four 9-GB RAID 5 disk array yields 27 GB of usable data space.

Sizing Disk Subsystem Performance

General

The required transaction rate is very important when determining your physical and logical disk I/O layout. The transaction rate refers to the number of transfers/sec or input/output operations (I/O's) per second that the server must provide to meet the desired disk subsystem performance requirement. Disk drives from the same vendor's disk family, regardless of the disk capacity, provide similar levels of performance. A 9-GB disk drive provides more disk capacity than a 4-GB disk drive, but it does not necessarily provide more performance.

A 7200 RPM Fast and Wide SCSI-2 based disk drive can typically sustain up to 100 I/Os per second in a mixed workload environment and 2 Mbytes/sec transfer rates. When grouped together into a RAID array, the array typically supports the aggregate of these performance factors. For example, a three disk RAID 0 array can support approximately 300 I/Os per second and 6 Megabytes/sec of throughput. This is where the old saying that more spindles (disk drives) provide better performance comes from. These specifications are rules of thumb used for basic planning. The actual performance will vary significantly based on client loads and application design.

Disk Subsystem Selection

Using these rules of thumb outlined above, an initial estimate can be made to determine if a specific disk configuration will support your environment. Although it is more economical to use the largest capacity disk drives available, this strategy may not provide the performance required. Not taking this information into account when configuring your disk I/O subsystem is the most common mistake I have encountered!

For example, if you were developing a new NT Server solution to estimate the I/Os per second it must support, review historical performance metrics from Perfmon for a similar environment. Specifically, check Perfmon's Logical Disk: Transfers/sec counter. This counter provides a good indication of I/Os per second the disk subsystem is experiencing. If the similar server is satisfactorily supporting 100 users across a four 4-GB RAID 5 disk array and the transfers/sec counter indicates a value of 200, a new server's four 9-GB disk array should support similar workload needs. However, if the new server will need to support an additional 100 users with similar work habits, the four 9-GB disk array will probably not support your needs. To support the estimated 400 I/Os per second for your environment, additional disk drives will need to be configured into the four 9-GB disk array or a second disk array will need to be configured. More detailed information on determining transfers/sec for RAID arrays is investigated in Chapter 6, "NT Server and Disk Sysbsystem Performance."

SCSI Bus Implementation

When considering how to configure your SCSI buses, review Table 1.7.

Table 1.7 Various SCSI bus characteristics.

Technology	Theoretical Transfer Speed Mbytes/sec	Realistic Transfer Speed (Estimated Real world performance) Mbytes/sec
Ultra Fast/Wide SCSI-3	40	32
Fast/Wide SCSI-2	20	16
SCSI-2	10	8

Avoid configuring fast SCSI devices (disk drives) on the same SCSI bus as slow devices (tapes, CD-ROMs), this can potentially slow down access to your faster devices if all devices are active on the bus at the same time.

Determining the Number of Disk Drives per SCSI Bus

The number of disk devices (disk drives or arrays) to configure per SCSI bus is a function of your environment. To determine the throughput of your disk devices, there are two primary options. First, you could use various third party testing tools outlined in Chapter 2. Secondly, you could use Perfmon. To use Perfmon when determining the throughput of each disk device, explore Perfmon's logical disk objects: bytes/sec. Then use the estimated SCSI Real World transfer speed chart when configuring the number of disk drives per SCSI bus. For example, if you notice that each of your disk devices is providing 4 Mbytes/sec, and you have four devices, your SCSI channel must support 16 Mbytes/sec. If these four disk devices are connected to a Fast/Wide SCSI-2 bus, which (from the chart) will support 16 MB/sec, configure any additional disk devices on a second SCSI bus.

If you do not have any historical data for your environment, a good rule of thumb when configuring the number of disk drives per SCSI bus is to estimate that each disk drive will provide 2 Mbytes/sec of data throughput. Remember that the throughput actually achieved is highly dependent on client load, application design, disk adapter model, and disk drive model.

Sizing the CPU(s)

The primary object when sizing the model and number of CPUs is related directly to the type of workload in the projected environment. The typical rule of thumb here is to configure the fastest CPU that is economically feasible and is supported by the application you wish to run. Business requirements and required resources to support NT Server based applications are increasing constantly. Starting with the fastest possible CPU and a server architecture that can support multiple CPUs provides better long-term investment protection.

NT Server is a multiprocess and multithread enabled operating system. Without any real applications running under NT, there are multiple active processes and threads. Thus, additional CPUs are normally helpful to a degree. Consider your NT Server a candidate for a multiple CPU configuration if the following conditions are relevant to your environment:

More than one major application running at a time
The application is multithreaded and was designed for a Symmetric Multi Processing (SMP) environment
Utilizing the Tuning Methodology, you have determined that the server is CPU bound and the Server Work Queues: Queue Length is greater than two times the number of CPUs on the server.

Configure enough CPU resources to drive all of the server's resources. For example, configuring a single CPU server that is connected to three ATM networks and 200 GB of disk drives in a high transaction rate environment is an unbalanced server and configured in an unrealistic manor. Develop the server configuration such that the CPU is kept "fed." This is achieved by ensuring there is enough CPU cache, Main Memory (RAM), and I/O channels for the disk and network devices. NT Server can easily be slowed by its weakest component in the data path that is not providing the data requested by the CPU. More information on factors that influence CPU perrformance is located in Chapter 4, "NT Server and CPU Performance."

Sizing Network I/O Subsystems

NT Server requires good network performance, as the network is the primary means in which clients request services. This topic is reviewed in depth in Chapter 7, "NT Server and Network Performance."

Network Selection

A common misconception when configuring the network I/O subsystem is a misunderstanding of actual throughput that the network can provide and the network characteristics needed to meet the requirements (see Table 1.8).

Table 1.8 Common Network Information.

LAN Network Technology	Throughput Megabits/sec	Throughput Megabytes/sec	Topology Media Access (Protocol Characteristics)
Ethernet (10BaseT)	10	1.25	CSMA/CD
Token Ring (16MB Ring)	16	2	Token Passing
Ethernet (100BaseT)	100	12.5	CSMA/CD
FDDI	100	12.5	Token Passing

Note that there are two key pieces of information presented above to consider when configuring the server's network I/O. First, the theoretical throughputs of all of the various networks are typically referred to in bits/sec, not bytes/sec. Mixing up bits/sec and bytes/sec is surprisingly common. When determining the aggregate network requirements of the clients the server is supporting, always keep this in mind.

Second, consider the Topology–Media Access method of each network type. Each network type displays different response time characteristics under higher utilization levels. In an Ethernet environment, as the average network utilization increases above 20 percent, a Network General Expert Sniffer will generate an event that the network usage is becoming a potential bottleneck. At which % utilization Ethernet performance degrades depends on your environment and the published reference you use. When Ethernet utilization rises above the 50-70 percent range, response times increase dramatically due to the associated network congestion.

The Number of Clients per Network

A simplified rule of thumb for the number of clients to connect per network segment is to determine what is the worst case acceptable throughput for each network client, then divide that amount into the selected network throughput. For example, if each client should have no less then 1.5 Mbits/sec of available bandwidth, and the network supports 100 Mbits/sec, the segment could possibly support 66 clients. Unfortunately this simple calculation does not take into account the network media characteristics, but it does provide for a good starting point for the maximum amount of clients per network segment. Utilizing historical and industry standard benchmarks greatly aid in sizing the number of clients per network segment.

Server Network Interface Card Selections

The type and number of network cards is obviously dependent on the network architecture in which the server will be deployed. As before, there are two techniques to obtain the information needed to determine the amount network bandwidth the server requires.

If historical performance information is not available, survey your environment to determine required bandwidth and the subsequent number network interface cards. For example, suppose you are sizing a Web server and have a goal of supporting 50,000 hits per eight-hour workday. The server network interface must be able to sustain 139 Kbits/sec of throughput. For this example, one 10 BaseT Ethernet card would easily support the network bandwidth requirement. If this web server is destined to deliver content over the Internet, call your internet service provider (ISP). I'll let your local ISP help you configure the WAN connection to the Internet for right now. In Chapter 8, a web server connected to an Intranet/Internet over LAN connections is examined.

The second method to determine the bandwidth needed for the server to support is by reviewing the Perfmon Network Interface: Bytes Total/sec for clients in their current environment. Summing the bytes/sec average values up for the total number of clients you have can provide you with a network sizing reference point.

Sizing Memory Requirements

Vendors try their best to provide realistic information on memory requirements for servers that run their software applications. Unfortunately, they are trying to sell a product as well, thus their estimates for server resources tend to be on the skinny side, to say the least. But for a starting point, be sure you take into consideration the following:

all applications that the server is running
NT Server's requirement
NT Server File System Cache use
the number of concurrent users the server is supporting
the amount of server resources configured (CPU, Disk, Network etc.)

To determine a good memory size starting point, sum the vendor recommendations for all of the above areas. For example, if you were implementing a database and web server for 60 concurrent users, consider:

clients performance expectation. For most environments, larger amounts of RAM will yield higher levels of overall performance
all applications that the server is running.
Database Application—16 MB Web Server—16MB
NT Server's Requirement
32 MB
NT Server File System Cache
32MB
The number of concurrent users the server is supporting
60 * 250 Kbytes of RAM for each connection

Summing the metrics together suggests a starting memory size for the server of 111 Mbytes. 111 Mbytes is not quite a power of two and it would be difficult to configure a Server with this exact amount of RAM, thus use a 128 Mbyte RAM server configuration as the reference point. The best place to find the information on the amount of RAM required by an application is often from the software vendor directly. Remember however, that if a software vendor provided default RAM requirements that were too large they might scare customers away.

If you wish to consider an alternate view of sizing a server's memory, check out the Server Memory Assessment Engine on-line tool from Kingston Technology at https://www.kingston.com/literature/pdf_files/mkF_413_servertesting.pdf. This on-line tool walks you through a questionnaire with a goal of making a memory sizing recommendation at the end. It is a helpful tool for a second or third opinion, but as before, keep in mind that Kingston is a company that specializes in selling RAM. Do not be surprised by the results. Review Chapter 3, "Capacity Sizing" for techniques to determine if your server configuration will really meet your objectives.

At a minimum, regardless of the application environment, consider the following CPU/Memory ratios when configuring your server's memory:

Table 1.9 CPU/Memory ratios.

Number of CPUs	Minimum Amount of RAM (Mbytes)
1	64
2	128
4	256
6	512
8	1024 (1GB)

This will help to insure that the CPUs are kept properly fed with data. This table is based on Intel Pentium PRO class processor performance levels or above.

Implementing Server Memory

When obtaining memory for your server make every effort to purchase the highest density available which still provides memory interleaving. Obtaining higher density memory provides a better utilization of available memory slots for future memory expansion. Of course, if you think that the amount of RAM required by operating systems and software applications will actual decrease in the future, skip the previous rule of thumb.

Memory interleaving is typically available in two flavors, two-way and four-way interleaving. Use four-way interleaving whenever possible, since it significantly improves the overall performance of your RAM-based memory resource. When finally choosing the type of memory for the server, select the fastest available RAM that employs physical error checking and correcting functionality. Specific RAM chip(s) speed is referenced by the amount of time it takes the CPU to access the RAM. For example, choose 60 ns RAM over 70 ns RAM. When making this selection, ensure that the RAM technology in use is not proprietary. Using proprietary RAM chips can cause acquisition and pricing challenges later in the server's life cycle.

Summary

In this chapter, tuning and sizing NT Server was reviewed from the perspective of immediate actions that can be completed to improve the overall performance of your NT Server solution. Although not all encompassing, this chapter provides a reference to the concepts and methodologies presented throughout the other chapters of this book. Some of these tuning suggestions may already be familiar while others may be new. By reviewing the subsequent chapters in this book, these tuning tips and rules of thumb are explored in more depth. This may lead you to reconsider some of the most common tuning tactics while introducing you to new tactics you may not have considered.

About the Author

Curt Aubley, formerly the Senior System Architect and MCSE for NCR, is Chief of Technology at OAO Corporation and author of many published articles on Windows NT.

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages. All prices for products mentioned in this document are subject to change without notice. International rights = English only.

International rights = English only.

Click to order