Troubleshooting Performance Problems
Applies To: Windows Server 2008, Windows Server 2008 R2
This topic contains information about how to analyze blockages during each phase of an image installation. For more information, see Optimizing Performance and Scalability.
In This Topic
Analyzing Blockages in Each Phase of Installation
Network Boot Phase
TFTP Download Phase
Image Apply Phase
Using Performance Monitoring
Analyzing Blockages in Each Phase of Installation
Network Boot Phase
The network boot phase encompasses the initial boot performed by the client computer. This includes obtaining an IP address lease, locating a Windows Deployment Services server, and downloading a network boot program by using Trivial File Transfer Protocol (TFTP). The amount of data transferred over the network during this phase is minimal, and the end-to-end operation typically succeeds in a matter of seconds.
Given the speed at which operations in this phase are completed, you have few options when it comes to performance tuning. Windows Deployment Services can handle several hundred network boot requests per second in sustained throughput. Slight performance decreases can occur if the domain controller is located across a latent network link or is overloaded. In larger environments, consider locating Dynamic Host Configuration Protocol (DHCP) and Windows Deployment Services roles on separate physical computers. For more information about this option, see Configuring DHCP.
TFTP Download Phase
The TFTP download phase is when the boot image is downloaded to the client computer. Performance in this phase is tied directly to the following factors (in order of importance):
Latency between the client computer and the server. This is measured by the average response time between the server and the client.
Size of the boot image. Increasing the boot image size increases TFTP download times and reduces reliability. Typically, the longer it takes to download the boot image, the more likely it is that something could go wrong.
TFTP block size. The block size is the size of the data packets that are sent by the server to the client that is downloading the file (the block size option is defined in RFC 2348). A larger block size allows the server to send fewer packets, so there are fewer round-trip delays between the server and the client. However, a large block size leads to fragmented packets, which most PXE client implementations do not support. To configure the block size, you must modify the Boot Configuration file on the client. For instructions, see How to Modify the BCD Store Using Bcdedit.
TFTP window size. TFTP requires an acknowledgment (ACK) packet for each block of data that is sent. The server does not send the next block in the sequence until it receives the ACK packet for the previous block. TFTP windowing is a feature in Windows Deployment Services that enables you to define how many data blocks it takes to fill a window. The server sends the data blocks back-to-back until the window is filled, and then the client sends an ACK packet. Increasing this window size reduces the number of round-trip delays between the client and server and decreases the overall time that is required to download a boot image. Similar to the block size, you must modify the Boot Configuration file on the client. For instructions, see How to Modify the BCD Store Using Bcdedit.
Other network conditions. The workload and the quality of your system hardware also affect the TFTP download performance.
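The interaction between latency, block size, and window size can be sketched with a rough model. The following Python example is a simplified estimate only: it counts one round trip per acknowledged window and ignores bandwidth, packet loss, and server load. The function names are hypothetical, and the 512-byte default is the standard TFTP block size when no option is negotiated.

```python
import math


def tftp_round_trips(image_bytes, block_size=512, window_size=1):
    """Approximate number of ACK round trips needed to download an image.

    Simplified model: one ACK per full window of blocks.
    """
    blocks = math.ceil(image_bytes / block_size)
    return math.ceil(blocks / window_size)


def latency_cost_seconds(image_bytes, rtt_ms, block_size=512, window_size=1):
    """Seconds spent waiting on round trips alone (bandwidth is ignored)."""
    trips = tftp_round_trips(image_bytes, block_size, window_size)
    return trips * rtt_ms / 1000.0


if __name__ == "__main__":
    image = 200 * 1024 * 1024  # hypothetical 200 MB boot image
    for rtt in (1, 5):
        base = latency_cost_seconds(image, rtt)
        tuned = latency_cost_seconds(image, rtt, block_size=1456, window_size=8)
        print(f"RTT {rtt} ms: default {base:.1f} s, tuned {tuned:.1f} s")
```

The block size of 1456 and window size of 8 are illustrative values, not recommendations. The model shows why latency ranks first in the list: the round-trip cost scales linearly with the response time, while larger blocks and windows only divide the number of round trips.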
Diagnosing TFTP Download Performance Problems
The simplest way to diagnose long download times (observed from the client computer as a progress bar below an IP address) is to look at the average response time between the client and the server. To do this, in Windows PE, open the Command Prompt window, run ping <server’s IP address>, and then note the average latency. The output will look similar to the following, where the average latency is less than 1 millisecond (which is good):
C:\Windows\system32>ping 10.197.160.93

Pinging 10.197.160.93 with 32 bytes of data:
Reply from 10.197.160.93: bytes=32 time=2ms TTL=60
Reply from 10.197.160.93: bytes=32 time<1ms TTL=60
Reply from 10.197.160.93: bytes=32 time<1ms TTL=60
Reply from 10.197.160.93: bytes=32 time<1ms TTL=60

Ping statistics for 10.197.160.93:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 2ms, Average = 0ms
High round-trip times indicate latency on the network, which is an indicator that TFTP download performance will be poor. To improve this performance, consider doing one or more of the following:
Use a Windows Deployment Services server that is closer to each client.
Remove stress and load from the network segment.
If the client connects to the server after multiple network hops, use the output from the tracert command to identify the latent segment, and consider rerouting TFTP traffic to avoid the hop.
You can also diagnose TFTP download performance problems by examining a network trace of the download activity. Generally, the best practice is to obtain this trace from the client and server simultaneously to assess exactly where the blockage is occurring (server, client, or network). To do this, add a client and a third computer to a hub, start network traces from the server and the third computer, and then boot the client computer from the network.
Addressing TFTP Download Performance Problems
In the preceding example, the average latency is less than 1 millisecond, which is good. If the average latency between the client and the server is longer than 5 milliseconds, TFTP performance will be seriously degraded. You may be able to decrease the impact of latency on TFTP download times by increasing the TFTP block size. This means that more data will be sent each time, which cuts down on the number of round-trips. For instructions, see How to Modify the BCD Store Using Bcdedit.
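If you collect ping results from many clients, the 5-millisecond guideline can be checked automatically. The following Python sketch assumes English-language Windows ping output that was saved to text (for example, copied from the Command Prompt window in Windows PE); the function names and the sample strings are hypothetical.

```python
import re

# Threshold taken from the guidance above: average latency longer than
# 5 milliseconds seriously degrades TFTP performance.
LATENCY_THRESHOLD_MS = 5


def average_rtt_ms(ping_output):
    """Return the 'Average = Nms' value from Windows ping output, or None."""
    match = re.search(r"Average = (\d+)ms", ping_output)
    return int(match.group(1)) if match else None


def tftp_latency_ok(ping_output, threshold_ms=LATENCY_THRESHOLD_MS):
    """True if the average round-trip time is within the threshold."""
    average = average_rtt_ms(ping_output)
    if average is None:
        raise ValueError("no ping summary found in the supplied text")
    return average <= threshold_ms


if __name__ == "__main__":
    summary = "Minimum = 0ms, Maximum = 2ms, Average = 0ms"
    print("Latency acceptable:", tftp_latency_ok(summary))
```

The check uses only the summary line, so a single slow reply does not trigger a warning unless it raises the average above the threshold.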
Reducing the size of the boot image can also speed up TFTP downloads. To accomplish this, do the following:
Ensure that the Windows image (.wim) file that contains the boot image does not contain extra space. A best practice is to use the ImageX /export command to export your boot image to a "clean" .wim file before adding the image to the Windows Deployment Services server.
Ensure that the .wim file that contains the boot image is using the maximum compression format, LZX. To check the compression type, run ImageX /info ImageFile <ImageNumber|ImageName>.
In situations where a server is overburdened, you can configure a network boot referral to direct booting clients to different Windows Deployment Services servers for TFTP downloads. For more information, see Managing Network Boot Programs.
Alter your physical network topology by doing one or more of the following:
Add a Windows Deployment Services server closer to the client computer or move the client computer closer to the Windows Deployment Services server.
Repair the existing network infrastructure (in the case of high-packet loss).
Upgrade to better cabling (Cat 5e is recommended).
Upgrade to higher-quality routing hardware.
Check the condition of the switches between the client computer and the Windows Deployment Services server to ensure that packets are not being dropped.
Image Apply Phase
The image apply phase of the installation process involves transferring an install image from the Windows Deployment Services server to the client. This transfer occurs through either Server Message Block (SMB) or multicasting and is the most time-consuming part of the installation.
Diagnosing Performance Problems in the Image Apply Phase
To begin, test several client computers on your network, and compare the performance with the test results outlined in the "Performance and Scalability Expectations" section in Optimizing Performance and Scalability. You can also enable logging to gather information. For more information, see Logging and Tracing. If there are substantial variances between the expected results and your results, you probably have a performance blockage. To troubleshoot common blockages, ask yourself the following questions:
Do performance problems occur only at certain times of the day? This may indicate a scalability problem that is probably caused by an overused network or an overburdened server.
Do performance problems occur only for clients on a particular subnet or network location? If so, determine whether there is a network issue on that segment.
Do performance problems occur only for clients that access a particular server? If so, check the server’s performance statistics as well as the network segment that connects the clients to the server to see whether the server is overused.
Performance problems that occur across a larger group of computers generally indicate either a concurrency problem (scalability) or a blockage in the network or server. To investigate, measure the amount of time it takes to download a file (of approximately the same size as the install image) from the server to the client, in Windows PE. Or try to download the install image after it has been placed in a shared folder on the server. If the time it takes to download a large file exceeds the expectations, you should analyze the switch utilization and observe other network metrics to identify the network conditions that are impacting download times.
If you suspect that the server is the blockage, use the steps in the Using Performance Monitoring section later in this topic to identify the root cause of the blockage.
Addressing Performance Problems in the Image Apply Phase
Performance problems in this phase are generally caused by network congestion, or inadequate resources on the server or client. If network congestion is the issue, consider doing the following:
Creating more bandwidth on the network. This may mean upgrading your network infrastructure to support greater bandwidth and higher throughput. For example, it might mean moving from 100 Mbps to 1 Gbps, upgrading cabling, replacing hubs with routers or switches, or reducing the number of clients that can access a particular network segment simultaneously.
Adding additional Windows Deployment Services servers to the network to handle the network demand. This means segmenting network infrastructure so that smaller groups of clients are answered by each server.
Balancing the server load by adding dedicated image servers. For more information, see Storing and Replicating Images Using DFS.
Reducing image size. Because larger images mean longer installation times and greater network strain, you should consider creating images that contain minimum customization, drivers, and applications; or consider creating specialized images for each department, hardware type, or function.
Using multicast. If multiple clients are downloading the same image at the same time, multicast can dramatically improve performance. For the best results, make sure that your switching hardware supports Internet Group Management Protocol (IGMP) snooping and that it has high backplane bandwidth capacity. For more information, see Performing Multicast Deployments.
Most Windows Deployment Services server blockages occur because of inadequate bandwidth (at the network adapter), slow disk subsystems, or insufficient physical memory. To identify the source of the blockage, use the information in the next section, Using Performance Monitoring. Typical causes on individual client computers include the following:
Problems with the physical network connection between the client computer and the network topology
Problems with the switching equipment
A bad disk controller interface on the client computer
A bad network adapter on the client computer
Insufficient RAM on the client computer (512 MB of RAM is the minimum requirement for Windows Vista)
Poorly performing system drivers
Client computers that are in a sleep power state on the same switch as clients that are actively downloading files over multicast. If a computer is in a sleep state, the Windows operating system reduces the speed of the network connection to 10 Mbps to save power. Some switching hardware that does not support IGMP uses broadcast instead of multicast, and it broadcasts at the speed of the slowest computer on the switch. Therefore, having an active multicast client and a sleeping client on the same switch can cause a severe performance problem for multicast. To prevent this problem, make sure that your switching hardware supports IGMP snooping, or ensure that the clients on the switch will not go into a sleep power state.
Using Performance Monitoring
You can use Windows Reliability and Performance Monitor to diagnose performance problems with Windows Deployment Services. Note, however, that this is not a complete solution. Because most performance and scalability issues in Windows Deployment Services are network related, network analysis tools may be of greater use. Nevertheless, Windows Reliability and Performance Monitor can be a powerful and quick tool for identifying resource issues on services associated with Windows Deployment Services.
The following are the most useful counters for diagnosing Windows Deployment Services performance. To open Reliability and Performance Monitor, click Start, type Performance in the Start Search box, and then press ENTER. To add these counters, expand Monitoring Tools, click Performance Monitor, and then click the green plus sign (+) in the right pane. In Available Counters, scroll to the counter you want to add, and then click Add. Review the following information to maximize your server's performance.
Network Interface (Bytes Sent/sec)
PhysicalDisk (Avg. Disk sec/Read, Avg. Disk sec/Write, and Current Disk Queue Length). These disk counters highlight the current disk activity. The Avg. Disk sec/Read and Avg. Disk sec/Write counters should generally stay below 10 milliseconds, and the maximum should not exceed 50 milliseconds. Values outside these thresholds indicate that the disk subsystem is too slow to respond to the demands that are being placed on the server. The Current Disk Queue Length counter indicates the backlog of pending input/output (I/O) requests. Ideally, this value stays at or near zero.
Process (Page Faults/sec). Page faults occur when there is not enough physical memory on the server to meet the server's demands. When this occurs, the server has to copy memory from the physical RAM to a swap file on the hard disk drive, and then make room to enable the requested memory allocation to complete. This is a very expensive operation because this swap requires a series of reads and writes on the hard disk drive, and this process must be completed before the operation that caused the fault can resume. On servers where there is not enough memory, page faults can occur frequently, which significantly reduces the amount of processor time that is available to complete any other operations. If there are significant time periods with a lot of page fault activity, you should consider adding memory to the server.
Processor (% Processor Time). You can tell from the % Processor Time counter whether there is enough processing power on the server to meet the demands being placed on it. If you see that processor utilization is high, use this counter for each individual process to determine the cause of the degraded performance. If the Windows Deployment Services server is configured to work with Distributed File System (DFS), and the DFS Replication (DFSR) service is consuming a significant portion of processor time, you should consider increasing the boot configuration data (BCD) refresh interval to reduce the number of changes that DFSR has to propagate between servers. If the server has multiple server roles, you may want to configure the roles so that they are better distributed across multiple servers.
A strong correlation between network utilization and disk reads (and disk throughput) indicates that the network adapter may be limiting image deployment speed. In this case, if you are not concerned with disk throughput, consider upgrading the network infrastructure to support Gigabit Ethernet, or refactoring the Windows Deployment Services server infrastructure so that it is spread across multiple servers.
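The disk latency guidance for the PhysicalDisk counters can be applied to a set of logged samples. The following Python sketch assumes that you have exported Avg. Disk sec/Read or Avg. Disk sec/Write samples (in seconds, as Performance Monitor reports them) to a list; the function name and the sample values are hypothetical.

```python
# Thresholds from the PhysicalDisk guidance, expressed in seconds.
AVERAGE_LIMIT_S = 0.010  # samples should generally stay below 10 ms
MAXIMUM_LIMIT_S = 0.050  # no sample should exceed 50 ms


def check_disk_latency(samples_s):
    """Return a list of threshold violations for the given latency samples."""
    if not samples_s:
        raise ValueError("at least one sample is required")
    issues = []
    average = sum(samples_s) / len(samples_s)
    if average >= AVERAGE_LIMIT_S:
        issues.append(f"average {average * 1000:.1f} ms is not below 10 ms")
    peak = max(samples_s)
    if peak > MAXIMUM_LIMIT_S:
        issues.append(f"peak {peak * 1000:.1f} ms exceeds 50 ms")
    return issues


if __name__ == "__main__":
    samples = [0.004, 0.006, 0.005, 0.080]  # hypothetical logged values
    for issue in check_disk_latency(samples) or ["within thresholds"]:
        print(issue)
```

An empty result means that both the average and the peak are within the thresholds.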
WDS Multicast Server (all counters). The following list describes all of the counters for multicasting.
Active Clients. This counter shows the clients that are currently connected to a multicast session.
Active Contents. This counter refers to the data that is being transmitted. When a client connects to a multicast transmission, a “content” is created. The content is then removed if clients are not active for 5 minutes or longer. You can have multiple contents for a single transmission if there are multiple network cards on the server.
Active Namespaces. This counter is essentially equivalent to a multicast transmission. A namespace is the underlying object that gets created when you create a multicast transmission.
Incoming Packets/Second (in Bytes). This counter shows the sum of all incoming data packets (per second) from all multicast transmissions.
Outgoing Packets/Second (in Bytes). This counter shows the sum of all outgoing data packets (per second) from all multicast transmissions. On a 100 Mbps network, you should expect this number to reach around 12.5 MB per second (for example, 13107200). On a 1 Gbps network, you should expect this number to reach 20 MB per second or more.
Total Data Packets. This counter shows the total number of data packets sent by the server.
Total Master Client Switches. This counter shows the total number of times that the master client has been changed in a transmission. Note that the master client is the slowest client in a transmission; that is, the client that is not capable of installing any faster, whereas the other clients may be able to install at a faster rate. This counter should stay steady or increase very slowly. If you notice this counter increasing regularly, it indicates that there are several clients limiting the performance of the multicast transmission.
Total NACK Packets. A NACK packet is a negative acknowledgement. This counter shows the total number of NACK packets received from client computers. This counter is often important for diagnosing problems with multicast performance. This counter should stay steady or increase very slowly. If you notice that this counter is increasing at a rate of one or more per second, it indicates a performance problem.
Total Repair Packets. This counter shows the total number of repair packets sent by the server. Note that the server sends repair packets in response to NACK packets. If the number in this counter is high, relative to the Total Data Packets counter, this indicates that packet loss is occurring between the clients and the server. Ideally, the ratio of total data packets to total repair packets should be greater than 100:1.
Total Slowdown Request. Clients send a request when the server is sending data faster than the client can handle it. This is usually caused by slow disk performance on the clients, or by other resource pressure (such as insufficient memory, high CPU utilization, and so on).
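Several of the multicast guidelines above are simple arithmetic on counter values. The following Python sketch shows the calculations under stated assumptions: the function names are hypothetical, the counter values are ones you read from Performance Monitor yourself, and the byte conversion uses 1 MB = 1,048,576 bytes because that is what the 13107200 example implies.

```python
def link_ceiling_bytes_per_sec(link_mbps):
    """Convert a link speed in megabits per second to bytes per second.

    100 Mbps / 8 = 12.5 MB per second, which is 13107200 bytes.
    """
    return int(link_mbps / 8 * 1024 * 1024)


def repair_ratio_ok(total_data_packets, total_repair_packets, min_ratio=100):
    """True if data packets outnumber repair packets by more than 100:1."""
    if total_repair_packets == 0:
        return True
    return total_data_packets / total_repair_packets > min_ratio


def nack_rate_per_sec(first_reading, second_reading, interval_seconds):
    """NACK packets per second between two readings of Total NACK Packets."""
    return (second_reading - first_reading) / interval_seconds


if __name__ == "__main__":
    print("100 Mbps ceiling:", link_ceiling_bytes_per_sec(100), "bytes/sec")
    print("Repair ratio OK:", repair_ratio_ok(100000, 500))
    print("NACKs per second:", nack_rate_per_sec(10, 70, 60))
```

A NACK rate of one or more per second, or a repair ratio at or below 100:1, corresponds to the problem conditions described for those counters.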
WDS TFTP Server (all counters). The following list describes the two counters for TFTP.
Active Requests. This counter shows the number of active TFTP transfers on the server.
Transfer Rate/Second (in Bytes). This counter shows the total amount of data that Windows Deployment Services is sending out per second.
WDS Server (all counters). The following list describes the counters for the Windows Deployment Services server.
Active Requests. This counter shows the number of currently active requests on the Windows Deployment Services server, including remote procedure calls (RPCs) to the server and multicast requests.
Processed/Second. This counter shows the number of requests processed in the last second.
Requests/Second. This counter shows the number of requests received in the last second.
For more information, see Reliability and Performance Monitor.
For information about how to view these counters, see the following Microsoft TechNet articles: