Monitoring and Troubleshooting Performance
As an edge server connecting many networks, Microsoft® Internet Security and Acceleration (ISA) Server 2004 handles large amounts of traffic compared to other servers in an organization. For this reason, it is built for high performance. Yet, there are circumstances where high performance is not realized due to misconfiguration or inadequate resources. This article provides information about how to monitor and troubleshoot perceived performance problems.
Diagnosing Performance Problems
Solving Performance Problems
Monitoring Performance
Resource Counters
Diagnosing Performance Problems
In some deployments, the performance of ISA Server may be perceived as slow or unacceptable. That is, the end-to-end user experience, measured as the total time required to complete a transaction, load a Web page, or download a file, is unsatisfactory.
This could be the result of:
- A resource on the ISA Server computer that is inadequate in capacity and therefore becomes a performance bottleneck.
- A deployment problem in the interaction of ISA Server with other components in the system or network.
- A security incident such as a denial of service (DoS) attack consuming a resource to its full capacity.
In most cases, performance problems can be solved with proper diagnosis based on:
- Accurate definition and measurement of the performance problem (the symptom).
- ISA Server monitoring and system performance counters.
- ISA Server log events.
- Network captures.
Sources of Performance Problems
Performance problems can be based on inadequate capacity, deployment problems, or security incidents. The following sections describe these situations in detail, and how to diagnose each situation using available tools.
Inadequate Capacity
ISA Server capacity depends on CPU, memory, network, and disk hardware resources. Each resource has a capacity limit, and as long as all resources are consumed below their limit, the server as a whole functions properly, fulfilling its performance objectives. Performance drops considerably when one of these limits is reached, causing a bottleneck. Each bottleneck has symptoms that can help detect the resource that has inadequate capacity.
Solving a capacity problem is achieved by tuning the hardware or adding more of the resource that is inadequate. For more information about ISA Server performance tuning, see the document ISA Server 2004 Performance Best Practices.
A resource bottleneck is not always an indication of a capacity problem. When the problem is not a result of inadequate capacity, adding or tuning the hardware may not help solve the problem.
Deployment Problems
As a central network infrastructure component, ISA Server has many interactions with other network components, servers, and applications. The following are some examples:
- To enforce policy, ISA Server must communicate with other types of servers in various situations. If a Domain Name System (DNS) server or an authentication server fails to respond in a timely manner, ISA Server will fail to respond quickly on those requests that require server intervention.
- ISA Server is commonly connected to many physical networks. Improper configuration of these networks may lead to various performance problems, such as maximum transmission unit (MTU) size conversions, packet loss, and retransmissions.
- Serving as a secure traffic pipe, ISA Server interacts with numerous server applications, such as Web, mail, and media. Improper configuration of these servers and the services they execute may lead to performance problems, such as a high Transmission Control Protocol (TCP) connection rate or limited packet rate.
The first step in solving a deployment problem is to identify the component that is causing poor performance.
Security Incidents
ISA Server implements numerous methods to defend against many network attack vectors. Nevertheless, as an edge server and a central network infrastructure component, it is the first target for attack attempts. Securing at the application level requires ISA Server to manage compound protocol states that have larger resource consumption as compared to lower levels of filtering, such as packet and transport layers.
It may be possible that a performance problem surfaces as a result of an attack. The most common attacks that consume system resources are denial of service (DoS) attacks and distributed DoS attacks (DDoS). The sources of these attacks can be hackers on the Internet accessing through the External network, as well as clients on Internal networks that are infected by viruses and worms that are trying to propagate or attack network resources.
Step-by-Step Procedure
When you encounter a performance problem, use the following procedure to diagnose the problem:
1. Accurately define the problem in measurable terms. An accurate definition leads to a simple measurement. For example, loading a page from a published Web site takes 30 seconds.
2. Examine resource performance counters and record those that have suspect levels (for example, more than 75 percent total processor utilization). For more information, see Resource Counters in this document. If a resource is at a suspect level, it may be a capacity problem. Go to step 4.
3. If all resources are within normal operation limits, it is probably a deployment problem. Search the ISA Server log and the operating system event log, and record any suspicious events. If logs do not show the problem, create a network capture on all network interfaces. Search for large time gaps between a request and a response. For a list of most common cases, see Solving Deployment Problems in this document.
4. If at least one resource is at full capacity, search the ISA Server log and record any suspicious events that may indicate a DoS or DDoS attack. To identify possible known attacks and ways to mitigate them, see Solving Security Problems in this document. If there is no sign of a security incident, continue with step 5.
5. If at least one resource is at full capacity, to identify possible deployment problems that correlate with high resource consumption, see Resource Intensive Deployment Problems in this document. If no deployment problem is identified, continue with step 6.
6. If some resources are at full capacity, and require adjustment and tuning to identify and solve the problem, see Solving Capacity Problems in this document.
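The branching in this procedure can be summarized as a small decision function. The following Python sketch is illustrative only: the `Findings` record and the `next_section` function are hypothetical names, not part of ISA Server, and the three flags stand for what the administrator has already recorded in steps 2, 4, and 5.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """What the administrator recorded while following the procedure."""
    resource_suspect: bool        # step 2: a resource counter is at a suspect level
    attack_signs_in_logs: bool    # step 4: logs suggest a DoS or DDoS attack
    heavy_deployment_issue: bool  # step 5: a resource intensive deployment problem was found


def next_section(findings: Findings) -> str:
    """Return the section of this document to consult next."""
    if not findings.resource_suspect:
        return "Solving Deployment Problems"                     # step 3
    if findings.attack_signs_in_logs:
        return "Solving Security Problems"                       # step 4
    if findings.heavy_deployment_issue:
        return "Solving Resource Intensive Deployment Problems"  # step 5
    return "Solving Capacity Problems"                           # step 6
```

The order of the checks mirrors the order of the steps: a capacity problem is concluded only after a security incident and a resource intensive deployment problem have been ruled out.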
Measurement Problems
There are situations in which ISA Server seems to be performing poorly as compared to other deployment options. This happens when improper measurement methods and metrics are used to compare the response time of a system that has an ISA Server computer between the client computer and the target application server with that of a system that has no ISA Server computer deployed.
Web Publishing Performance
One example of measurement problems occurs when attempting to measure the throughput capacity in bits per second (bps) or requests per second of an ISA Server computer using a fixed number of best-effort simulated Web clients hitting a published Web server. These clients work independently, sending the next HTTP request immediately upon receiving the previous response. This method is common in many load generation tools such as Microsoft Application Center Test (ACT).
The results of this test show that a Web server published by an ISA Server computer has less throughput than the same Web server accessed directly. The measurement is accurate, but the conclusion drawn from it is not. The problem with this method is that it measures round-trip time (RTT) rather than throughput, because the number of bps that each client can generate depends on the average RTT of each HTTP transaction:
bits/sec = (average bits/transaction) * (average transactions/sec) = (average bits/transaction) / (average RTT)
The average RTT is expected to increase in this scenario when deploying an ISA Server computer, as compared to hitting the Web server directly. This is because the ISA Server computer acts as another network hop between the clients and the Web server. Every hop has a penalty on RTT, especially if the TCP stack is fully traversed upon entrance and exit of each packet through the application-layer Web Proxy filter.
A true metric that should be measured for ISA Server throughput is bps at 80 percent CPU utilization. Because bps per simulation agent decreases as a result of an increase in average RTT, you need to add more simulation agents to reach the full processor capacity of the ISA Server computer.
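To make the relationship between RTT and best-effort throughput concrete, the following Python sketch applies the formula above. The function names and all numbers (10,000-byte responses, 10 ms and 12 ms round trips) are hypothetical values chosen for illustration, not measurements of ISA Server.

```python
import math


def best_effort_bps(bits_per_transaction: float, rtt_seconds: float, clients: int = 1) -> float:
    """Throughput generated by best-effort clients that each wait one RTT per request."""
    return clients * bits_per_transaction / rtt_seconds


def clients_needed(target_bps: float, bits_per_transaction: float, rtt_seconds: float) -> int:
    """Smallest number of best-effort clients able to generate target_bps."""
    return math.ceil(target_bps * rtt_seconds / bits_per_transaction)


# Hypothetical numbers: 10,000-byte responses (80,000 bits),
# 10 ms RTT when hitting the Web server directly, 12 ms through an extra hop.
direct = best_effort_bps(80_000, 0.010)
proxied = best_effort_bps(80_000, 0.012)
print(f"direct:  {direct:,.0f} bps per client")
print(f"proxied: {proxied:,.0f} bps per client")
```

With a fixed number of clients, the longer RTT alone lowers the measured bps, even though the server in the middle may be far from its capacity. Adding clients, as `clients_needed` suggests, is what moves the test toward the processor limit.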
To summarize, ISA Server is not expected to improve the response time in Web publishing scenarios. With properly tuned caching and Web content, it will enable considerable offloading from the back-end Web server, allowing for Web server consolidation.
Forward Web Proxy Response Time
The same situation as described in the previous section arises when measuring Internet access speed with speed measurement sites. These Web tools measure the round-trip time (RTT) of HTTP requests sent to many popular Web sites around the world, bypassing all the Web caches on the way. Similar to Microsoft ACT, they do so with best-effort clients. As with Web publishing, ISA Server adds several milliseconds to the RTT, which in most cases is unnoticeable because the majority of the RTT is spent on the Internet and at the Web server. But because of the best-effort behavior of these clients, an expected increase in RTT is mistakenly interpreted as lower throughput.
Solving Performance Problems
The following sections describe various performance problems, their symptoms, and how to solve them. These sections are divided into the following categories:
- Deployment problems. Caused by various misconfigurations in ISA Server or the network infrastructure. These problems are characterized by limited throughput, slow response, and low CPU utilization.
- Resource intensive deployment problems. These are more difficult to identify because they occur under, and amplify, conditions of intensive CPU utilization.
- Security problems. Caused by DoS or DDoS attacks.
- Capacity problems. Where some resource is at full capacity and requires tuning, or the capacity needs to be increased.
Solving Deployment Problems
Solving deployment problems involves domain name resolution, domain controllers, TCP Nagle algorithms and delayed acknowledgements, and network problems.
Domain Name Resolution
ISA Server requires Domain Name System (DNS) for various name resolutions. For example, when receiving an HTTP request with a host name that is an IP address, ISA Server must perform a reverse DNS lookup to get the domain name of this IP address, because it could be blocked by some URL set.
When DNS does not respond in a timely manner, worker threads will be blocked on pending DNS responses, and the number of backlogged packets will consequently increase. The symptom is characterized by:
- \ISA Server Firewall Packet Engine\Backlogged Packets > 10
- \ISA Server Firewall Service\Worker Threads > 100
- Network captures show gaps of several seconds between DNS queries and their responses.
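Before taking a network capture, it can help to time a few reverse lookups from the ISA Server computer. The following Python sketch is one way to do so; `timed_reverse_lookup` is a hypothetical helper, and the injectable `resolver` and `clock` parameters exist only so that the function can be exercised without a live DNS server. It uses the standard library call `socket.gethostbyaddr`, which goes through the operating system resolver and therefore only approximates what ISA Server itself experiences.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def timed_reverse_lookup(
    ip: str,
    resolver: Callable = socket.gethostbyaddr,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[str], float]:
    """Return (host name or None, elapsed seconds) for one reverse DNS lookup."""
    start = clock()
    try:
        name = resolver(ip)[0]   # gethostbyaddr returns (hostname, aliases, addresses)
    except OSError:              # socket.herror and socket.gaierror derive from OSError
        name = None
    return name, clock() - start
```

Lookups that repeatedly take seconds rather than milliseconds are consistent with the symptom described above. A failed lookup that returns quickly is a different problem from a slow one.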
There are various ways to solve the problem depending on its nature. For more information, see:
- ISA Server Site and Content Rules Are Not Enforced for HTTP Content, at Microsoft Help and Support.
- Only the First Web Site Is Returned Using Web Publishing for Multiple Sites, at Microsoft Help and Support.
Domain Controller
ISA Server interacts with the domain controller in various authentication configurations. When the domain controller does not respond in a timely manner, worker threads will be blocked on pending authentication requests, and the number of backlogged packets will constantly increase. The symptom is characterized by:
- \ISA Server Firewall Packet Engine\Backlogged Packets > 10
- \ISA Server Firewall Service\Worker Threads > 100
- Network captures show gaps of several seconds between authentication requests sent to the domain controller and its responses.
To solve this problem, increase the number of concurrent pending authentication requests as described in HOW TO: Configure your ISA Server for a Very Large Number of Authentication Requests, at Microsoft Help and Support.
TCP Nagle and Delayed ACK
Transmission Control Protocol (TCP) combines various delays to optimize TCP performance by reducing small packet traffic. On the sending side, TCP uses the Nagle algorithm that delays outbound packets for interactive applications. On the receiving side, TCP uses delayed acknowledgements (ACK) to increase probability of piggybacking the ACK on data that is sent back. However, in some circumstances these two mechanisms can cause a fixed delay of 200 milliseconds resulting in five packets per second throughput. ISA Server does not contribute to this effect, but because of its role in the middle, ISA Server can mistakenly be blamed for it. The symptom can be seen in a network trace when sent packets are acknowledged after a 200 millisecond delay.
The solution to the problem is changing the behavior of the application. For details about how to avoid this issue in applications, see INFO: Design Issues - Sending Small Data Segments Over TCP w/Winsock, at Microsoft Help and Support. For overcoming common TCP performance problems, see Common Performance Issues in Network Applications Part 1: Interactive Applications, on the MSDN Web site.
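As an illustration of the kind of application change involved, the following Python sketch shows two common techniques: sending a message in one call instead of several small writes, and disabling the Nagle algorithm on a socket with the standard `TCP_NODELAY` option. The helper names are hypothetical. Whether either change is appropriate depends on the application, and this is not taken from the referenced articles.

```python
import socket
from typing import Iterable


def send_coalesced(sock, parts: Iterable[bytes]) -> None:
    """Send header and body in a single call.

    Separate small writes followed by a read are the pattern that lets
    Nagle and delayed ACK interact; one larger write avoids it.
    """
    sock.sendall(b"".join(parts))


def disable_nagle(sock) -> None:
    """Turn off the Nagle algorithm for this TCP socket (use with care)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

Coalescing writes is usually the first thing to try, because it keeps the small-packet reduction that the Nagle algorithm provides for the rest of the traffic.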
Network Problems
Network problems affecting ISA Server performance include various hardware and configuration problems. Like routers, ISA Server is commonly deployed at network junctions, so the same types of networking problems that are encountered in router configurations are likely to be encountered in ISA Server deployments. In these cases, it is useful to refer to operating system deployment and troubleshooting guides.
Solving Resource Intensive Deployment Problems
Solving resource intensive deployment problems involves packet fragmentation, application and Web filters, and server-side HTTP Keep-Alive.
Packet Fragmentation
Packet fragmentation happens when two networks use different maximum transmission unit (MTU) sizes to transmit data. For example, an Ethernet local area network (LAN) commonly uses an MTU of 1,500 bytes (1,460 bytes of TCP payload), and a wide area network (WAN) may use a smaller MTU, such as 576 bytes (536 bytes of TCP payload). When packet fragmentation happens, the TCP stack on the ISA Server computer consumes many processor cycles in an effort to manage buffers and transfer data between buffers of various sizes.
The symptom can be seen in a network capture looking at the packet sizes on each network interface. In many cases, it is possible to tune MTU size. For a list of MTU sizes on different network media types, see Default MTU Size for Different Network Topology, at Microsoft Help and Support. For information about tuning MTU and other TCP registry values, see TCP/IP and NBT configuration parameters for Windows XP, at Microsoft Help and Support.
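To estimate how much extra packet handling a given MTU mismatch causes, the following Python sketch counts the IPv4 fragments needed to carry one packet across a link with a smaller MTU. It assumes a 20-byte IPv4 header without options and relies on the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes. The function name is hypothetical.

```python
def fragment_count(ip_packet_len: int, mtu: int, ip_header_len: int = 20) -> int:
    """Number of IPv4 fragments needed to send one packet over a link with the given MTU."""
    if ip_packet_len <= mtu:
        return 1  # fits as is, no fragmentation
    # Payload per fragment must be a multiple of 8 bytes (fragment offset units).
    per_fragment = (mtu - ip_header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = ip_packet_len - ip_header_len
    return -(-payload // per_fragment)  # ceiling division
```

For example, `fragment_count(1500, 576)` evaluates to 3, so every full-size LAN packet forwarded to such a link becomes three packets, each with its own header and buffer handling.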
Application and Web Filters
ISA Server application filters and Web filters provide the application logic filtering extensions. Without these filters, ISA Server provides only stateful filtering with various security measures that are common to all applications (such as quota restrictions or access policy). As logic extensions to ISA Server, filters have a direct effect on ISA Server performance.
One way to measure the effect of a filter on the performance of ISA Server is to compare the CPU utilization of ISA Server under realistic load with the filter enabled and with the filter disabled. If the difference is not reasonable for the amount of computation that the filter performs, you have identified one possible cause of the performance degradation.
The filter developer should be notified about filter performance problems. In many cases, the diagnosis outlined will provide the required information to the filter developer to identify and fix the problem.
Server-Side HTTP Keep-Alive
Opening and closing TCP connections causes performance overhead for any network server. To lower this overhead, HTTP includes a Connection: Keep-Alive header that instructs both client and server sides to reuse a connection for many requests. ISA Server Web Proxy takes this optimization one step further because requests from many clients are routed through a single connection to the upstream Web server, thus increasing connection reuse. Yet, misconfiguration of upstream Web servers, especially in a Web publishing scenario, may cause every request to end with a connection close. This has a direct effect on performance that is difficult to see without network tracing.
The symptom can be seen in a network trace on the upstream network interface, when every connection is closed after a single request. Solving the problem starts with understanding which side closes the connection, and why it does so. If an upstream Web server closes all connections after sending a single response, check whether it is possible to configure it to reuse its connections.
If ISA Server closes the connection, it could be as a response to incompatible or ambiguous HTTP syntax.
Solving Security Problems
Security problems affecting ISA Server performance are DoS and DDoS attacks. These attacks are characterized by the full consumption of one or more resources of ISA Server. From a performance view, there is no difference between a capacity problem and a security problem, because in both cases the performance of ISA Server suffers due to a resource bottleneck. Still, there are many indications that can lead to a conclusion that the source of a performance problem is a security incident.
ISA Server uses various mechanisms to automatically detect and block security incidents that lead to DoS conditions:
- TCP SYN attacks. Automatic detection and protection.
- UDP or raw IP flood. Automatic detection and protection by use of per-rule connection quota.
- Virus or worm propagation. Automatic detection and protection by use of per-IP connection quota.
In these cases, alerts are triggered, enabling the ISA Server administrator to examine the nature and source of the attack, and use preventive measures to eliminate it.
Identifying a DoS or DDoS attack requires input from all monitoring sources:
- Performance counters show how much a resource is consumed, as well as other numbers that have suspect levels triggering further examination with other sources.
- ISA Server logs show irregular denial patterns that correlate with a set of ports or IP addresses that are denied access. In most cases, looking at the ISA Server logs provides the necessary information to identify and solve a security incident.
- Network captures can also show irregular traffic patterns but at the lower network level. Use network captures in cases where ISA Server logs do not provide adequate information.
When identifying a DoS security incident that is not automatically detected and blocked by ISA Server, contact Microsoft Help and Support.
Solving Capacity Problems
Capacity problems occur when a resource reaches full capacity. The following sections describe each resource, its capacity limit, and how to solve a capacity problem. Note that at high loads, more than one resource will reach full capacity. In this case, solve them one by one, according to the section order.
For best practices concerning performance tuning and capacity planning, see ISA Server 2004 Performance Best Practices.
Hard Page Faults
Hard page faults occur when processes compete for physical memory space. Whenever a process is running and allocated processor time, it uses memory. If there is not enough physical memory to hold all the virtual memory it requires while running, hard page faults are triggered by the operating system, causing physical memory pages to be swapped with virtual pages in a page file. Hard page faults cause unwanted input/output (I/O), resulting in enormous response delays. The symptom is easily identified by:
- Average \Memory\Pages/sec > 5 over several minutes.
To solve a memory problem, do one or more of the following:
- Identify which processes are using this memory by looking at \Process(*)\Private Bytes, \Process(*)\Virtual Bytes and \Process(*)\Working Set, and consider disabling high memory consuming processes.
- If Web caching is enabled, consider reducing the memory Web cache size.
- Add more memory.
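The "average over several minutes" test used for \Memory\Pages/sec (and for other counters in this document) can be sketched as follows. This Python helper is hypothetical; in particular, the default of 180 seconds for "several minutes" is an assumption, and the samples are assumed to have been collected at a fixed interval.

```python
from statistics import mean
from typing import Sequence


def sustained_above(
    samples: Sequence[float],
    threshold: float,
    interval_seconds: float,
    min_seconds: float = 180.0,   # assumed meaning of "several minutes"
) -> bool:
    """True if the samples span at least min_seconds and their average exceeds threshold."""
    if len(samples) * interval_seconds < min_seconds:
        return False              # not enough data to call it sustained
    return mean(samples) > threshold
```

For example, 12 samples of \Memory\Pages/sec taken every 15 seconds cover three minutes. If their average is above 5, the hard page fault symptom described above is present.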
Network at Full Utilization
Network at full utilization means that the underlying bandwidth of a network link is fully saturated. The indication that this is the case is:
Average \Network Interface(*)\Bytes Received/sec or
\Network Interface(*)\Bytes Sent/sec is more than 80% of \Network Interface(*)\Current Bandwidth
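When applying this rule, note that the Windows \Network Interface(*)\Current Bandwidth counter is reported in bits per second, while the byte counters are in bytes per second. The following Python sketch, with hypothetical function names, converts units before comparing against the 80 percent level. It is evaluated separately for each direction.

```python
def link_utilization(bytes_per_sec: float, current_bandwidth_bps: float) -> float:
    """Fraction of the link in use in one direction (0.0 to 1.0 and above)."""
    if current_bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return bytes_per_sec * 8 / current_bandwidth_bps   # bytes -> bits


def is_saturated(bytes_per_sec: float, current_bandwidth_bps: float, limit: float = 0.80) -> bool:
    """True if the average byte rate exceeds the given fraction of the bandwidth."""
    return link_utilization(bytes_per_sec, current_bandwidth_bps) > limit
```

For instance, an average of 11,000,000 bytes per second on a 100 Mbps interface is 88 percent utilization, which is above the level at which the link should be treated as saturated.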
If the majority of this bandwidth is normal application traffic, solving this problem can be achieved as follows:
- For outbound Web proxy traffic, consider using a Web cache if not already deployed. This will lower bandwidth requirements on the upstream Web link, and will improve the average response time on the downstream links.
- Consider adding more bandwidth to the saturated network link. In most cases, this will mean increasing the bandwidth of the WAN Internet link.
Disk Transfers at Maximal Rate
Physical disk transfers are limited to approximately 100 random accesses per second. (This is true for disks spinning at 10,000 revolutions per minute (RPM). At 15,000 RPM, the limit is approximately 130 I/O per second). When attempting more than this limit, all responses that are dependent on I/O will suffer enormous delays.
A disk that has reached this limit will be characterized by:
Average \PhysicalDisk(*)\Disk Transfers/sec > 100 over several minutes for some disk
This can happen in the following cases on an ISA Server computer:
- Disk Web cache is not tuned to serve all the I/O generated by requests for content in the disk.
- Logging creates too much I/O for a single disk to handle. This may happen in extreme cases of high load when using Microsoft SQL Server™ 2000 Desktop Engine (MSDE) logging.
In the first case, the solution is to add another physical disk. For redundant array of independent disks (RAID) storage systems, it is important to select RAID-0 to be able to reach the maximal I/O on each disk.
In the second case, check whether logging can be disabled for some rules, lowering the total log entries to be written. Another option is to switch to text logging, which requires considerably less I/O.
Processor at Full Utilization
A processor at full utilization is characterized by high percentage of processor time. For ISA Server, the recommended maximum is 80 percent. When processor utilization is high for periods of several minutes, this could lead to longer response times, especially if there are peaks reaching 100 percent utilization:
Average \Processor(*)\% Processor Time > 80 over several minutes for some processor
\ISA Server Web Proxy\Average Milliseconds/request > 30,000
When this happens, it is important to record how this utilization is divided between user time (\Processor(*)\% User Time) and kernel time (\Processor(*)\% Privileged Time). In most scenarios, except for intensive content processing scenarios such as Secure Sockets Layer (SSL), user time will be less than kernel time. If this is not the case, it is likely that the bottleneck is not caused by inadequate processor capacity. In that event, use \Process(*)\% Processor Time to determine if some other process is extensively using the CPU at the same time.
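The split between user time and kernel time can be checked with simple arithmetic. The following Python sketch uses hypothetical function names and takes its 70 percent level from the Resource Counters table in this document, which lists a user-mode share above 70 percent of % Processor Time as suspect unless SSL or VPN is in use.

```python
def user_share(user_pct: float, privileged_pct: float) -> float:
    """Fraction of busy processor time spent in user mode."""
    busy = user_pct + privileged_pct
    return user_pct / busy if busy > 0 else 0.0


def user_time_suspect(user_pct: float, privileged_pct: float,
                      content_processing: bool = False, limit: float = 0.70) -> bool:
    """True if user-mode share is above the limit and no SSL or VPN load explains it."""
    return not content_processing and user_share(user_pct, privileged_pct) > limit
```

A suspect result is a prompt to look at \Process(*)\% Processor Time for each process, not a diagnosis in itself.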
Before increasing the processing power of the ISA Server computer, consider the following:
- Check filters and filter configurations for performance intensive options that may be disabled or relaxed. For example, a Web filter performing virus scanning could be configured not to scan some content types, such as images or text files that are not harmful from a security view.
- Replace MSDE logging with text logging.
- Review policy and check whether it is possible to use stateful filtering instead of application filtering for traffic that is considered harmless.
Monitoring Performance
To maintain and manage the health of ISA Server, it is necessary to constantly monitor its performance. The following sections list resource counters and ISA Server counters that help troubleshoot ISA Server performance problems. We recommend monitoring these counters on a regular basis at a reasonable sampling rate of several samples per minute.
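Samples can be collected with any performance logging tool. One option on Windows is the `typeperf` command, which can sample counters at a fixed interval (`-si`) for a fixed number of samples (`-sc`) and write CSV output. The following Python sketch parses such a file into one list of values per counter. The column layout it assumes (a timestamp in the first column, one counter per remaining column, counter paths in the header row) is an assumption about the export format and should be checked against an actual file.

```python
import csv
import io
from typing import Dict, List


def parse_counter_csv(text: str) -> Dict[str, List[float]]:
    """Parse counter samples exported as CSV into {counter path: [values]}."""
    rows = [row for row in csv.reader(io.StringIO(text)) if len(row) >= 2]
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    series: Dict[str, List[float]] = {name: [] for name in header[1:]}
    for row in data:
        for name, cell in zip(header[1:], row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                pass  # blank or non-numeric sample, skip it
    return series
```

The resulting lists can then be averaged and compared with the suspect levels listed in the tables that follow.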
Resource Counters
The following table describes resource counters.
Performance counter | Description | Maximum, expect, or suspect | Recommended action |
---|---|---|---|
Processor | | | |
\Processor(*)\% Processor Time | Percent of time processor is utilized. | Suspect if higher than 80% over several minutes. | For information, see Processor at Full Utilization in this document. Verify if there are other processes using the CPU. If CPU correlates with \ISA Server Firewall Packet Engine\Packets/sec, high CPU may indicate maximal capacity or DoS attack. |
\Processor(*)\% User Time | Same as processor time but counts only user-mode processor cycles. | Depends on the scenario. Expect 20–30% of processor time in a user-mode scenario like Web Proxy. Suspect more than 70% of % Processor Time unless using SSL or VPN. | High % User Time may indicate ISA Server misconfiguration. |
\Processor(*)\% Privileged Time | Same as processor time but counts only kernel-mode processor cycles. | Depends on the scenario. Expect most processor cycles to be privileged in a kernel-mode scenario such as stateful filtering with IP routing enabled. Suspect otherwise. | |
\Processor(*)\% DPC Time | Percent of time processor is running a DPC used to measure service interrupts. | Suspect more than 40%. Check this counter when assigning processor affinity. | |
Memory | | | |
\Memory\Pages/sec | Hard page fault rate. | Suspect if more than a few. Best is 0. | For information, see Hard Page Faults in this document. |
\Memory\Pool Nonpaged Bytes | Nonpaged bytes used by kernel. | Depends on amount of physical memory. For more information, see ISA Server 2004 Performance Best Practices (https://go.microsoft.com/fwlink/?LinkID=24514) | |
Network | | | |
\Network Interface(*)\Bytes Total/sec | Total throughput. | | |
\Network Interface(*)\Packets/sec | Same as bytes per second but in packets. | Depends on packet size (MTU). Bytes Total/sec divided by Packets/sec indicates the actual average packet size. Suspect if it is less than 100 bytes. | May indicate an attack. Trace network activity and look for irregular traffic patterns. If not an attack, check network for possible misconfiguration. |
\Network Interface(*)\Packets Received/sec | Same as packets per second but counts only receive. | Depends on the scenario. | |
\Network Interface(*)\Packets Sent/sec | Same as packets per second but counts only send. | Depends on the scenario. | |
\Network Interface(*)\Current Bandwidth | Nominal bandwidth of the interface, in bits per second. | 80–90% of this is the upper bound for throughput in each direction. | |
\Network Interface(*)\Bytes Received/sec | Same as bytes per second but counts only receive. | Cannot be higher than 80–90% of bandwidth. | |
\Network Interface(*)\Bytes Sent/sec | Same as bytes per second but counts only send. | Cannot be higher than 80–90% of bandwidth. | |
\Network Interface(*)\Packets Received Discarded | Received discarded packet rate. | Suspect if not negligible. | For information, see Network Problems in this document. |
\Network Interface(*)\Packets Received Errors | Received error packet rate. | Suspect if not negligible. | For information, see Network Problems in this document. |
\Network Interface(*)\Packets Received Unknown | Received unknown protocol packet rate. | Suspect if not negligible. | For information, see Network Problems in this document. |
\Network Interface(*)\Packets Sent Unicast/sec | Sent unicast rate. | Should be the majority (99%). Suspect otherwise. | May indicate an attack. Trace network activity and look for irregular traffic patterns. If not an attack, check network for possible misconfiguration. |
\Network Interface(*)\Packets Sent Non-Unicast/sec | Sent broadcast or multicast rate. | Suspect if not negligible. | May indicate an attack. Trace network activity and look for irregular traffic patterns. If not an attack, check network for possible misconfiguration. |
\Network Interface(*)\Packets Outbound Discarded | Sent discard rate. | Suspect if not negligible. | |
\Network Interface(*)\Packets Outbound Errors | Sent error rate. | Suspect if not negligible. | |
Disk | | | |
\PhysicalDisk(*)\Disk Transfers/sec | Disk I/O rate. Correlates with disk seek rate, and depends on Web cache hit ratio. | A 10,000 RPM disk can do 100 maximum, and a 15,000 RPM disk can do 150 maximum. If a disk is used only for ISA Server Web caching and this counter is greater than the maximum, expect slow responses from ISA Server Web Proxy. | If no other process is using this disk, add another physical disk and define a Web cache file on it. |
\PhysicalDisk(*)\Disk Reads/sec | Same as transfers per second but counts only disk reads. For a Web caching disk, it is proportional to hit ratio (\ISA Server Web Proxy\Cache Hit Ratio*). | For a Web caching disk, most transfers are reads. At normal hit ratios (20%–40%) expect more than 80% of disk transfers per second to be Disk Reads. Suspect if less. | There is another process other than Wspsrv.exe writing to the disk. If disk transfers per second exceeds its maximum, identify this process (either by monitoring \Process(*)\I/O Write Operations/sec or using some other I/O tracing tool) and eliminate it. |
\PhysicalDisk(*)\Disk Writes/sec | Same as transfers per second but counts only writes. | For a Web caching disk, only a few transfers are writes. Suspect otherwise. | There is another process other than Wspsrv.exe writing to the disk. If disk transfers per second exceeds its maximum, identify this process (either by monitoring \Process(*)\I/O Write Operations/sec or using some other I/O tracing tool) and eliminate it. |
\PhysicalDisk(*)\Disk Bytes/sec | Byte rate, including memory and disk. | Depends on hardware. Zero to tens of megabytes per second (MBps). | |
\PhysicalDisk(*)\Disk Read Bytes/sec | Same as bytes per second but counts only reads. | For a Web caching disk, it is proportional to hit ratio and disk reads per second. Under normal forward/transparent caching conditions, expect (Read Bytes/sec) / (Reads/sec) to be up to 20 kilobytes (KB). Suspect otherwise. | Verify whether there is another process reading from the disk. |
\PhysicalDisk(*)\Avg. Disk Bytes/Transfer | Average number of bytes in each transfer. | | |
\PhysicalDisk(*)\Avg. Disk Bytes/Read | Same as bytes per transfer but counts only reads. | For a Web caching disk, it correlates with the average Web response size. Expect up to 20 KB and suspect otherwise. | Verify whether there is another process reading from the disk. |
\PhysicalDisk(*)\Avg. Disk Bytes/Write | Same as bytes per transfer but counts only writes. | For a Web caching disk, it is much larger than Avg. Disk Bytes/Read because writes are gathered in batches of several responses. Expect 5–10 times more than Avg. Disk Bytes/Read. Suspect if about the same as Avg. Disk Bytes/Read, or if more than 10 times. | Verify whether there is another process writing to the disk. If not, and Avg. Disk Bytes/Write is about the same as Avg. Disk Bytes/Read, this indicates that most of the data is not cacheable. Consider disabling it. |
ISA Server Counters
The following table describes ISA Server counters.
Performance counter | Description | Maximum, expect, or suspect | Recommended action |
---|---|---|---|
ISA Server Firewall Packet Engine |
|
|
|
\ISA Server Firewall Packet Engine\Active Connections |
Total number of active connections currently passing data. This includes TCP connections in TIME_WAIT state with 2MSL=60 seconds. |
Depends on the scenario. For application filtering scenarios, expect up to 30,000. Suspect if more. For stateful filtering with IP routing enabled, expect up to 100,000. Suspect if more. |
A steadily increasing trend may indicate a network misconfiguration (for example, RST packets dropped by a router) or a DoS attack (TCP connections that are never closed with RST or FIN). |
\ISA Server Firewall Packet Engine\Allowed Packets/sec |
Number of packets per second allowed to pass through the firewall. |
Directly impacts CPU utilization. The maximal value depends on the hardware and whether using stateful filtering (kernel-mode data pumping) or application filtering (user-mode data pumping). |
|
\ISA Server Firewall Packet Engine\Backlogged Packets |
Number of packets waiting for the firewall packet engine to create a data pump. |
Expect 0. Suspect if more than 10. |
At high \Processor(*)\% Processor Time, this indicates a maximal capacity condition. Otherwise, when it correlates with a large number of \ISA Server Firewall Service\Worker Threads, it indicates that DNS or the Active Directory® directory service is responding slowly. |
\ISA Server Firewall Packet Engine\Bytes/sec |
Total throughput in bytes per second passing through the firewall. Every byte is counted twice: once when it enters the firewall, and once when it leaves the firewall. |
Bytes/sec divided by Packets/sec indicates the actual average packet size. Suspect if it is less than 100 bytes. |
May indicate an attack. Trace network activity and look for irregular traffic patterns. If not an attack, check network for possible misconfigurations. |
\ISA Server Firewall Packet Engine\Connections/sec |
Number of connections created per second (TCP and UDP). |
Directly impacts CPU utilization. The maximal value depends on the hardware and whether using stateful filtering (kernel-mode data pumping) or application filtering (user-mode data pumping). |
|
\ISA Server Firewall Packet Engine\Dropped Packets/sec |
Number of denied packets per second. |
Expect no more than 100. Suspect if more than 100. |
Indicates either a network misconfiguration or an attack. Use the ISA Server log to identify the actual condition. |
\ISA Server Firewall Packet Engine\Packets/sec |
Includes allowed and dropped packets. |
Directly impacts CPU utilization. The maximal value depends on the hardware and whether using stateful filtering (kernel-mode data pumping) or application filtering (user-mode data pumping). |
|
\ISA Server Firewall Packet Engine\TCP Established Connections/sec |
Number of TCP connections per second that successfully completed the 3-way SYN handshake. |
Suspect if less than 75% of Connections/sec. |
The difference between TCP Established Connections/sec and Connections/sec accounts for other protocols (UDP, ICMP, GRE or other raw IP protocols) and unfinished TCP SYN handshakes, indicating the possibility of a TCP SYN attack. |
ISA Server Firewall Service |
|
|
|
\ISA Server Firewall Service\Bytes Read/sec |
Throughput of read bytes. |
|
|
\ISA Server Firewall Service\Bytes Written/sec |
Throughput of written bytes. |
|
|
\ISA Server Firewall Service\TCP Bytes Transferred/sec by Kernel mode Data Pump |
Throughput of TCP data moved through kernel-mode data pumps. |
Compare to Bytes Read/sec and Bytes Written/sec. |
|
\ISA Server Firewall Service\UDP Bytes Transferred/sec by Kernel mode Data Pump |
Throughput of UDP data moved through kernel-mode data pumps. |
Compare to Bytes Read/sec and Bytes Written/sec. |
|
\ISA Server Firewall Service\Accepting TCP Connections |
Number of connection objects waiting for a TCP connection from Firewall clients after a successful remote connection. |
Expect no more than 10. Suspect if more. |
May indicate an attack from Firewall clients or congestion on the Internal network. |
\ISA Server Firewall Service\Worker Threads |
The number of Firewall service worker threads that are available waiting in the completion port queue. |
Maximum is 1,000. Expect 40–200. Suspect if more than 400. |
A large number of worker threads means that external services (DNS or Active Directory) are responding slowly or that an attack is occurring. The number does not go down after it is raised. |
\ISA Server Firewall Service\DNS Cache Hits % |
Rate of DNS cache hits. |
Maximum is 100%. Expect 70%–90%. Suspect if less than 30%. |
Check for possible DNS or network misconfiguration. May mean an attack where destination IP addresses are selected randomly. |
ISA Server Web Proxy |
|
|
|
\ISA Server Web Proxy\Upstream Bytes Sent/sec |
Bytes sent to servers. |
Depends on cache hit ratio. |
|
\ISA Server Web Proxy\Upstream Bytes Received/sec |
Bytes received from servers. |
Depends on cache hit ratio. |
|
\ISA Server Web Proxy\Upstream Bytes Total/sec |
Total bytes on server-side connections. |
Depends on cache hit ratio. |
|
\ISA Server Web Proxy\Client Bytes Sent/sec |
Bytes sent to clients. |
|
|
\ISA Server Web Proxy\Client Bytes Received/sec |
Bytes received from clients. |
|
|
\ISA Server Web Proxy\Client Bytes Total/sec |
Total bytes on client-side connections. |
|
|
\ISA Server Web Proxy\SSL Client Bytes Sent/sec |
SSL tunneling bytes sent. |
|
|
\ISA Server Web Proxy\SSL Client Bytes Received/sec |
SSL tunneling bytes received. |
|
|
\ISA Server Web Proxy\SSL Client Bytes Total/sec |
Total SSL tunneling bytes. |
|
|
\ISA Server Web Proxy\Cache Hit Ratio for Last 10K Requests (%) |
Percentage of URLs that are fetched from cache. |
Suspect the cache is not working if low (less than 5%). |
Consider disabling the cache. |
\ISA Server Web Proxy\HTTPS sessions |
Number of SSL connections. |
|
|
\ISA Server Web Proxy\Reverse Bytes Sent/sec |
Bytes sent to published Web sites. |
Much smaller than Reverse Bytes Received/sec. Suspect if more than 10% of Reverse Bytes Received/sec. |
|
\ISA Server Web Proxy\Reverse Bytes Received/sec |
Bytes received from published Web sites. |
|
|
\ISA Server Web Proxy\Reverse Bytes Total/sec |
Total throughput between ISA Server and Web published sites. |
|
|
\ISA Server Web Proxy\Average Milliseconds/request |
Average response time. |
Suspect if more than 30,000 milliseconds. |
Use Direct Fetches and Cache Fetches to diagnose. |
\ISA Server Web Proxy\Current Direct Fetches Average Milliseconds/request |
Average time to fetch a URL from upstream. |
Could be several seconds. Suspect if more than 10,000 (10 seconds). |
May indicate WAN network connectivity problems or misconfiguration. |
\ISA Server Web Proxy\Current Cache Fetches Average Milliseconds/request |
Average time to fetch a URL from cache. |
Expect 1–50 milliseconds. Suspect if more than 300. |
May indicate that disk transfers are higher than capacity. For more information, see \PhysicalDisk(*)\Disk Transfers/sec. |
\ISA Server Web Proxy\Requests/sec |
Request rate. |
Client Bytes Sent/sec divided by Requests/sec provides a measure of average response size, which should be no more than 20 KB. |
|
\ISA Server Web Proxy\Failing Requests/sec |
Failing request rate. |
Should be much smaller than request rate. Suspect if not. |
|
\ISA Server Web Proxy\DNS Cache Hits (%) |
Rate of DNS cache hits. |
Maximum is 100%. Expect 70%–90%. Suspect if less than 30%. |
Check for possible DNS or network misconfiguration. May mean an attack where destination IP addresses are selected randomly. |
\ISA Server Web Proxy\Incoming Connections/sec |
Number of incoming connections per second. |
Requests/sec divided by Incoming Connections/sec provides a metric for average requests/connection with expected values: forward proxy 10–20, transparent and reverse proxy 5–10. Suspect if less than 2 (requests/connection). |
May indicate a misconfigured client Web browser. |
\ISA Server Web Proxy\Outgoing Connections/sec |
Number of outgoing connections per second. |
|
|
ISA Server Cache |
|
|
|
\ISA Server Cache\* |
ISA Server Web cache has two parts, in memory and on disks. Total URL fetches from disks should be the same as the total disk transfers. Be sure they are evenly spread on all disks, and have enough disks to handle no more than maximum fetches per disk (\PhysicalDisk(*)\Disk Transfers/sec). |
|
|
\ISA Server Cache\Disk Failure Rate (Fail/sec) |
Indicates if there are disk fetches that fail. |
Suspect if not negligible. Could be a hardware problem. |
Look for events in Event Viewer indicating disk failure. Replace disk if necessary. |
\ISA Server Cache\Memory Cache Allocated Space (KB) |
Amount of memory currently used by the memory cache. |
When the cache is full, it should be between 50% and 100% of the total memory cache size. |
|
\ISA Server Cache\Disk Cache Allocated Space (KB) |
Amount of disk space currently used by the disk cache. |
When the cache is full, it should be between 50% and 100% of the total disk cache size. |
|
\ISA Server Cache\Memory Usage Ratio Percent (%) |
Percentage of URLs that are fetched from memory cache in proportion to all cache fetches. |
In reverse caching, this can be made high (above 50%). In forward caching, it is generally less than 50%. |
In reverse caching, try to increase the size of the memory cache if less than 50%. |
\ISA Server Cache\Disk Content Write Rate (writes/sec) |
Disk cache write rate. |
Should be low compared to read rate because writes are gathered in batches of several URLs. |
For more information, see \PhysicalDisk(*)\Disk Writes/sec. |
\ISA Server Cache\Disk URL Retrieve Rate (URL/sec) |
Throughput from disk cache in URLs per second. |
Depends on hit ratio. High (as compared to memory retrieve rate) in forward caching, low in reverse. (Bytes Retrieved Rate) / (URL Retrieve Rate) = Bytes/URL, which should be up to 20 KB under normal conditions. Suspect otherwise. |
|
\ISA Server Cache\Disk Bytes Retrieved Rate (KB/sec) |
Throughput from disk cache in KB per second. |
Depends on hit ratio. High (as compared to memory retrieve rate) in forward caching, low in reverse. (Bytes Retrieved Rate) / (URL Retrieve Rate) = Bytes/URL, which should be up to 20 KB under normal conditions. Suspect otherwise. |
|
\ISA Server Cache\Memory URL Retrieve Rate (URL/sec) |
Throughput from memory cache in URLs per second. |
Depends on hit ratio. Low (as compared to disk retrieve rate) in forward caching, high in reverse. (Bytes Retrieved Rate) / (URL Retrieve Rate) = Bytes/URL, which should be up to 20 KB under normal conditions. Suspect otherwise. |
|
\ISA Server Cache\Memory Bytes Retrieved Rate (KB/sec) |
Throughput from memory cache in KB per second. |
Depends on hit ratio. Low (as compared to disk retrieve rate) in forward caching, high in reverse. (Bytes Retrieved Rate) / (URL Retrieve Rate) = Bytes/URL, which should be up to 20 KB under normal conditions. Suspect otherwise. |
|
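Several rows in the ISA Server counters table rely on ratios between two counters rather than on a single value. The following Python sketch shows one way to evaluate four of those ratios from averaged samples. The function name `check_isa_ratios`, the dictionary layout, the sample values, and the choice of 1 KB = 1,024 bytes are assumptions made for illustration; the thresholds come from the table.

```python
KB = 1024  # assumption: 1 KB = 1,024 bytes


def check_isa_ratios(c):
    """Return findings based on counter ratios described in the table.

    `c` maps short counter names to averaged per-second values.
    """
    findings = []

    # Firewall Packet Engine: Bytes/sec divided by Packets/sec.
    if c["Packets/sec"] > 0:
        avg_packet = c["Bytes/sec"] / c["Packets/sec"]
        if avg_packet < 100:
            findings.append("average packet size below 100 bytes")

    # Firewall Packet Engine: established TCP share of new connections.
    if c["TCP Established Connections/sec"] < 0.75 * c["Connections/sec"]:
        findings.append("TCP established below 75% of connections")

    # Web Proxy: Requests/sec divided by Incoming Connections/sec.
    if c["Incoming Connections/sec"] > 0:
        per_conn = c["Requests/sec"] / c["Incoming Connections/sec"]
        if per_conn < 2:
            findings.append("fewer than 2 requests per connection")

    # Web Proxy: Client Bytes Sent/sec divided by Requests/sec.
    if c["Requests/sec"] > 0:
        avg_response = c["Client Bytes Sent/sec"] / c["Requests/sec"]
        if avg_response > 20 * KB:
            findings.append("average response larger than 20 KB")

    return findings


# Hypothetical averaged samples.
normal = {"Bytes/sec": 5_000_000, "Packets/sec": 10_000,
          "Connections/sec": 400, "TCP Established Connections/sec": 380,
          "Requests/sec": 600, "Incoming Connections/sec": 50,
          "Client Bytes Sent/sec": 6_000_000}
suspicious = {"Bytes/sec": 600_000, "Packets/sec": 10_000,
              "Connections/sec": 1_000, "TCP Established Connections/sec": 300,
              "Requests/sec": 100, "Incoming Connections/sec": 80,
              "Client Bytes Sent/sec": 1_000_000}

print(check_isa_ratios(normal))
print(check_isa_ratios(suspicious))
```

A finding here is a starting point for diagnosis. The recommended actions in the table, such as checking the ISA Server log or tracing network activity, still apply.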