Monitoring Site Availability
Create a checklist for monitoring the availability of your site. The checklist should contain the items in this section.
Monitor Bandwidth Usage: Per Day, Week, and Month
Bandwidth. How bandwidth is being used (peak and idle).
Usage. Whether usage increases and, if so, when it increases and how long the increase lasts.
You can use this information to project how much bandwidth you will need in the future. This will enable you to plan for the peak bandwidth you need for a holiday shopping season.
You can obtain bandwidth usage data from managed routers and Internet Information Services (IIS) 6.0 log analysis (using the Commerce Server Data Warehouse).
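As an illustration of projecting future bandwidth needs, the following minimal Python sketch fits a straight line to a history of peak bandwidth readings and extrapolates it. The function name project_peak, the sample figures, and the choice of a linear trend are assumptions made for this example. Seasonal peaks such as a holiday shopping season may call for a different model.

```python
def project_peak(peaks, periods_ahead):
    """Extrapolate peak bandwidth with a least-squares straight line.

    peaks: peak readings, one per period (for example, one per month),
           oldest first. Units are whatever the readings use.
    periods_ahead: how many periods past the last reading to project.
    """
    n = len(peaks)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")

    # Periods are numbered 0, 1, ..., n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(peaks))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


if __name__ == "__main__":
    # Hypothetical monthly peaks in Mbps, not measured data.
    monthly_peaks = [10.0, 12.0, 14.0, 16.0]
    print(project_peak(monthly_peaks, 2))
```

Feeding the function daily, weekly, or monthly peaks gives a projection at the matching granularity.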
Monitor Network Availability
Use Internet Control Message Protocol (ICMP) echo requests (pings), which are available in most network monitoring software.
Compare your network availability to the level agreed to in your service level agreement (SLA) with your Internet Service Provider (ISP) or data center provider. Request improvement if network availability falls below the level agreed to in the SLA.
The formula for measuring network availability is as follows:
(Number of successful ping returns/number of total pings issued) x 100%
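The network availability formula can be sketched in Python as follows. This example works on ping results that have already been collected. The function names and the SLA figure are hypothetical, and how the pings are issued is left to your monitoring software.

```python
def network_availability(ping_results):
    """Return network availability as a percentage.

    ping_results: iterable of booleans, True for each ping that
    received a reply and False for each ping that did not.
    """
    results = list(ping_results)
    total = len(results)
    if total == 0:
        raise ValueError("no pings were issued")
    successes = sum(1 for ok in results if ok)
    return successes / total * 100.0


def meets_sla(availability_pct, sla_pct):
    """True if measured availability is at or above the SLA level."""
    return availability_pct >= sla_pct


if __name__ == "__main__":
    # Hypothetical sample: 1,000 pings, one of which was lost.
    sample = [True] * 999 + [False]
    pct = network_availability(sample)
    # 99.95 is an assumed SLA level used only for illustration.
    print(pct, meets_sla(pct, 99.95))
```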
Monitor System Availability
Monitor the following systems:
Operating system. Monitor typical and abnormal shutdowns of the operating system.
SQL Server. Monitor typical operation and failover events of Microsoft SQL Server.
Internet Information Services (IIS). Monitor typical and abnormal shutdowns in IIS.
The formula for measuring system availability is as follows:
((Period of measurement - downtime)/period of measurement) x 100%
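A minimal Python sketch of the system availability formula follows. It assumes that downtime has already been extracted as a list of outage durations, for example from shutdown and failover events. The function name and the sample values are hypothetical.

```python
from datetime import timedelta


def system_availability(period, outages):
    """Return system availability as a percentage.

    period: length of the measurement period (timedelta).
    outages: iterable of outage durations (timedelta) within the period.
    """
    if period <= timedelta(0):
        raise ValueError("measurement period must be positive")
    downtime = sum(outages, timedelta(0))
    if downtime > period:
        raise ValueError("downtime exceeds the measurement period")
    # Dividing one timedelta by another yields a float ratio.
    return (period - downtime) / period * 100.0


if __name__ == "__main__":
    # Hypothetical month with two outages totalling 43.2 minutes.
    month = timedelta(days=30)
    outages = [timedelta(minutes=30), timedelta(minutes=13, seconds=12)]
    print(system_availability(month, outages))
```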
Monitor HTTP Availability
Monitor the following HTTP requests:
HTTP requests (internal). Monitor HTTP requests issued internally.
HTTP requests (per ISP). Monitor HTTP requests issued from ISP networks to track whether users of the monitored ISP networks can access your site.
HTTP requests (per geographic location). Monitor HTTP requests issued from different geographic locations (New York, San Francisco, London, Paris, Munich, Tokyo, Singapore, and so on) to track whether users from respective areas of the world can access your site.
Downtime occurs when the site does not return a page or returns a page with an incorrect response. The formula for measuring HTTP availability is as follows:
(Number of successful HTTP requests/number of total HTTP requests issued) x 100%
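The following Python sketch applies the HTTP availability formula separately to each request source, such as an ISP network or a geographic location. It assumes that probe results are already recorded as (source, status code, content check) tuples, and it treats a request as successful only if it returned status 200 and the expected page content. That success rule is an assumption for this example, so adjust it to match what your site counts as a correct response.

```python
from collections import defaultdict


def http_availability_by_source(probes):
    """Return a dict mapping each source to its availability percentage.

    probes: iterable of (source, status_code, content_ok) tuples.
    status_code is None if no page was returned at all.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for source, status_code, content_ok in probes:
        totals[source] += 1
        # No page, an error status, or wrong content all count as downtime.
        if status_code == 200 and content_ok:
            successes[source] += 1
    return {src: successes[src] / totals[src] * 100.0 for src in totals}


if __name__ == "__main__":
    # Hypothetical probe results, not measured data.
    sample = [
        ("London", 200, True),
        ("London", 500, False),
        ("Tokyo", 200, True),
        ("Tokyo", 200, True),
        ("Tokyo", 200, False),  # page returned, but with wrong content
        ("Tokyo", 200, True),
    ]
    print(http_availability_by_source(sample))
```

Grouping by source makes it easier to see when users of one ISP or region cannot reach the site even though overall availability looks healthy.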
Monitor Performance Metrics
Monitoring performance is not strictly part of monitoring availability. However, performance data can give early warning of problems that will affect availability if you do not address them.
Monitor the following performance metrics:
Number of visits (per day/week/month). Monitor site traffic information to assess the level of site activity. This data is available from the Data Warehouse.
Latency of requests for sets of operations and page groups (per day/week/month). Compare these metrics to your transaction cost analysis (TCA) test results to see how site performance compares to TCA predictions and to identify system bottlenecks.
CPU utilization (per day/week/month). Monitor use on Microsoft Windows servers, SQL Server servers, IIS/Commerce Server servers, middleware, and so on. Group servers by function to make it easier to track and plan site capacity.
Disk storage. Group servers by function and monitor disk capacity (total disk capacity and free space). Review weekly and monthly history, so that you can spot trends and plan for expansion.
Disk I/O. Group servers by function and monitor disk input/output (I/O) throughput. Compare weekly and monthly history with the disk I/O rating provided by the manufacturer. If the observed I/O nears the rated disk I/O, consider adding more spindles (adding more drives to the drive stripe set) or redistributing disk I/O across multiple disk controllers.
Fibre Channel controller/switch bandwidth. Monitor system area network (SAN) Fibre Channel controller bandwidth. (You use a SAN to interconnect nodes in a distributed computer system, such as a cluster. These systems are members of a common administrative domain and are usually in close physical proximity.) If the observed bandwidth nears the throughput rating provided by the manufacturer, consider adding more controllers and switches to redistribute traffic and obtain more aggregate bandwidth.
Memory. Make sure that the amount of available memory is more than 4 MB. If the system nears this level during peak usage, add more memory to the server.
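The disk I/O and memory checks above can be sketched in Python as a simple threshold test. The 4 MB memory floor comes from this section. The 80 percent figure used to decide when observed I/O "nears" the manufacturer's rating is an assumption for this example, as are the function and constant names.

```python
MIN_AVAILABLE_MEMORY_MB = 4.0  # floor stated in this section
NEAR_RATING_FRACTION = 0.8     # assumed meaning of "nears the rating"


def capacity_warnings(observed_io, rated_io, available_memory_mb):
    """Return a list of warning strings for one server sample.

    observed_io and rated_io must use the same unit (for example, MB/s).
    """
    if rated_io <= 0:
        raise ValueError("rated I/O must be positive")
    warnings = []
    if observed_io >= NEAR_RATING_FRACTION * rated_io:
        warnings.append("disk I/O near rated throughput")
    if available_memory_mb <= MIN_AVAILABLE_MEMORY_MB:
        warnings.append("available memory at or below minimum")
    return warnings


if __name__ == "__main__":
    # Hypothetical peak-usage sample for one server.
    print(capacity_warnings(observed_io=90.0, rated_io=100.0,
                            available_memory_mb=512.0))
```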