Monitoring Exchange Events

The Exchange Management Pack, combined with Microsoft Operations Manager (MOM), provides complex filtering and viewing tools to help you monitor events related to your Exchange organization. MOM includes two default event views that can be accessed from the Microsoft Operations Manager 2005 - Operator Console:

  • Events   Lists the events collected from monitored servers, including informational events as well as warnings and errors. To find Exchange-specific events, sort by Source or Event ID. This view is a quick way to spot events that occur across many servers in the organization. For example, suppose you notice mail flow errors on a server. Sorting and inspecting the events might reveal that the problem is isolated to one geographical site and does not affect users elsewhere. Because the data is gathered in one central location, you can focus resources on the affected site and correct the problem quickly.

  • Task Status   Lists MOM-related events for scheduled tasks, including general information, warnings, and errors. Of these, mail flow events are among the most important to monitor. You can filter events in this view by criteria such as matching words, category, or severity.
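The sorting and filtering described for these views can be pictured with a short sketch. It is illustrative only: the `Event` record, its field names, and `filter_events` are hypothetical and are not part of MOM or the Exchange Management Pack, and the sample events (including their event IDs) are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical record; the field names are assumptions, not a MOM schema.
    server: str
    source: str
    event_id: int
    severity: str  # "Information", "Warning", or "Error"
    message: str


def filter_events(events, severity=None, source=None, words=None):
    """Return the events that match every criterion that was supplied."""
    result = []
    for ev in events:
        if severity is not None and ev.severity != severity:
            continue
        if source is not None and ev.source != source:
            continue
        if words is not None and words.lower() not in ev.message.lower():
            continue
        result.append(ev)
    return result


# Invented sample data; the event IDs are placeholders.
SAMPLE_EVENTS = [
    Event("EXCH01", "MSExchangeTransport", 1001, "Error",
          "Mail flow test message not received"),
    Event("EXCH02", "MSExchangeIS", 2002, "Information", "Database mounted"),
    Event("EXCH01", "MSExchangeTransport", 1002, "Warning",
          "Mail flow latency is high"),
]

# Filter by severity, then sort by Source and Event ID as in the Events view.
errors = filter_events(SAMPLE_EVENTS, severity="Error")
by_source = sorted(SAMPLE_EVENTS, key=lambda ev: (ev.source, ev.event_id))
```

Combining criteria narrows the result, so `filter_events(SAMPLE_EVENTS, source="MSExchangeTransport", words="mail flow")` would return only the transport events whose message mentions mail flow.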

System Monitoring Best Practices

The Exchange Management Pack gives you significant flexibility in choosing which messaging functions to monitor. At a minimum, monitor the items listed in the following table.

Minimum messaging functions to monitor

Test Details

Server availability

  • Server heartbeat.

  • Required services are running.

  • Databases are mounted.

  • MAPI logon check verification is running without errors.

  • Mail flow verification is running without errors.

  • No unexpected service termination.

  • Front End Server Monitoring test is running without errors.

Services running

  • Verify that all required services are running on each server. Note that you can configure the list of monitored services for each server.

  • Generate an alert when a service is not running.
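As a rough illustration of the services check, the sketch below compares a per-server list of required services with the services reported as running. The function name, the `REQUIRED` mapping, and the server name are hypothetical; the Management Pack performs this check itself, and the service list shown is only an example configuration.

```python
def services_needing_alert(required, running):
    """Return the required services that are not running, in configured order."""
    running_lower = {name.lower() for name in running}
    return [name for name in required if name.lower() not in running_lower]


# Hypothetical per-server configuration of monitored services.
REQUIRED = {"EXCH01": ["MSExchangeIS", "MSExchangeSA", "SMTPSVC"]}

# Example snapshot of running services (service names compared case-insensitively).
running_now = ["msexchangeis", "SMTPSVC", "W3SVC"]

alerts = services_needing_alert(REQUIRED["EXCH01"], running_now)
```

Each name in `alerts` would correspond to one "service is not running" alert for that server.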

Databases mounted

  • Verify that all databases are mounted.

  • Generate an alert if any database becomes dismounted.

MAPI Logon check

  • Verify that the Server Availability Report shows no errors. This test verifies that each store can be accessed by a MAPI client, and implicitly verifies both Exchange and Active Directory functionality.

Log on to the mailbox of a test account

  • Verify client to server connectivity, including verification that Exchange is running, the database is mounted, and Active Directory is functioning correctly.

  • Use this data to compile server availability statistics.

Front-end Server Monitoring

After you modify your registry to enable Front-end server monitoring, the following tests are performed:

  • Verify that services are running on the front-end server.

  • Verify that Internet clients can connect, including Outlook Web Access, Outlook Mobile Access, and Exchange ActiveSync (for computers that are running Exchange Server 2003).

  • Verify localhost monitoring occurs by default.

  • Verify that the public URL is resolvable and successfully connects to your front-end servers.

  • Verify that connectivity through your firewall and/or proxy server is functioning.

  • Verify that load balancing is occurring.

Mail flow verification

  • Verify mail flow between selected servers by sending periodic e-mail messages to test mailboxes on each server.

  • Generate an alert after successive failures.

  • Record mail delivery latency.
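The mail flow logic above (alert only after successive failures, and record latency for delivered messages) can be sketched as follows. `MailFlowMonitor` is a hypothetical class, and the default of three successive failures is an assumption chosen for the example, not a documented Management Pack setting.

```python
class MailFlowMonitor:
    """Tracks test-message results for one sender/receiver pair (hypothetical)."""

    def __init__(self, max_successive_failures=3):
        # Assumed default; tune to match your own alerting policy.
        self.max_successive_failures = max_successive_failures
        self.successive_failures = 0
        self.latencies = []  # delivery latencies in seconds

    def record(self, delivered, latency_seconds=None):
        """Record one test result and return True when an alert is due."""
        if delivered:
            # A successful delivery resets the failure streak.
            self.successive_failures = 0
            if latency_seconds is not None:
                self.latencies.append(latency_seconds)
            return False
        self.successive_failures += 1
        return self.successive_failures >= self.max_successive_failures
```

Counting only consecutive failures keeps a single delayed test message from raising an alert, while a sustained outage still does.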

Server Health Monitoring

Scripts and rules are configured by default to monitor key health indicators. These indicators include:

  • Free Disk Space

  • Mail Queue Thresholds

  • Configuration and Security

  • Performance Thresholds

  • SMTP Queues

Free disk space

Running out of disk space is a common, preventable source of Exchange failures. This test monitors counter thresholds that you specify for the following performance objects:

  • All disks

  • Log disks

  • SMTP queue disks

The Free disk space test is cluster and IFS aware, and uses WMI to collect information. It does not use performance data.
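A minimal sketch of a free-space threshold check follows. It does not reproduce the Management Pack's WMI-based script: it uses Python's `shutil.disk_usage` as a stand-in data source, and the function names, disk labels, and the 10 percent threshold in the usage note are all assumptions for illustration.

```python
import shutil


def free_percent(free_bytes, total_bytes):
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def disk_alerts(usage_by_disk, min_free_percent):
    """Return the names of disks whose free space is below the threshold.

    usage_by_disk maps a disk name to a (free_bytes, total_bytes) tuple.
    """
    return sorted(
        name
        for name, (free, total) in usage_by_disk.items()
        if free_percent(free, total) < min_free_percent
    )


def local_usage(path):
    """Read (free_bytes, total_bytes) for a local path (stand-in for WMI)."""
    usage = shutil.disk_usage(path)
    return usage.free, usage.total
```

For example, `disk_alerts({"Logs": local_usage("."), "Queue": local_usage(".")}, 10.0)` would list whichever of those hypothetical disks has less than 10 percent free space.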

Mail Queues

  • Verify that all mail queues (SMTP, MTA, and internal mail delivery queues) are processing messages according to your thresholds.

  • Verify that mail is flowing properly.

  • Identify queue length problems that may lead to slow e-mail delivery, and identify issues in your infrastructure that require attention.

  • These checks are based on performance data and Exchange WMI classes.

Server Configuration and Security Monitoring

  • Verify that the IIS Lockdown Tool started.

  • Verify that Message Tracking Log shares are locked down.

  • Verify that the URLScan ISAPI filter is installed and running.

  • Verify that the SMTP virtual server does not allow anonymous relay (spam prevention).

  • Check for the existence of mailboxes on front-end servers.

  • Determine if SSL should be required.

  • Verify that the log files are being successfully purged after backup.

  • Verify that the SMTP directories are on an NTFS formatted drive.

  • Verify that circular logging is disabled for each Storage Group.

  • Verify that the value of the HeapDeCommitFreeBlockThreshold registry key is correct.

  • Verify that Message Tracking is enabled.

Server performance

  • Generate an alert if thresholds for disk response are exceeded, indicating a slow disk.

  • Generate an alert if the RPC requests queue length exceeds expected thresholds. A consistent high value can indicate that you have a resource bottleneck.

  • Monitor the average latency of all RPC requests submitted to the server.

  • Monitor the Outlook Mobile Access response latency.

Server performance issues quickly become user response time issues. You can quickly solve these problems if you monitor the correct objects and act upon the issues that MOM brings to your attention.

Database checkpoint depth and memory usage

By default, an alert is generated if any of the following counters exceeds its threshold:

  • Disk Read Latencies: 50 ms

  • Disk Write Latencies: 50 ms

  • ESE Log Checkpoint Depth: 800

  • Information Store Private Bytes: 1 GB

  • Information Store Virtual Bytes: 2.9 GB

  • MSExchangeIS: RPC Requests: 25

  • MSExchangeIS: RPC latency: 200 ms

  • Outlook Mobile Access: Last response time: 60 sec
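The default thresholds listed above can be expressed as a simple lookup and comparison. This is only a sketch: the dictionary keys append units to the counter names from the list, the `exceeded` function is hypothetical, and the sample values are invented. It treats "exceed" as strictly greater than the threshold.

```python
# Default thresholds from the list above, with units made explicit in the keys.
DEFAULT_THRESHOLDS = {
    "Disk Read Latencies (ms)": 50,
    "Disk Write Latencies (ms)": 50,
    "ESE Log Checkpoint Depth": 800,
    "Information Store Private Bytes (GB)": 1,
    "Information Store Virtual Bytes (GB)": 2.9,
    "MSExchangeIS: RPC Requests": 25,
    "MSExchangeIS: RPC latency (ms)": 200,
    "Outlook Mobile Access: Last response time (sec)": 60,
}


def exceeded(samples, thresholds=DEFAULT_THRESHOLDS):
    """Return {counter: (value, threshold)} for samples above their threshold."""
    return {
        name: (value, thresholds[name])
        for name, value in samples.items()
        if name in thresholds and value > thresholds[name]
    }


# Invented sample readings for illustration.
sample = {
    "Disk Read Latencies (ms)": 72,
    "ESE Log Checkpoint Depth": 800,
    "MSExchangeIS: RPC Requests": 30,
}
alerts = exceeded(sample)
```

In this sample, the checkpoint depth sits exactly at its threshold and so does not raise an alert, while the disk read latency and RPC request readings do.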