Managed Availability Probes

Article
08/11/2014

Probes are one of the three critical parts of the Managed Availability framework (monitors and responders are the other two). As I wrote previously, monitors are the central components, and you can query monitors to find an up-to-the-minute view of your users’ experience. Probes are how monitors obtain accurate information about that experience.

There are three major categories of probes: recurrent probes, notifications, and checks.

Recurrent Probes

The most common probes are recurrent probes. Each probe runs every few minutes and checks some aspect of service health. They may transmit an e-mail to a monitoring mailbox using Exchange ActiveSync, connect to an RPC endpoint, or establish CAS-to-Mailbox server connectivity. All of these probes are defined in the Microsoft.Exchange.ActiveMonitoringProbeDefinition event log channel each time the Exchange Health Manager service is started. The most interesting properties for these events are:

Name: The name of the Probe. This will begin with the SampleMask of the Probe’s Monitor.
TypeName:The code object type of the probe that contains the probe’s logic.
ServiceName: The name of the Health Set for this Probe.
TargetResource: The object this Probe is validating. This is appended to the Name of the Probe when it is executed to become a Probe Result ResultName
RecurrenceIntervalSeconds: How often this Probe executes.
TimeoutSeconds: How long this Probe should wait before failing.

On a typical Exchange 2013 multi-role server, there are hundreds of these probes defined. Many probes are per-database, so this number will increase quickly as you add databases. In most cases, the logic in these probes is defined in code, and not directly discoverable. However, there are two probe types that are common enough to describe in detail, based on the TypeName of the probe:

Microsoft.Exchange.Monitoring.ActiveMonitoring.ServiceStatus.Probes.GenericServiceProbe: Determines whether the service specified by TargetResource is running.
Microsoft.Exchange.Monitoring.ActiveMonitoring.ServiceStatus.Probes.EventLogProbe: Logs an error result if the event specified by ExtensionAttributes.RedEventIds has occurred in the ExtensionAttributes.LogName. Success results are logged if the ExtensionAttributes.GreenEventIds is logged. These probes will not work if you override them to watch for a different event.

The basics of a recurrent probe are as follows: start every RecurrenceIntervalSeconds and check (or probe) some aspect of component health. If the component is healthy, the probe passes and writes an informational event to the Microsoft.Exchange.ActiveMonitoringProbeResult channel with a ResultType of 3. If the check fails or times out, the probe fails and writes an error event to the same channel. A ResultType of 4 means the check failed and a ResultType of 1 means that it timed out. Many probes will re-run if they timeout, up to the MaxRetryAttempts property.

The ProbeResult channel gets very busy with hundreds of probes running every few minutes and logging an event, so there can be a real impact on the performance of your Exchange server if you perform expensive queries against this event channel in a production environment.

Read the complete blog at https://blogs.technet.com/b/exchange/archive/2014/08/11/managed-availability-probes.aspx

Read my favorites blogs:

Assigning File Share permissions using Power Shell

Disk Read Error when migrating virtual machine from one cluster to another

Designing a backup less Exchange 2010 Architecture

Appear Offline in Microsoft Office Communicator Server 2007

Microsoft Exchange 2010 Test cases

Microsoft Exchange Server 2010 Disaster Recovery

Managed Availability Probes

Recurrent Probes

Additional resources