

Online Maintenance Database Scanning in Exchange 2007 SP1 and SP2

Microsoft Exchange Server 2007 will reach end of support on April 11, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Applies to: Exchange Server 2007 SP1, Exchange Server 2007 SP2

In Microsoft Exchange Server 2007 Service Pack 1 (SP1) and Exchange 2007 Service Pack 2 (SP2), you can use three registry subkeys to enable and configure online maintenance database scanning. When online maintenance database scanning is enabled, Exchange performs the following steps:

  1. Reads in database pages.

  2. Checksums the database pages. Checksumming is the process by which Exchange checks the integrity of a database by computing a value (a checksum) that depends on the contents of the database.

  3. If configured, performs page zeroing on the database pages. Page zeroing is a process that is performed at the end of a streaming backup in which the data within the database is overwritten with characters that you have selected for that purpose. This makes the data unrecoverable by conventional means.

If you configure a server for online maintenance database scanning by using the process that is described in this topic, the page zeroing and checksumming processes are tied together (also known as piggybacked) so that the read input/output (I/O) load is effectively cut in half.
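The effect of piggybacking can be illustrated with a minimal Python sketch. It does not show how ESE is implemented. The `FakeDatabase` class, the use of `zlib.crc32` as a stand-in checksum, and the zero fill byte are assumptions made for illustration. The sketch only counts how many page reads two separate passes need compared with one combined pass.

```python
import zlib

PAGE_SIZE = 8192  # Exchange 2007 databases use 8 KB pages


class FakeDatabase:
    """Hypothetical in-memory stand-in for a database file; counts page reads."""

    def __init__(self, page_count, unused):
        self.pages = [bytes([i % 251 + 1]) * PAGE_SIZE for i in range(page_count)]
        self.unused = set(unused)   # page numbers marked as unused
        self.reads = 0

    def read_page(self, n):
        self.reads += 1
        return self.pages[n]


def separate_passes(db):
    # Pass 1: read and checksum every page.
    for n in range(len(db.pages)):
        zlib.crc32(db.read_page(n))
    # Pass 2: read every page again and zero the unused ones.
    for n in range(len(db.pages)):
        db.read_page(n)
        if n in db.unused:
            db.pages[n] = bytes(PAGE_SIZE)
    return db.reads


def combined_pass(db):
    # One read per page serves both checksumming and zeroing.
    for n in range(len(db.pages)):
        zlib.crc32(db.read_page(n))
        if n in db.unused:
            db.pages[n] = bytes(PAGE_SIZE)
    return db.reads


two = separate_passes(FakeDatabase(1000, unused=range(0, 1000, 10)))
one = combined_pass(FakeDatabase(1000, unused=range(0, 1000, 10)))
print(two, one)  # the combined pass issues half as many reads
```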

This topic describes:

  • Checksumming and page zeroing processes for Exchange databases for the release to manufacturing (RTM) version of Exchange 2007 and for Exchange 2007 SP1 and SP2.

  • Registry subkeys you can use to enable and configure online maintenance database scanning.

  • Performance counters that you can use to analyze system behavior during online maintenance database scanning.

  • Events that you can use to monitor online maintenance database scanning.

Checksumming and Page Zeroing in Exchange 2007 RTM

In Exchange 2007 RTM, there are several scenarios in which data integrity is not automatically verified through checksumming and in which page zeroing does not occur.

These scenarios include:

  • Creating backups only from the passive copy of a storage group in a cluster continuous replication (CCR) or local continuous replication (LCR) organization. If you create backups only from the passive copy, the active copy of the database is never checksummed.

  • Using Microsoft Data Protection Manager (DPM) to create differential block-level backups of Exchange data. In this scenario, only the changed data that is being backed up is checksummed. The unchanged data is not checksummed. As a result, the integrity of the data is not known with certainty, because it may have been corrupted over time (a condition that is commonly known as bit rot).

  • Using Volume Shadow Copy Service (VSS)-based backups in a CCR or LCR organization. In this scenario, page zeroing does not occur because it is enabled only for streaming backups.

  • Using streaming backups in a CCR or LCR organization. In this scenario, page zeroing activity on the active copy of the database does not generate transaction log files. Without transaction log files, these changes cannot be replicated to the passive copy of the database.

Checksumming Databases

As mentioned earlier, checksumming is the process of checking the integrity of a database by computing a value (checksum) that depends on the contents of the database. The checksum is stored with the data, and Exchange uses this value to make sure that the data is not corrupted. Prior to Exchange 2007 SP1, an entire database was checksummed during an online full streaming backup. A full VSS snapshot of a database could also be checksummed (although it was the copy that was checksummed, not the actual production database). However, the development of CCR and LCR and the introduction of DPM made this approach inadequate.
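The idea of storing a checksum with the data can be sketched in Python as follows. This is an illustration only: the 4-byte header layout and the use of `zlib.crc32` are assumptions, and ESE uses its own page format and checksum algorithm. The sketch shows that a single flipped bit in a page is caught when the checksum is recomputed and compared with the stored value.

```python
import zlib

PAGE_SIZE = 8192      # Exchange 2007 page size
HEADER_SIZE = 4       # hypothetical: 4-byte checksum stored at the start of the page


def write_page(payload: bytes) -> bytes:
    """Return a page whose header holds the checksum of its payload."""
    if len(payload) != PAGE_SIZE - HEADER_SIZE:
        raise ValueError("payload must fill the page exactly")
    return zlib.crc32(payload).to_bytes(HEADER_SIZE, "little") + payload


def verify_page(page: bytes) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    stored = int.from_bytes(page[:HEADER_SIZE], "little")
    return zlib.crc32(page[HEADER_SIZE:]) == stored


good = write_page(b"\x42" * (PAGE_SIZE - HEADER_SIZE))

# Simulate corruption on disk: flip a single bit in the payload.
damaged = bytearray(good)
damaged[100] ^= 0x01
damaged = bytes(damaged)

print(verify_page(good), verify_page(damaged))
```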

Checksumming with CCR and LCR

With both CCR and LCR, there are two copies of the Exchange databases, and you can select whether to back up the source copy or the target copy. The copy that is backed up is the copy that is checksummed (either by streaming or by means of VSS). The other copy is not checksummed.

Prior to Exchange 2007 SP1, the only way to schedule a checksum was by running a full backup. There were two common methods for working around this issue:

  • Move the clustered Mailbox server on a weekly basis so that the backup was moved to the alternate copy. This method is not desirable because it:

    • Requires the backup application to be CCR-aware.

    • Increases management complexity.

    • Increases downtime. (99.999 percent uptime is not possible with this method.)

    • Does not work with LCR.

  • Suspend replication and replay, and then checksum the database by using Exchange Server Database Utilities (Eseutil). This method is not desirable because the cluster is not resilient to failures during this period, and the workaround must be manually scripted. In effect, only one copy can be checksummed on a regular basis, which results in less certainty about the integrity of one of the database copies in the cluster. Ideally, errors should be caught early, before both copies of the database can be corrupted.

With the introduction of online maintenance database scanning in Exchange 2007 SP1, you are no longer limited to these workaround methods.

Checksumming with DPM

Microsoft Data Protection Manager (DPM) version 2 supports backing up and restoring Exchange 2007 databases. DPM can take an artificial full VSS backup by performing differential block synchronization. This artificial full backup copies only the blocks that have been changed since the last full backup (which reduces the backup time period). A side effect of an artificial full backup is that unchanged database pages are not checksummed. Therefore, some database pages may not be checksummed for long periods of time. With block differential backups, there is no way to guarantee that the original copy is reliable and uncorrupted. The administrator knows only that the backup copy has been verified.
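The coverage gap that block differential backups leave can be shown with a small Python sketch. The page counts and the set of frequently changed pages are hypothetical, and the function models only which pages get checksummed, not how DPM works internally.

```python
def unverified_after(page_count, changed_per_backup):
    """Return the pages never checksummed by a series of block-differential backups.

    changed_per_backup: iterable of sets, the page numbers changed before each backup.
    """
    verified = set()
    for changed in changed_per_backup:
        # Only the changed blocks are copied, so only they are checksummed.
        verified |= set(changed)
    return set(range(page_count)) - verified


hot = set(range(100))                        # hypothetical: the same 100 pages change daily
stale = unverified_after(1000, [hot] * 30)   # 30 differential backups
print(len(stale))                            # pages that went unchecked the whole time
```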

Page Zeroing Databases

Page zeroing (also called zeroing out or page scrubbing) is a process that is performed at the end of a streaming backup in which the data within the database is overwritten with characters that you have selected for that purpose. This makes the data unrecoverable by conventional means. When an item is deleted from an Exchange server (for example, when users delete messages from their mailboxes) and deleted item retention is disabled, the pages that the item occupied are marked as unused. When page zeroing is enabled, the data that is contained in unused pages is overwritten with the selected characters during an online backup. As each database page is backed up, the page is overwritten with the selected characters one time in the database on the hard disk. After the backup is complete, the deleted data exists on the backup copy, but it no longer exists in the database and cannot be recovered by conventional means.
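The following Python sketch illustrates the behavior described above under simplifying assumptions: the database is a list of in-memory pages, the set of unused pages is known in advance, and the overwrite character is a zero byte. The function name `backup_with_zeroing` is hypothetical. The sketch shows that the backup copy keeps the deleted data while the live page is overwritten.

```python
PAGE_SIZE = 8192
FILL = b"\x00"   # assumption: the selected overwrite character is a zero byte


def backup_with_zeroing(pages, unused):
    """Copy every page to the backup, then overwrite unused pages in place.

    pages:  list of bytes objects (the live database), modified in place
    unused: set of page numbers marked as unused (deleted data)
    """
    backup = []
    for n, page in enumerate(pages):
        backup.append(page)                 # the backup keeps the original bytes
        if n in unused:
            pages[n] = FILL * PAGE_SIZE     # the live page is overwritten one time
    return backup


db = [b"A" * PAGE_SIZE, b"deleted!" * (PAGE_SIZE // 8), b"C" * PAGE_SIZE]
bk = backup_with_zeroing(db, unused={1})
print(db[1] == FILL * PAGE_SIZE, bk[1][:8])
```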

In Exchange 2007 RTM, you could zero out deleted database pages when streaming online backups were performed by setting the Zero Database During Backup registry key. This method worked well. However, with VSS backups and CCR, it is no longer sufficient because VSS backups do not provide a way to zero out deleted pages.

Note

As a best practice, if you want to enable page zeroing on a database, you should do so when you create the database. If you do not configure page zeroing when you create the database, the first time page zeroing is run against the database, it will significantly impact server performance. The performance impact is considerably less after page zeroing has completed the first pass of the database. You can use throttling to limit the performance impact of the first page zeroing pass.

Page Zeroing with Continuous Replication

In Exchange Server 2003 and Exchange 2007, you have been able to use streaming backups to augment VSS backups when page zeroing was required. The introduction of CCR and LCR in Exchange 2007 raises another issue: page zeroing modifies the database without generating corresponding transaction logs. This means that in CCR and LCR organizations, page zeroing activity is not replicated between database copies. In Exchange 2007 RTM, to make sure that page zeroing is effective with CCR, you must perform one of the following tasks:

  • Run a streaming backup against each copy. This involves moving the clustered Mailbox server between nodes.

  • Take the target database offline and use Eseutil to run the eseutil /z command.

However, with increased concerns about security and compliance, these options are no longer satisfactory. In Exchange 2007 SP1 and SP2, page zeroing is moved to a background process and it now generates logs that can be shipped to replicate page zeroing to database copies in CCR and LCR environments, as well as environments that use standby continuous replication (SCR).

Note

Enabling page zeroing during online maintenance temporarily results in an increase in log generation. After the feature has been enabled for a while, log generation activity should return to the level it was at prior to enabling page zeroing.

Online Maintenance Database Scanning with Exchange 2007 SP1 and SP2

When you enable online maintenance database scanning in Exchange 2007 SP1 and SP2, Exchange reads in database pages, checksums them, and if so configured, performs page zeroing on them. All of these steps are performed in the background.

Online maintenance database scanning in Exchange 2007 SP1 and SP2 has the following features:

  • Online maintenance database scanning is not enabled by default. Because database scanning can affect server performance, you must manually opt in by adding subkeys to the registry. For more information, see "Using Registry Keys to Enable and Configure Online Maintenance Database Scanning" later in this topic.

  • You can enable database checksumming with page zeroing or by itself.

  • Database page zeroing and checksumming take place outside of the streaming backup process. Both operations are performed against a page when it is retrieved from disk. There is one database scan task in which both page zeroing and online checksumming are invoked when either one is enabled.

  • Database scanning tracks its progress in a manner similar to online defragmentation. It updates its progress at regular intervals so that it can continue where it left off when it resumes after an interruption.

  • You can enable database scanning only at the server level. Enabling database scanning at the storage group level or database level is not supported.

  • Database scanning provides a database page zeroing mechanism that replicates changes between database copies with both CCR and LCR.

  • Database scanning requires that page zeroing transactions go through the normal transaction logging process so that changes can replicate to CCR and LCR copies.

  • Throttling interrupts online maintenance database scanning for a specified number of milliseconds between every 320 kilobytes (KB) of I/O. This process allows the server to perform other tasks. You can use throttling to reduce the performance impact of the online checksum process on the server, such as when you are running online maintenance database scanning during the business day.

  • When you enable online maintenance database scanning, the online maintenance window that is scheduled for a specific database is split between the database scan process and the database online defragmentation process. For example, if you schedule an eight-hour online maintenance window, approximately four hours are used for the database scan task and four hours are used for the online defragmentation task.
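The throttling and maintenance window behavior in the list above lends itself to a rough worked estimate. The following Python sketch assumes a steady read rate (the `read_mb_per_sec` parameter is a hypothetical input, not a documented figure) and an even split of the window between scanning and defragmentation. Actual results depend on server load and hardware.

```python
import math

CHUNK_KB = 320  # throttling pauses after every 320 KB of I/O


def throttle_pause_seconds(db_size_gb, throttle_ms):
    """Total pause time that throttling adds to one full scan of a database."""
    chunks = math.ceil(db_size_gb * 1024 * 1024 / CHUNK_KB)
    return chunks * throttle_ms / 1000


def windows_needed(db_size_gb, throttle_ms, read_mb_per_sec, window_hours):
    """Maintenance windows needed for one pass, with half of each window for scanning."""
    read_seconds = db_size_gb * 1024 / read_mb_per_sec
    total = read_seconds + throttle_pause_seconds(db_size_gb, throttle_ms)
    scan_seconds_per_window = window_hours * 3600 / 2
    return math.ceil(total / scan_seconds_per_window)


# Hypothetical inputs: 500 GB database, 20 ms throttle, 20 MB/s reads, 8-hour window.
print(throttle_pause_seconds(500, 20))   # seconds spent paused by throttling
print(windows_needed(500, 20, 20, 8))    # windows needed to finish one pass
```

With these inputs, throttling alone adds about nine hours of pauses, so the scan spans several maintenance windows. Because database scanning tracks its progress, each window continues where the previous one stopped.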

Using Registry Keys to Enable and Configure Online Maintenance Database Scanning

The following table lists the registry subkeys that you can use to enable and configure online maintenance database scanning. These subkeys must be added to the registry by the administrator. They are not added to the registry by default when Exchange is installed. The path to each subkey is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem.

Warning

Incorrectly editing the registry can cause serious problems that may require you to reinstall your operating system. Problems resulting from editing the registry incorrectly may not be able to be resolved. Before editing the registry, back up any valuable data.

For more information about using these registry keys to enable and configure online maintenance database scanning, see How to Configure Online Maintenance Database Scanning in Exchange 2007 SP1 and SP2.

Enabling online maintenance database checksumming

  • Registry subkey: Online Maintenance Checksum

  • Type: REG_DWORD

  • Description: This registry subkey enables database checksumming during an online maintenance pass. If this subkey is not present in the registry (or if it is present but set to 0), database checksumming is not performed.

Enabling online maintenance database page zeroing

  • Registry subkey: Zero Database Pages During Checksum

  • Type: REG_DWORD

  • Description: This registry subkey enables database page zeroing. If this subkey is not present in the registry (or if it is present but set to 0), page zeroing is not performed during online maintenance database scanning.

Enabling online maintenance database throttling

  • Registry subkey: Throttle Checksum

  • Type: REG_DWORD

  • Description: This registry subkey is used to specify the throttling time interval (the number of milliseconds between every 320 KB of I/O) during which the server can perform other tasks. If this subkey is not present in the registry (or if it is set to a value of 0), throttling is not used.
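As an illustration of how the three settings fit together, the following Python sketch builds the text of a .reg file under the path given earlier. It is a sketch under stated assumptions: the function name `build_reg_file` is hypothetical, and the value 1 is assumed to mean "enabled" (this topic states only that 0 or an absent entry disables a feature). The sketch does not write to the registry. Review the generated text and heed the preceding warning before importing anything.

```python
KEY_PATH = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
            r"\MSExchangeIS\ParametersSystem")


def build_reg_file(checksum=True, zero_pages=False, throttle_ms=0):
    """Return .reg file text that sets the three REG_DWORD entries."""
    values = {
        "Online Maintenance Checksum": 1 if checksum else 0,          # assumption: 1 enables
        "Zero Database Pages During Checksum": 1 if zero_pages else 0,  # assumption: 1 enables
        "Throttle Checksum": throttle_ms,   # milliseconds between every 320 KB of I/O
    }
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY_PATH}]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a REG_DWORD")
        # .reg files store DWORD data as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


text = build_reg_file(checksum=True, zero_pages=True, throttle_ms=20)
print(text)
```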

Performance Counters for Monitoring Online Maintenance Database Checksumming and Page Zeroing

The following tables list the performance counters you can use to monitor and analyze system performance with database scanning.

Note

To use the performance counters listed in the table, you must enable extended Extensible Storage Engine (ESE) performance counters. For information about how to enable extended ESE performance counters, see How to Enable Extended ESE Performance Counters.

Counters for monitoring checksumming performance

  • MSExchangeDatabase\Online Maintenance (DB Scan) Pages Read/sec: This performance counter gives the rate at which database pages are read from all the databases in the entire Exchange store during an online maintenance database scan.

  • MSExchangeDatabase==>Instances\Online Maintenance (DB Scan) Pages Read/sec: This performance counter gives the rate at which the database pages for individual instances (such as for a single storage group) are read during an online maintenance database scan.

Counters for monitoring page zeroing performance

  • MSExchangeDatabase\Online Maintenance (DB Scan) Pages Zeroed/sec: This performance counter gives the rate at which database pages from all the databases in the entire Exchange store are zeroed out during an online maintenance database scan.

  • MSExchangeDatabase==>Instances\Online Maintenance (DB Scan) Pages Zeroed/sec: This performance counter gives the rate at which database pages for individual instances (such as for a single storage group) are zeroed out during an online maintenance database scan.
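When you interpret the Pages Read/sec counters, it can help to convert the page rate to read throughput and to a rough completion time. The following Python sketch does this arithmetic. The sample values and the 200 GB database size are hypothetical, and the estimate assumes a steady rate for the whole pass.

```python
PAGE_SIZE_KB = 8  # Exchange 2007 database page size


def scan_read_mb_per_sec(pages_per_sec):
    """Convert a Pages Read/sec sample to MB/s of read I/O."""
    return pages_per_sec * PAGE_SIZE_KB / 1024


def hours_to_finish(db_size_gb, pages_per_sec):
    """Rough time for one full scan at a steady page rate."""
    total_pages = db_size_gb * 1024 * 1024 / PAGE_SIZE_KB
    return total_pages / pages_per_sec / 3600


samples = [2400, 2560, 2720]            # hypothetical counter samples
avg = sum(samples) / len(samples)       # average pages per second
print(scan_read_mb_per_sec(avg))        # read throughput in MB/s
print(round(hours_to_finish(200, avg), 2))
```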

Events for Monitoring Online Maintenance Database Scanning

The following table lists the events you can use to monitor online maintenance database scanning in Event Viewer.

For more information, see How to Monitor Online Maintenance Database Scanning in Exchange 2007 SP1 and SP2.

Each entry that follows gives the event, a description of the event, and, where available, an example from the Application log in Event Viewer.

Event 717: Database checksumming background task has started.

This event fires when database checksumming has started.

Not applicable

Event 718: Database page zeroing background task has started.

This event fires when database page zeroing has started.

Not applicable

Event 721: Database checksumming background task has completed.

This event fires when database checksumming has completed. It reports the following information:

  • Number of pages seen

  • Number of bad checksums

  • Number of uninitialized pages

Event Type: Information

Event Source: ESE

Event Category: Online Defragmentation

Event ID: 721

Date: 6/20/2007

Time: 8:21:37 AM

User: N/A

Computer: ExchangeServer01

Description:

MSExchangeIS (6544) Third Storage Group: Online maintenance database checksumming background task is complete for database 'J:\sg3\priv3.edb'. This pass started on 7/9/2007 and ran for a total of 20 seconds, requiring 1 invocations over 1 days.

Operation Summary:

768 pages seen

0 bad checksums

268 uninitialized pages

Event 722: Database page zeroing background task has completed.

This event fires when database page zeroing has completed. It reports the following information:

  • Number of pages seen

  • Number of bad checksums

  • Number of uninitialized pages

  • Pages unchanged since last zero

  • Number of unused pages zeroed

  • Number of used pages seen

  • Number of deleted records zeroed

  • Number of unreferenced data chunks zeroed

Event Type: Information

Event Source: ESE

Event Category: Online Defragmentation

Event ID: 722

Date: 6/20/2007

Time: 8:21:37 AM

User: N/A

Computer: ExchangeServer01

Description:

MSExchangeIS (6544) Third Storage Group: Online Maintenance Database Zeroing background task has completed for database 'J:\sg3\priv3.edb'. This pass started on 6/20/2007 and ran for a total of 369 seconds, requiring 1 invocations over 1 days. Operation summary:

5850768 pages seen

0 bad checksums

72681 uninitialized pages

4379723 pages unchanged since last zero

33759 unused pages zeroed

1210764 used pages seen

57214 deleted records zeroed

0 unreferenced data chunks zeroed

Event 723: Database checksumming background task encounters an error.

This event fires when the database checksumming background task encounters an error.

Not applicable

Event 724: Database page zeroing background task encounters an error.

This event fires when the database page zeroing background task encounters an error.

Not applicable

Event 729: Database page zeroing has been paused.

This event fires when database page zeroing is paused during online maintenance due to a lack of free flushable pages.

Event Type: Error

Event Source: ESE

Event Category: Online Maintenance

Event ID: 729

Date: 7/27/2007

Time: 5:05:30 AM

User: N/A

Computer: ExchangeServer01

Description:

MSExchangeIS (5828) SG15: Online Maintenance Page Zeroing has been paused one or more times in the last 60 minutes for the following databases: 'v:\sg15\data\priv15test.edb'. The ESE Database Cache is not large enough to simultaneously run online maintenance page zeroing against the listed databases. Action: Stagger the online maintenance time windows for the listed databases or increase the amount of physical RAM in the server.
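If you collect the description text of events 721 and 722, the operation summary can be extracted with a short script. The following Python sketch parses the event 722 example shown earlier. It assumes that the description text has already been exported to a string and that each summary item appears on its own line as a number followed by a label. The function name `parse_operation_summary` is hypothetical, and the layout of real event text may differ.

```python
import re

SUMMARY_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_operation_summary(description):
    """Return {label: count} for the numeric lines in an event 721/722 description."""
    marker = re.search(r"operation summary:", description, re.IGNORECASE)
    if marker is None:
        return {}
    summary = {}
    for line in description[marker.end():].splitlines():
        m = SUMMARY_LINE.match(line)
        if m:
            summary[m.group(2)] = int(m.group(1))
    return summary


sample = """MSExchangeIS (6544) Third Storage Group: Online Maintenance Database Zeroing background task has completed for database 'J:\\sg3\\priv3.edb'. This pass started on 6/20/2007 and ran for a total of 369 seconds, requiring 1 invocations over 1 days. Operation summary:
5850768 pages seen
0 bad checksums
72681 uninitialized pages
4379723 pages unchanged since last zero
33759 unused pages zeroed
1210764 used pages seen
57214 deleted records zeroed
0 unreferenced data chunks zeroed
"""

counts = parse_operation_summary(sample)
# A nonzero "bad checksums" count is the figure to watch for.
needs_attention = counts.get("bad checksums", 0) > 0
print(counts["pages seen"], needs_attention)
```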

For More Information

For more information about System Center DPM 2007, see System Center Data Protection Manager 2007.

For information related specifically to DPM, see the Exchange Server Team blog article System Center Data Protection Manager 2007 Beta 2 goes live!

Note

The content of each blog and its URL are subject to change without notice. The content within each blog is provided "AS IS" with no warranties, and confers no rights. Use of included script samples or code is subject to the terms specified in the Microsoft Terms of Use.

For more information about online maintenance database scanning and CCR, see Planning for Cluster Continuous Replication.