Communications
Standby Continuous Replication in Exchange Server 2007 Service Pack 1
Scott Schnoll
At a Glance:
- Configuring Standby Continuous Replication
- The importance of redundancy
- How SCR mitigates downtime
Service Pack 1 provides several new features and enhancements for Exchange 2007. One of these new features, Standby Continuous Replication (SCR), is designed to provide organizations with both in-datacenter redundancy and site resilience. As its name implies, SCR is designed for scenarios that use or enable the use of standby recovery servers.
If you are familiar with the release to manufacturing (RTM) version of Exchange 2007, then you know that it also provides in-datacenter redundancy and site resilience through its log shipping features and support for Windows® failover clusters. In the RTM version, log shipping (known officially as continuous replication) is available in two forms: local continuous replication (LCR), as illustrated in Figure 1, and cluster continuous replication (CCR), as illustrated in Figure 2.
Figure 1 Local continuous replication
Figure 2 CCR is log shipping to a second server in a Windows failover cluster
Continuous replication provides data availability and redundancy by allowing administrators to enable and maintain online a second copy of each mailbox database. This database copy represents a first line of defense against failure, loss, or corruption of a production database. Instead of having to waste time locating a backup tape in order to restore data, a database copy can be activated and turned into a production database within minutes.
SCR extends the scenarios in which you can achieve data and service availability for your organization. The new scenarios enable you to separate high availability topologies from site resilience topologies, and they also allow you to deploy configurations that are tailored to your organization's specific needs in each area.
The RTM version of Exchange 2007 provides in-datacenter redundancy and site resilience, but because LCR and CCR provide just one extra copy of each database, you must choose between resilience and redundancy. For example, consider the data and service availability features provided by CCR. Deploying both the active and passive nodes in a single datacenter provides in-datacenter service and data availability but not site resilience (because both nodes are in the same physical site). Deploying the active node in one datacenter and the passive node in a second datacenter gives you site resilience but not in-datacenter availability (because the passive node, which provides these features, is located in a second datacenter).
With SCR in Service Pack 1, the ability to create additional copies of each database means that high availability and site resilience are not mutually exclusive; you can achieve both. For example, as shown in Figure 3, you can combine SCR with CCR to replicate storage groups locally in a primary datacenter (using CCR for high availability) and remotely in a second or backup datacenter (using SCR for site resilience). The second datacenter contains a standby cluster that can be activated and quickly provisioned with a replacement clustered mailbox server in a site recovery scenario.
Figure 3 CCR deployed in Redmond datacenter and SCR deployed in Quincy datacenter
Figure 3 depicts an enterprise deployment with two datacenters, each with its own Active Directory® site: Redmond and Quincy. The Redmond site is located in the primary (production) datacenter, and the Quincy site is located in a second (backup) datacenter. CCR is deployed in Redmond to achieve in-datacenter redundancy. Along with the infrastructure elements required for Exchange 2007, SCR targets are configured on a standby cluster in Quincy to achieve site resilience. These additional infrastructure elements, which include Client Access and Hub Transport servers, Active Directory and DNS servers, and Internet access, can be dedicated or non-dedicated resources. Dedicated resources are those resources that are designated to support only the users of the datacenter in which they reside. Non-dedicated resources are those resources that support users in the local datacenter as well as users in other locations. You must decide whether resources will be dedicated or non-dedicated, depending on what's best for your organization. For more information about dedicated and non-dedicated resources, see the Exchange Server 2007 Help file topic "Site Resilience Configurations" at technet.microsoft.com/bb201662.aspx. Note also the use of a new type of majority node set (MNS) quorum. In Exchange Server 2007, CCR uses the MNS quorum with file share witness (FSW) instead of the traditional voter node, as you can see in Figure 3.
In Figure 3, a CCR plus SCR environment that is designed with resiliency in mind provides several layers of redundancy for the mailboxes and services that are hosted on the server EXCLUS1, thereby protecting these mailboxes from small- to large-scale catastrophic failures.
Figure 4 Standalone mailbox servers using SCR to replicate storage groups to each other
Small- and medium-sized organizations can also benefit from SCR. For example, as shown in Figure 4, an organization can deploy two standalone Mailbox servers (EXMBX1 and EXMBX2) and use SCR to replicate some or all storage groups from one Mailbox server to the other.
In this example, both EXMBX1 and EXMBX2 are production servers with five active storage groups. Each storage group is an SCR source for a corresponding SCR target on the other server. In the event of a storage failure, or some other event in which an active storage group configured as an SCR source is unavailable, the SCR target copy can be quickly activated using a few administrative tasks in the Exchange Management Shell. With Microsoft® Office Outlook® 2007 and the database portability and Autodiscover features in Exchange 2007, downtime in the event of an active storage group loss (or, for that matter, a multiple active storage group loss) could be mere minutes.
SCR Sources and Targets
As with LCR and CCR, SCR also uses the concept of active and passive copies of a storage group, but it refers to them as SCR sources and targets, respectively. Nevertheless, SCR sources and targets are storage group copies. (Recovery storage groups cannot be enabled for SCR.)
The starting point for SCR (the SCR source) is any storage group on a standalone mailbox server or on a clustered mailbox server in a single copy cluster (SCC) or CCR environment. It is important to note that, while the SCR source can be a clustered mailbox server, SCR itself is not a clustered solution and does not require the Windows Cluster service. The endpoint for SCR (the SCR target) can be either a standalone mailbox server or a node in a failover cluster where the Mailbox role is installed but no clustered mailbox server has been configured in the cluster.
Source and Target Relationships
Each SCR source storage group can have an unlimited number of SCR targets. For example, a source could have one target that resides in the same datacenter as the source and a second target in a separate datacenter. However, Microsoft recommends using no more than four targets per source. If you decide to use more than four targets, you must assess the likely impact to the SCR source server in terms of memory, CPU, and disk resources and plan accordingly. Each SCR target computer can have multiple source servers. Both the source and the target computer must be running Service Pack 1 for Exchange 2007. The OS must be one that is supported by Service Pack 1 for Exchange 2007 (for example, Windows Server® 2008 or Windows Server 2003 SP2). However, regardless of which operating system you use, SCR does not have cross-OS support, and it requires that the operating system on the SCR source match the operating system on all of the SCR targets for that source. Thus, if the SCR source is running Windows Server 2003, all SCR targets for that source must also be running Windows Server 2003.
SCR is available in the Standard Edition of Exchange 2007. If a clustered mailbox server in an SCC or CCR environment is used as the SCR source, the Enterprise Edition of Exchange 2007 is required. Each SCR target supports a maximum of 50 instances (50 replicated storage groups) when using the Enterprise Edition and a maximum of 5 instances when using the Standard Edition.
SCR targets also have requirements that must be met. First, the source and target computers must be in the same Active Directory domain, though they can be in the same or in different Active Directory sites. In addition, the database and log file paths on the source and all of its targets must match for each storage group being replicated with SCR. Finally, when a node or a server is configured as an SCR target, you cannot enable LCR for any storage group on the SCR target computer, and you cannot add any clustered mailbox servers to the standby failover cluster.
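Because the database and log file paths must match on the source and every target, it can help to record the source paths before preparing the volumes on the target. The following is a minimal sketch, assuming a storage group named SG1 with a database named MBX1 on a server named EXMBX1 (placeholder names chosen for illustration only):

```powershell
# Placeholder names: EXMBX1 (SCR source), SG1 (storage group), MBX1 (database).
# List the log and system file paths of the source storage group.
Get-StorageGroup -Identity EXMBX1\SG1 | Format-List Name, LogFolderPath, SystemFolderPath

# List the database file path of the source database.
Get-MailboxDatabase -Identity EXMBX1\SG1\MBX1 | Format-List Name, EdbFilePath
```

The same drive letters and folders then need to be available on each intended SCR target.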
Comparing SCR with CCR and LCR
SCR (as shown in Figure 5) uses the same log shipping and replay technology used by LCR and CCR to provide new deployment options and configurations. As with LCR and CCR, SCR-enabled storage groups cannot contain more than one database. Also, SCR cannot be used for a public folder database if more than one public folder database exists in the Exchange organization.
Figure 5 SCR is log shipping to another server or a passive node in a failover cluster
One key difference with SCR is that it supports multiple targets per storage group, whereas LCR and CCR both support only one target (the passive copy). Another key difference is that, unlike the passive copies in LCR and CCR, an SCR copy cannot be backed up. When using SCR, the database headers for SCR targets are updated and the log files are truncated when supported backups are taken against the SCR source storage group (or, in the case of CCR and LCR, when backups are taken against either the active or passive copies of the SCR source storage group).
Like LCR and CCR, log shipping with SCR is continuous and uses a pull model. As soon as a new log file has been closed and named with the next generation sequence log file number, the Microsoft Exchange Replication Service running on the SCR target computer pulls the closed transaction log files from the SCR source computer, inspects and validates them, and then moves them to the corresponding storage group log file folder on the SCR target computer.
Replay Lag Time
After the log files are copied to the SCR target computer, SCR does something LCR and CCR do not. Instead of immediately replaying the log files into the copy of the database, SCR delays replay. A built-in delay of 50 log files always applies, and a time-based delay, which is 24 hours by default, can be adjusted by the administrator. Delaying replay activity is useful in a variety of scenarios. For example, in the event of logical corruption of an active database, a delay could prevent logical corruption of the SCR target database.
The administrator-controlled replay delay is set using a parameter called ReplayLagTime, which dictates the amount of time the Exchange replication service should wait before replaying log files that have been copied to the SCR target computer. The format is Days.Hours:Minutes:Seconds, and the default value is 24 hours. The maximum allowable setting for this value is seven days. The minimum allowable setting is zero seconds, and setting this value to zero seconds effectively eliminates any delay in log replay activity above the default delay of 50 log files.
In addition to ReplayLagTime, Exchange has a built-in, hardcoded delay of 50 log files, regardless of the value for ReplayLagTime. To determine when a log file should be replayed, Exchange uses the larger of ReplayLagTime or x log files, where x=50. This is an additional safeguard against the need to reseed a storage group in situations where an SCR source that uses continuous replication (for example, a clustered mailbox server in a CCR environment) experiences a failover and one or more storage groups need to be brought online using the Restore-StorageGroupCopy cmdlet. (Seeding is the process of using the Extensible Storage Engine (ESE) streaming backup APIs to make an online copy of the SCR source database on the SCR target computer.) By delaying replay activity on the SCR targets, when a lossy failover for an SCR source occurs, the chances of needing to reseed the SCR copies will be minimized because the nature of the data loss on the SCR source puts the two copies closer together in time.
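As a rough illustration of the Days.Hours:Minutes:Seconds format, the sketch below first parses a value with the .NET TimeSpan type and then passes the same value to Enable-StorageGroupCopy. The 36-hour lag and the names EXMBX1, EXMBX2, and SG1 are placeholders, not recommendations:

```powershell
# 1 day + 12 hours, written as Days.Hours:Minutes:Seconds
[TimeSpan]::Parse('1.12:00:00').TotalHours   # 36

# Placeholder names: enable SCR for EXMBX1\SG1 with EXMBX2 as the target
# and a 36-hour administrator-controlled replay delay.
Enable-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2 -ReplayLagTime 1.12:00:00
```

Whatever value is chosen, the hardcoded 50-log-file delay described above still applies.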
Truncation Lag Time
In the RTM version of Exchange 2007, rules are enforced in a continuous replication environment so that a log file is not deleted unless it has been backed up and replayed into the copy of the database. When using SCR, this rule is modified. SCR (which introduces the concept of multiple database copies) allows log files to be truncated on the SCR source computer as soon as they are inspected by all SCR target computers. Log truncation at the SCR source server does not wait until all logs have been replayed into all SCR targets because SCR target copies can be configured with large log replay lag times.
You can also add a delay to log truncation by using a new parameter called TruncationLagTime, which specifies how long the Exchange replication service should wait (in Days.Hours:Minutes:Seconds format) before truncating log files that have been copied to the SCR target computer and replayed into the copy of the database. The time period begins as soon as the log files have been successfully replayed into the copy of the database. The maximum allowable setting for this value is seven days and the minimum is zero seconds; setting it to zero seconds effectively eliminates any delay in log truncation activity.
In an SCR environment, a background thread runs every three minutes to determine if any log files need to be truncated. If the log file generation sequence is below the log file checkpoint for the storage group, and the log file is older than ReplayLagTime + TruncationLagTime, a log file on the SCR target will be truncated.
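The two conditions for truncating a log file on an SCR target can be expressed as a small predicate. This is only an illustrative model of the rule as described here, not code from Exchange; the function name and all of its parameters are hypothetical:

```powershell
# Illustrative model only. Function and parameter names are hypothetical.
function Test-ScrTargetLogTruncatable {
    param(
        [long]$LogGeneration,         # generation number of the log file
        [long]$CheckpointGeneration,  # generation at the storage group checkpoint
        [datetime]$LogTime,           # time stamp of the log file
        [timespan]$ReplayLagTime,
        [timespan]$TruncationLagTime,
        [datetime]$Now = (Get-Date)
    )
    # Condition 1: the log generation is below the checkpoint.
    $belowCheckpoint = $LogGeneration -lt $CheckpointGeneration
    # Condition 2: the log is older than ReplayLagTime + TruncationLagTime.
    $oldEnough = ($Now - $LogTime) -gt ($ReplayLagTime + $TruncationLagTime)
    return ($belowCheckpoint -and $oldEnough)
}

$now = [datetime]'2008-03-01T12:00:00'
$common = @{
    CheckpointGeneration = 150
    ReplayLagTime        = [TimeSpan]::Parse('1.00:00:00')
    TruncationLagTime    = [TimeSpan]::Parse('1.00:00:00')
    Now                  = $now
}
Test-ScrTargetLogTruncatable @common -LogGeneration 100 -LogTime $now.AddDays(-3)   # True
Test-ScrTargetLogTruncatable @common -LogGeneration 100 -LogTime $now.AddDays(-1)   # False
```

With both lag values set to one day, a log that is three days old and below the checkpoint qualifies, while a one-day-old log does not.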
In an LCR or CCR environment that is extended with SCR, a log file on the SCR target will be truncated if the following four criteria are met: the log file has been backed up, the log file generation sequence is below the log file checkpoint for the storage group, the passive copy of the storage group is in a state that allows the log file to be truncated, and all SCR targets have inspected the log file.
Enabling and Managing SCR
SCR is enabled using the Enable-StorageGroupCopy cmdlet in the Exchange Management Shell, which has been updated in SP1 with some new parameters. As described above, ReplayLagTime and TruncationLagTime can give you control over some of the behavior of SCR targets. Another parameter, SeedingPostponed, can be used to skip the initial seeding of the SCR target. Postponing seeding is useful in a variety of situations. For example, say the database in the storage group being enabled for SCR is 100GB. You might not want 100GB of data to traverse the network during peak production times. The SeedingPostponed parameter gives you the option of enabling SCR immediately and performing a seeding task later. When you're ready, you can manually seed the SCR target using the Update-StorageGroupCopy cmdlet.
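A postponed seed might look like the following sketch. The server and storage group names are placeholders, and suspending replication before seeding is an assumption about the procedure that should be checked against the product documentation for your build:

```powershell
# Placeholder names: EXMBX1\SG1 (source storage group), EXMBX2 (SCR target).
# During production hours: enable SCR but skip the initial seeding.
Enable-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2 -SeedingPostponed

# Later, during a quiet period: suspend replication, then seed the target.
Suspend-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2
Update-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2
```

Splitting the work this way keeps the large database copy off the network until a time of your choosing.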
While the above-mentioned parameters are optional, one parameter of Enable-StorageGroupCopy is required for SCR: StandbyMachine. It specifies the name of the computer that will contain the SCR target. The value of this parameter is set as part of the value for the msExchStandbyCopyMachines attribute of the storage group being enabled for SCR. The msExchStandbyCopyMachines attribute is a multivalued Unicode string that is added to the Active Directory schema when Exchange 2007 SP1 is introduced into the Exchange organization, which is one of the reasons that SP1 requires a schema update for Active Directory.
The StandbyMachine parameter is central to SCR, and several cmdlets have been updated in SP1 to use this parameter for the enabling and management of SCR targets. The updated cmdlets are described in Figure 6.
Figure 6 Cmdlets that use the StandbyMachine parameter
Cmdlet | Description |
Disable-StorageGroupCopy | Disables an SCR target for a storage group. |
Get-StorageGroupCopyStatus | Determines the current health of the SCR target. |
New-StorageGroup | Creates a new SCR-enabled storage group without having to enable SCR separately by using the Enable-StorageGroupCopy cmdlet. |
Restore-StorageGroupCopy | Disables SCR and makes an SCR target database viable for mounting with a Mount-Database operation as part of a recovery procedure. |
Resume-StorageGroupCopy | Resumes continuous replication for a storage group that has SCR suspended. |
Suspend-StorageGroupCopy | Suspends continuous replication activity for a storage group that is enabled for SCR. |
Update-StorageGroupCopy | Seeds or reseeds an SCR target storage group. |
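As one possible use of these cmdlets together, the sketch below pauses replication to a single SCR target for maintenance and then resumes it. The names and the comment text are placeholders:

```powershell
# Placeholder names: EXMBX1\SG1 (source storage group), EXMBX2 (SCR target).
# Pause replication to this one target and record why.
Suspend-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2 -SuspendComment "Planned maintenance on EXMBX2"

# ...perform the maintenance on EXMBX2...

# Resume replication to the same target.
Resume-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2
```

Because StandbyMachine names a specific target, other SCR targets of the same storage group are not addressed by these commands.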
Activating SCR Targets
SCR provides one or more up-to-date copies of the data, which can be used should the original data become lost or unusable. The process of taking an SCR target copy and reprovisioning it as a production database is known as activation. Activation occurs as part of the recovery process, which will take the form of database portability or one of the two Setup recovery options (/m:RecoverServer to recover a standalone server, or /RecoverCMS to recover a clustered mailbox server).
How You Might Use SCR
Let's see how a fictitious company might use SCR and database portability to recover from a failure of a mailbox database. After the production database is found to be corrupt, the administrator activates the SCR target database using database portability.
The organization has deployed Exchange 2007 with SP1 and has decided to leverage SCR to provide a copy of a storage group on a remote mailbox server. Both the source and target mailbox servers are in the same Active Directory site and are configured to use Active Directory-integrated DNS servers. The Active Directory replication interval is configured for 15 minutes.
Enabling SCR and Staging Recovery
SCR is configured so that transaction log files are being replicated for a single storage group, SG1, which contains a single database, MBX1. The paths for the storage group files and database file are C:\SG1 and C:\SG1\MBX1.EDB. In this case, EXMBX1 is the SCR source and EXMBX2 is the SCR target. This was configured as you see here:
Enable-StorageGroupCopy EXMBX1\SG1 -StandbyMachine EXMBX2
After SCR was enabled, the health and status of SCR for SG1 were verified using the Get-StorageGroupCopyStatus cmdlet:
Get-StorageGroupCopyStatus EXMBX1\SG1 -StandbyMachine EXMBX2
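To focus on the values that are usually of most interest, the status output can be narrowed to a few properties. This is a sketch rather than a prescribed health check; which queue lengths count as healthy depends on your environment:

```powershell
# Show the overall copy state plus the copy and replay backlogs for the SCR target.
Get-StorageGroupCopyStatus -Identity EXMBX1\SG1 -StandbyMachine EXMBX2 |
    Format-List SummaryCopyStatus, CopyQueueLength, ReplayQueueLength
```

Given the replay delay described earlier, a replay queue that is not empty is expected for an SCR target.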
To save time during the SCR target activation process, EXMBX2 is preconfigured with a storage group, SG1PORT, and database, MBX1PORT, that will be used as part of the database portability operations. SG1PORT and MBX1PORT are separate from the SCR target's storage group and database files. Therefore, the paths for SG1PORT and MBX1PORT are configured with a temporary path that does not conflict with the SCR target paths. SG1PORT and MBX1PORT will only be used as database portability objects; the actual storage group and database files for SG1PORT are not needed. Because of this, the administrator dismounts MBX1PORT and deletes all of the files in the storage group. The storage group and database objects remain in Active Directory because they will be used later for database portability during the recovery process.
Activation and Recovery
An application event log entry indicates that the SCR source database is physically corrupt. Because SCR was enabled for SG1, the decision is made to perform a manual activation of the SCR target database for SG1 and to use database portability to restore data availability. Activation of the SCR target copy begins with dismounting MBX1 in SG1 using the following cmdlet:
Dismount-Database EXMBX1\SG1\MBX1
The SCR target database is then made viable for mounting, and the mailboxes originally on MBX1 will be rehomed to MBX1PORT.
The process for disabling SCR and making the SCR target database viable for mounting involves running the Restore-StorageGroupCopy cmdlet. This task marks the storage group's database as mountable and provides a report on the data loss, if any, that will result from mounting the database in the storage group. It also verifies that all of the log files generated by the active copy of the storage group are present in the passive copy's storage group file location. If any log files are missing, the operation will also try to copy them. The following cmdlet is used to make the SCR target database viable for mounting:
Restore-StorageGroupCopy EXMBX1\SG1 -StandbyMachine EXMBX2
In this example, the log files from the SCR source storage group are available for copying. If these files are not available (for example, because the SCR source computer is down), the Force parameter must be added to the Restore-StorageGroupCopy task. Otherwise, Restore-StorageGroupCopy will always attempt to connect to the SCR source to copy any missing log files, and if the SCR source is unavailable the Restore-StorageGroupCopy task will fail and the database will not be made viable for mounting. Adding the Force parameter tells the Restore-StorageGroupCopy task that the source files are not available, in which case it skips the connection attempt and proceeds to make the SCR target database viable for mounting.
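For the case where the SCR source computer is down, the same command with the Force parameter would look like this sketch (same example names as above):

```powershell
# Use only when EXMBX1 cannot be reached; missing log files will not be copied,
# so any data in log files that were never replicated is lost.
Restore-StorageGroupCopy -Identity EXMBX1\SG1 -StandbyMachine EXMBX2 -Force
```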
After the Restore-StorageGroupCopy command has completed, the administrator must verify that the database is in a clean shutdown state. If the database is in a dirty shutdown state (see technet.microsoft.com/aa996757.aspx), it must be brought to a clean shutdown state. If the log file prefix (for example, E00 or E01) is the same for the SCR source storage group (EXMBX1\SG1) and the storage group on the SCR target that will be used for database portability (EXMBX2\SG1PORT), running Eseutil in recovery mode is not necessary; the final database mount operation will bring the database into a clean shutdown state after all of the replicated log files have been replayed. Otherwise, Exchange Server Database Utilities (Eseutil) must be run in recovery mode (Eseutil /r) against the database to bring it to a clean shutdown state.
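One way to check the shutdown state is to dump the database header with Eseutil and look at the State line. In this sketch, the value of $edb is an assumption; set it to the actual path of the replicated database file on the SCR target:

```powershell
# Run on the SCR target (EXMBX2). $edb is a placeholder path; adjust as needed.
$edb = 'C:\SG1\MBX1.EDB'

# Dump the database header and show only the shutdown state line,
# which reads either "Clean Shutdown" or "Dirty Shutdown".
eseutil /mh $edb | Select-String 'State:'
```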
Once the database is in a clean shutdown state, you can run two commands that update Active Directory with new locations for the storage group files and database file. These commands change the paths for SG1PORT and MBX1PORT from their temporary paths to the paths of the storage group and database files that SCR replicated to EXMBX2:
Move-StorageGroupPath EXMBX2\SG1PORT -SystemFolderPath C:\SG1 -LogFolderPath C:\SG1 -ConfigurationOnly
Move-DatabasePath EXMBX2\SG1PORT\MBX1PORT -EdbFilePath C:\SG1\MBX1PORT.EDB -ConfigurationOnly
The -ConfigurationOnly parameter must be included in the above commands so that only the configuration settings for these objects are updated in Active Directory. No data or files are moved, nor do they need to be because SCR has already replicated the data to C:\SG1 on EXMBX2.
The next step is to configure the database (MBX1PORT) to allow itself to be overwritten during a restore operation. This can be done by selecting the checkbox for the setting "This database can be overwritten by a restore" on the database object properties in the Exchange Management Console, or by using the following command in the Exchange Management Shell:
Set-MailboxDatabase EXMBX2\SG1PORT\MBX1PORT -AllowFileRestore:$true
Once you have configured the database so that it will allow itself to be overwritten, the next step is to mount the database using the following command:
Mount-Database EXMBX2\SG1PORT\MBX1PORT
After the database is mounted, the mailboxes homed on the SCR source database (SG1\MBX1) must be rehomed to point to MBX1PORT on EXMBX2. To do this, run the Get-Mailbox cmdlet and pipe the output to the Move-Mailbox cmdlet. During this process, it is important that the Exchange System Attendant and system mailboxes are not included in the output from the Get-Mailbox cmdlet that is piped to the Move-Mailbox cmdlet. Those mailbox objects do not need to be rehomed, nor should they be. The following command is run to rehome all user mailboxes and exclude system mailboxes:
Get-Mailbox -Database EXMBX1\SG1\MBX1 | where {$_.ObjectClass -NotMatch '(SystemAttendantMailbox|ExOleDbSystemMailbox)'} | Move-Mailbox -ConfigurationOnly -TargetDatabase EXMBX2\SG1PORT\MBX1PORT
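A quick way to confirm the result of the rehoming is to list the mailboxes that now point at the new database. This is a sketch for verification only; the chosen columns are a suggestion:

```powershell
# List the mailboxes now homed on the portability database on EXMBX2.
Get-Mailbox -Database EXMBX2\SG1PORT\MBX1PORT | Format-Table Name, Database, ServerName

# Optionally list any mailboxes that still point at the original database.
Get-Mailbox -Database EXMBX1\SG1\MBX1 | Format-Table Name, Database
```

The results reflect the directory server that answers the query, so other sites may lag until Active Directory replication completes.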
At this point, client access to MBX1PORT is now possible. However, whether users can actually access their mailboxes after they have been moved from EXMBX1\SG1\MBX1 to EXMBX2\SG1PORT\MBX1PORT depends on Active Directory replication latency; that is, depending on the number of directory servers, it may take time for the update to propagate throughout the environment. In addition, client access methods matter. Messaging clients running Outlook 2007 and non-Outlook clients will have access to the user's mailbox after the directory servers used by the user's Client Access server have been updated with the new paths. Messaging clients running Outlook 2003 and earlier versions will require the user's desktop messaging profile to be updated with the new server name.
Final Steps
After clients have access to their mailboxes and mailbox data, the final step is to establish redundancy by re-enabling SCR. This is done by removing any remaining storage group and database files from EXMBX1. After the files have been removed, the paths for EXMBX1\SG1\MBX1 can be moved to a temporary location and EXMBX1 can become an SCR target of EXMBX2.
Scott Schnoll is a Principal Technical Writer on the Exchange Server Team at Microsoft, writing content for Exchange Server 2007. Prior to joining Microsoft, Scott wrote Exchange Server 2003 Distilled (Addison-Wesley Professional, 2004) and was the lead author for Exchange 2000 Server: The Complete Reference (McGraw-Hill/OsborneMedia, 2001).
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.