Understanding Basic Features in CCR

 

Cluster continuous replication (CCR) is a high availability feature of Microsoft Exchange Server 2007 that combines the asynchronous log shipping and replay technology built into Exchange 2007 with the failover and management features provided by the Cluster service.

CCR is designed to provide high availability for Exchange 2007 Mailbox servers by providing a solution that:

  • Has no single point of failure.
  • Has no special hardware requirements.
  • Has no shared storage requirements.
  • Can be deployed in one or two datacenter configurations.
  • Can reduce full backup frequency, reduce total backed up data volume, and shorten the service level agreement (SLA) for recovery time from first failure.

CCR uses the database failure recovery functionality in Exchange 2007 to enable the continuous and asynchronous updating of a second copy of a database with the changes that have been made to the active copy of the database. During installation of the passive node in a CCR environment, each storage group and its database is copied from the active node to the passive node. This operation is called seeding, and it provides a baseline of the database for replication. After the initial seeding is performed, log copying and replay are performed continuously.

In a CCR environment, the replication capabilities are integrated with the Cluster service to deliver a high availability solution. In addition to providing data and service availability, CCR also provides for scheduled outages. When updates need to be installed or when maintenance needs to be performed, an administrator can move a clustered mailbox server (called an Exchange Virtual Server in previous versions of Exchange Server) manually to a passive node. After the move operation is complete, the administrator can then perform the needed maintenance.

 Main Features in CCR:

 

Circular Logging in CCR:

In CCR, circular logging is called continuous replication circular logging (CRCL), which is different from ESE circular logging. Circular logging is enabled on a storage group with the Set-StorageGroup cmdlet, for example:

Set-StorageGroup -Identity "First Storage Group" -CircularLoggingEnabled $true

In a CCR, LCR, or SCR environment, always use the following process to enable or disable circular logging:

  1. Suspend replication by running Suspend-StorageGroupCopy.
  2. Enable or disable circular logging by running Set-StorageGroup with the CircularLoggingEnabled parameter.
  3. Dismount and then mount the database.
  4. Resume replication by running Resume-StorageGroupCopy.

ESE circular logging is performed and managed by the Microsoft Exchange Information Store service, whereas CRCL is performed and managed by the Microsoft Exchange Replication service.

The Microsoft Exchange Replication service manages CRCL so that log continuity is maintained and logs are not deleted by the log deleter if they are still needed for replication. Therefore, enabling CRCL should not negatively affect replication.

 

Transport Dumpster:

       The transport dumpster is a feature built into Exchange 2007 that is designed to minimize data loss by redelivering recently submitted messages to the Mailbox server after a lossy failure.

       The transport dumpster is configured by default. You can view the transport dumpster settings by running Get-TransportConfig.

       The transport dumpster is not enabled for standby continuous replication (SCR) or single copy clusters (SCC).

       The transport dumpster is enabled for CCR and local continuous replication (LCR).

       In an LCR environment, the request for redelivery from all Hub Transport servers in the site occurs as part of the Restore-StorageGroupCopy task.

 

File share witness:

  • The file share witness uses a file share on a computer outside the cluster to act as a witness to the activities of the two nodes that make up the cluster.
  • The file share for the file share witness can be hosted on any computer running Windows Server.
  • The version of the Windows Server operating system hosting the file share does not have to match the operating system of the CCR environment. However, we recommend that you host the file share on a Hub Transport server in the Active Directory directory service site containing the clustered mailbox server, because this allows a messaging administrator to maintain control over the file share.
  • A single server can provide file shares for multiple CCR environments. However, each CCR environment should have its own dedicated folder and share on this server.
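
The role of the witness can be illustrated with a small vote count. This is a sketch under an assumption that the text above does not spell out: the cluster is treated as having three votes (the two nodes plus the file share witness) and as staying online while a majority of them is available. The function name and parameters are illustrative only.

```python
def has_quorum(node_a_up, node_b_up, witness_reachable):
    """Return True when at least two of the three votes are available.

    Assumed model: each node and the file share witness contribute one vote,
    and the cluster needs a majority (two of three) to keep running.
    """
    votes = sum([bool(node_a_up), bool(node_b_up), bool(witness_reachable)])
    return votes >= 2
```

In this model a surviving node that can still reach the witness keeps the cluster online, while a node that is cut off from both its partner and the witness does not.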

 

Replication Components:

The two key services responsible for log generation, log shipping, and log replay activity are:

Microsoft Exchange Information Store service:

  • Responsible for servicing user and application requests, performing write ahead logging, and updating the database file via Extensible Storage Engine (ESE).
  • An operation occurs against the database (a client sends a new message), and the page that requires updating is read from the database file and placed into the ESE cache (assuming the page is not already in memory), while the log buffer is notified and it records the operation in memory.
  • The changes are recorded by the database engine but these changes are not immediately written to the database file on disk. Instead, these changes are held in the ESE cache and they are known as dirty pages because they have not been committed to the database file. The version store is used to keep track of these changes, thus ensuring that isolation and consistency are maintained.
  • As the database pages are changed, the log buffer is notified to commit the change, and the transaction is recorded in a transaction log file, which may or may not require closing the current Exx.log file and starting a new log generation. (Note that ESE is also responsible for closing a log file once it reaches its maximum file size (1 MB) and starting a new generation.)
  • Eventually the dirty database pages are written to the database file on disk.
  • The checkpoint is advanced.
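
The log rollover behavior described above can be illustrated with a toy model. This is not Exchange or ESE code: the class name `ToyLog`, its methods, the `E00` prefix, and the hexadecimal file-name format are assumptions chosen for illustration. The sketch shows only what the text states, namely that operations are recorded in the current log and that a full log (1 MB) is closed and a new generation is started.

```python
LOG_FILE_SIZE = 1024 * 1024  # 1 MB per log generation, as stated in the text


class ToyLog:
    """Toy model of a transaction log stream (illustrative, not ESE)."""

    def __init__(self, prefix="E00"):
        self.prefix = prefix      # hypothetical storage group log prefix
        self.generation = 1       # generation number of the open log
        self.used = 0             # bytes written to the open log
        self.closed = []          # file names of closed generations

    def append(self, record_size):
        """Record an operation, rolling over first if it would not fit."""
        if record_size > LOG_FILE_SIZE:
            raise ValueError("record is larger than a log file")
        if self.used + record_size > LOG_FILE_SIZE:
            self._roll_over()
        self.used += record_size

    def _roll_over(self):
        # Name the closed log after its generation number (assumed hex format).
        self.closed.append(f"{self.prefix}{self.generation:08X}.log")
        self.generation += 1
        self.used = 0
```

Appending three 512 KB records fills the first generation exactly with the first two, so the third record closes it and starts generation 2.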

 

Microsoft Exchange Replication Service:

  • Responsible for log shipping and replay against the database copy.
  • When continuous replication is enabled, the Microsoft Exchange Replication Service is responsible for detecting when the current log file is closed by ESE, copying it, inspecting it, and replaying it into the database copy. This service is installed by default on all servers that have the Mailbox server role installed.
  • The executable for the Replication service is Microsoft.Exchange.Cluster.ReplayService.exe, which is located by default at <install path>\bin. The Replication service depends on the Microsoft Exchange Active Directory Topology Service.
  • The Replication service stores its diagnostic logging level setting in the Windows registry in the following location: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchange Repl\Diagnostics

 

Replication Service Components:

LogCopier

Responsible for copying closed log files from the source storage group to the target server that contains a copy of the storage group.

LogInspector

Responsible for verifying that the log files are valid. It checks the destination inspector directory on a regular basis. If a log file is found to be corrupted or unusable for replay, the Replication service will request a re-copy of the file. 

LogReplayer

Responsible for replaying inspected log files into the database copy.

LogTruncater

  • Responsible for deleting log files that have been successfully replayed into the database copy.
  • When continuous replication is used, the LogTruncater deletes only log files that are not needed for recovery and replay.
  • Any log files on the active copy that have not been replicated and replayed into the database copy are not deleted by an online backup of the active copy.
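
The truncation rule can be written as a small predicate. This is a sketch under assumptions: the function name, its parameters, and the idea of expressing "needed for recovery" and "replayed" as generation numbers are illustrative and are not taken from the Replication service. A log generation is treated as deletable only when it is older than the checkpoint and has already been replayed into the database copy.

```python
def truncatable_generations(existing, checkpoint_generation, last_replayed_generation):
    """Return the log generations that are safe to delete in this toy model.

    existing: iterable of log generation numbers present on the active copy
    checkpoint_generation: oldest generation still needed for crash recovery
    last_replayed_generation: newest generation replayed into the database copy
    """
    return sorted(
        g for g in existing
        if g < checkpoint_generation and g <= last_replayed_generation
    )
```

When replay falls behind, the replay position is the limiting factor; when replay is current, the checkpoint is.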

 

Storage Group Structure in CCR:

Share Name: <Storage Group GUID>$

Folder Path: <drive>:\<Storage Group Log Folder Path>

Share Permissions:

  • <root domain>\Exchange Servers – Read

Folder Permissions:

  • <root domain>\Exchange Servers – Read
  • <computer>\Administrators – Full Control
  • SYSTEM – Full Control
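
As a small illustration of the share naming convention, the sketch below builds the UNC path that a passive node would use to reach the log share. The function name, the node name, and the GUID value in the usage note are hypothetical, and the exact textual form of the GUID in the real share name (case, braces) is an assumption.

```python
import uuid


def log_share_unc(active_node, storage_group_guid):
    """Build the UNC path of the log share for one storage group."""
    guid = uuid.UUID(str(storage_group_guid))   # validates the GUID format
    share_name = f"{guid}$"                     # GUID plus a trailing dollar sign
    return "\\\\" + active_node + "\\" + share_name
```

For a node called NODEA and the placeholder GUID 11111111-2222-3333-4444-555555555555, this returns \\NODEA\11111111-2222-3333-4444-555555555555$.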

 

Transaction Log Replication And Replay:

  • Transaction log replication and replay is used to copy the changed data and update the passive copy's database.
  • The size of each generated log file is 1 MB.
  • The transaction log file folder on the active node is shared using a standard Windows file share. The GUID of the storage group is used as the share name, and a dollar sign ($) is added to the end of the share name.
  • The replication functionality copies the log files to the passive node as each log file is generated.
  • The Microsoft Exchange Replication service on the passive node connects to the share on the active node and copies, or pulls, the log files using the Server Message Block (SMB) protocol.
  • When the logs arrive at the passive node, they are checked for corruption and then replayed into the copy of the database that is stored on the passive node. The replay process applies the changes described in the transaction log to the passive node's database, which makes it match the production database.
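
The pull, verify, and replay sequence in the list above can be sketched as a toy in-memory model. This is not the Replication service: the function name, the dictionary layout, and the use of a CRC-32 checksum to stand in for the inspection step are all assumptions made for illustration.

```python
import zlib


def pull_and_replay(active_share, passive_database):
    """Copy each closed log from the active share, inspect it, then replay it.

    active_share: dict mapping generation number -> (payload bytes, crc32)
    passive_database: list that receives replayed payloads in generation order
    Returns the generation numbers that failed inspection and need a re-copy.
    """
    needs_recopy = []
    for generation in sorted(active_share):
        payload, expected_crc = active_share[generation]
        copied = bytes(payload)                  # copy step: pull from the share
        if zlib.crc32(copied) != expected_crc:   # inspection step (assumed check)
            needs_recopy.append(generation)
            break                                # logs must be replayed in order
        passive_database.append(copied)          # replay step: apply to the copy
    return needs_recopy
```

The model stops at the first log that fails inspection, because later generations cannot be replayed until the damaged one has been copied again.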

 

 LLR and Auto Database Mount Dial:

In the event that an unscheduled outage occurs that affects the active node, the passive node will bring the clustered mailbox server instance online and the Replication service on that node will attempt to copy the missing log files from the node that experienced the failure. If the copy process is successful (for example, because the server is online and the shares and necessary data are accessible), then the storage groups will mount and there will be zero data loss. If the copy process is unsuccessful, then the databases will be mounted based on the clustered mailbox server’s AutoDatabaseMountDial setting and how far behind it is in log replication per storage group. There are three possible values for the server setting AutoDatabaseMountDial.

Lossless: Zero logs lost. When the attribute is set to Lossless, the system waits for the failed node to come back online before databases are mounted. Even then, the failed system must return with all logs accessible and not corrupted. After the failure, the passive node is made active, and the Microsoft Exchange Information Store service is brought online. It checks to determine whether the databases can be mounted without any data loss. If possible, the databases are mounted. If they cannot be automatically mounted, the system periodically attempts to copy the logs. If the server returns with its logs intact, this attempt will eventually succeed, and the databases will mount. If the server returns without its logs intact, the remaining logs will not be available, and the affected databases will not mount automatically. In this event, administrative action is required to force the database to mount when logs are lost.

Good availability: Three logs lost. Good availability provides fully automatic recovery when replication is operating normally and replicating logs at the rate they are being generated.

Best availability: Six logs lost. This is the default setting. Best availability operates similarly to Good availability, but it allows automatic recovery when replication experiences slightly more latency. Thus, the new active node might be slightly farther behind the state of the old active node after the failover, thereby increasing the likelihood that database divergence occurs, which requires a full reseed to correct.
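
The three settings amount to a threshold on the number of lost logs per storage group. The sketch below encodes that comparison; the function name and the dictionary key spellings are illustrative, and the model ignores everything except the lost-log count given in the descriptions above.

```python
# Maximum number of lost logs tolerated for an automatic mount, per setting.
MOUNT_DIAL_LOST_LOG_LIMIT = {
    "Lossless": 0,
    "GoodAvailability": 3,
    "BestAvailability": 6,
}


def can_mount_automatically(dial_setting, logs_lost):
    """Return True if a storage group may mount without administrator action."""
    if logs_lost < 0:
        raise ValueError("logs_lost cannot be negative")
    return logs_lost <= MOUNT_DIAL_LOST_LOG_LIMIT[dial_setting]
```

For example, a storage group that is four logs behind mounts automatically under Best availability but not under Good availability.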

 

Lost Log Resilience:

The order of write operations of Exchange data is always memory, log file, and then database file. LLR works by delaying writes to the database until the specified number of log generations have been created.

The LLR depth depends on the Exchange version and configuration:

  • Exchange 2007 SP1, CCR: LLR = 10
  • Exchange 2007 SP1, SCC, LCR, or stand-alone Mailbox server: LLR = 1
  • Exchange 2007 RTM: LLR = AutoDatabaseMountDial + 1
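
The values above can be collected into a single lookup. This is only a restatement of the list as code; the function name and the string labels used for versions and configurations are hypothetical.

```python
def llr_depth(version, configuration, mount_dial_logs=6):
    """Return the LLR depth for a version and configuration, per the list above.

    version: "RTM" or "SP1"
    configuration: "CCR", "SCC", "LCR", or "Standalone"
    mount_dial_logs: lost-log limit of the AutoDatabaseMountDial setting (RTM only)
    """
    if version == "RTM":
        return mount_dial_logs + 1
    if version == "SP1":
        return 10 if configuration == "CCR" else 1
    raise ValueError(f"unknown version: {version}")
```

With the default Best availability setting (six logs), the RTM depth works out to 7.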

In a lossy failure, there is at least one log file missing per storage group (Exx.log), but there also could be additional closed log files missing. This means that if the databases are brought online, data stored on NodeA (the failed node) is different from the data being generated on NodeB. This is referred to as divergence.

Divergence is when a storage group copy has information that is not in the active storage group. Divergence can be in the database or in the log files and can be caused by lossy failures.

Divergence is problematic because it means that data has been lost. In particular, database divergence is the worst case because it guarantees the need to reseed, which can be an expensive operation in terms of time and possibly bandwidth. Log file divergence also means data has been lost. However, log file divergence doesn’t necessarily cause database divergence because of LLR.
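
The relationship between lost logs, LLR, and the two kinds of divergence can be sketched as follows. This is a simplified model built on an assumption: the count of lost log generations includes the open Exx.log, and the database is treated as diverged only when that count exceeds the LLR depth, because changes from the most recent generations have not yet been written to the database file. The function name and return strings are illustrative.

```python
def classify_divergence(lost_log_generations, llr_depth):
    """Classify the outcome of a lossy failure in this simplified model."""
    if lost_log_generations < 0 or llr_depth < 0:
        raise ValueError("counts cannot be negative")
    if lost_log_generations == 0:
        return "no divergence"
    if lost_log_generations <= llr_depth:
        # The lost changes never reached the database file.
        return "log file divergence only"
    return "database divergence (reseed required)"
```

Under this model, losing seven generations with an LLR depth of 10 leaves the database file untouched by the lost changes, whereas the same loss with a depth of 1 forces a reseed.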

Remember that the order of write operations of Exchange data is always memory, log file, and then database file. LLR works by delaying writes to the database until the specified number of log generations have been created.

LLR only runs on the active storage group copy. If you analyze the passive copy, you will see that its database is always up-to-date (in terms of the log files that exist on the passive node).