Cluster to Cluster Storage Replication

 

Updated: January 19, 2016

Applies To: Windows Server Technical Preview

Cluster-to-cluster replication is now available in Windows Server 2016 Technical Preview, including the replication of clusters using Storage Spaces Direct (that is, shared-nothing, direct-attached storage). Management and configuration are similar to server-to-server replication.

You will configure these computers and storage in a cluster-to-cluster configuration, where one cluster replicates its own set of storage to another cluster and its set of storage. These nodes and their storage should be located in separate physical sites, although this is not required.

There are no graphical tools in Windows Server 2016 Technical Preview that can configure Storage Replica for cluster-to-cluster replication.

Important

In this walkthrough, the four servers are only an example. You can use any number of servers supported by Microsoft in each cluster.

If you wish to use Storage Spaces Direct, you will need a minimum of four nodes per cluster, for a total of eight. This guide does not cover configuring Storage Spaces Direct. For information about configuring Storage Spaces Direct, see Storage Spaces Direct in Windows Server 2016 Technical Preview.

Terms

This walkthrough uses the following environment as an example:

  • Two member servers, named SR-SRV01 and SR-SRV02 in a cluster named SR-SRVCLUSA.

  • Two member servers named SR-SRV03 and SR-SRV04 in a cluster named SR-SRVCLUSB.

  • A pair of logical “sites” that represent two different data centers, with one called Redmond and one called Bellevue.

FIGURE: Cluster to Cluster Replication

Prerequisites

  • Two sets of storage, using SAS JBODs, Fibre Channel SAN, or iSCSI Target. The storage should contain a mix of HDD and SSD media. You will make each storage set available only to each of the clusters, with no shared access between clusters.

  • At least one 1GbE connection on each file server, preferably 10GbE, iWARP, or InfiniBand.

  • Appropriate firewall and router rules to allow ICMP, SMB (port 445, plus 5445 for SMB Direct) and WS-MAN (port 5985) bi-directional traffic between all nodes.

  • A network between the two sets of servers with at least 1Gbps throughput (preferably 8Gbps or higher) and average of ≤5ms round trip latency.

  • Membership in the built-in Administrators group on all server nodes.

Many of these requirements can be determined by using the Test-SRTopology cmdlet. You get access to this tool if you install Storage Replica or the Storage Replica Management Tools features on at least one server. There is no need to configure Storage Replica to use this tool, only to install the cmdlet. More information is included in the steps below.
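
Before running Test-SRTopology, a quick reachability check of the ports listed above can catch firewall problems early. The following is a minimal sketch, assuming the four example node names from this walkthrough and the built-in Test-NetConnection cmdlet. It checks only TCP ports 445 and 5985 and is not a substitute for Test-SRTopology.

```powershell
# Example node names from this walkthrough; substitute your own.
$Nodes = 'SR-SRV01','SR-SRV02','SR-SRV03','SR-SRV04'

foreach ($Node in $Nodes) {
    foreach ($Port in 445, 5985) {
        # TcpTestSucceeded is True when the remote TCP port accepted a connection.
        Test-NetConnection -ComputerName $Node -Port $Port |
            Select-Object ComputerName, RemotePort, TcpTestSucceeded
    }
}
```

Because the traffic must flow in both directions, consider running the check from at least one node in each site.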

Provision operating system, features, roles, storage, and network

Warning

Windows Server 2016 Technical Preview does not support Storage Replica on production servers.

  1. Install Windows Server 2016 Technical Preview on all four server nodes with an installation type of Windows Server 2016 Technical Preview (Server with Desktop Experience). Do not choose Standard Edition if it is available, as it does not contain Storage Replica.

  2. On each node, run the SConfig tool from a CMD prompt. Add network information, join the node to the domain, and then restart it.

    Important

    From this point on, always log on as a domain user who is a member of the built-in Administrators group on all servers. Always remember to elevate your Windows PowerShell and CMD prompts going forward when running on a Full graphical server installation or on a Windows 10 computer.

  3. Connect the first set of JBOD storage enclosure, iSCSI target, FC SAN, or local fixed disk (DAS) storage to the servers in site Redmond.

  4. Connect the second set of storage to the servers in site Bellevue.

  5. As appropriate, install latest vendor storage and enclosure firmware and drivers, latest vendor HBA drivers, latest vendor BIOS/UEFI firmware, latest vendor network drivers, and latest motherboard chipset drivers on all four nodes. Restart nodes as needed.

    Note

    Consult your hardware vendor documentation for configuring shared storage and networking hardware.

  6. Ensure that BIOS/UEFI settings for servers enable high performance, such as disabling C-State, setting QPI speed, enabling NUMA, and setting highest memory frequency. Ensure power management in Windows Server is set to high performance. Restart as required.

  7. Configure roles as follows:

    • Graphical method

      1. Run ServerManager.exe and create a server group, adding all server nodes.

      2. Install the File Server and Storage Replica roles and features on each of the nodes and restart them.

    • Windows PowerShell method

      On SR-SRV04 or a remote management computer, run the following command in a Windows PowerShell console to install the required features and roles for cluster-to-cluster replication on the four nodes and restart them:

      $Servers = 'SR-SRV01','SR-SRV02','SR-SRV03','SR-SRV04'
      
      $Servers | ForEach-Object { Install-WindowsFeature -ComputerName $_ -Name Storage-Replica,Failover-Clustering,Multipath-IO,FS-FileServer -IncludeManagementTools -Restart }
      

      For more information on these steps, see Install or Uninstall Roles, Role Services, or Features.

  8. On all nodes, stop and disable the Windows Search service using Services.msc.

    Warning

    In Windows Server 2016 Technical Preview, this service is installed by default and, due to a known issue, causes problems with clustering and Storage Replica. Do not skip this step.

  9. Configure storage as follows:

    Important

    • You must create two volumes on each enclosure: one for data and one for logs.

    • Log and data disks must be initialized as GPT, not MBR.

    • The two data volumes must be of identical size.

    • The two log volumes should be of identical size.

    • All replicated data disks must have the same sector sizes.

    • All log disks must have the same sector sizes.

    • The log volumes should use flash-based storage, such as SSD.

    • The data disks can use HDD, SSD, or a tiered combination and can use either mirrored or parity spaces or RAID 1 or 10, or RAID 5 or RAID 50.

    • The data volume should be no larger than 10TB (for a first test, we recommend no more than 1TB, in order to lower initial replication sync times).

    • The log volume must be at least 8GB and may need to be larger based on log requirements.

    • For JBOD enclosures:

      1. Ensure that each cluster can see that site’s storage enclosures only and that the SAS connections are correctly configured.

      2. Provision the storage using Storage Spaces by following Steps 1-3 provided in the Deploy Storage Spaces on a Stand-Alone Server using Windows PowerShell or Server Manager.

    • For iSCSI Target storage:

      1. Ensure that each cluster can see that site’s storage enclosures only. You should use more than one network adapter if using iSCSI.

      2. Provision the storage using your vendor documentation. If using Windows-based iSCSI Targeting, consult iSCSI Target Block Storage, How To.

    • For FC SAN storage:

      1. Ensure that each cluster can see that site’s storage enclosures only and that you have properly zoned the hosts.

      2. Provision the storage using your vendor documentation.

  10. Start Windows PowerShell and use the Test-SRTopology cmdlet to determine if you meet all the Storage Replica requirements.

    1. For example, to validate the proposed nodes that each have an F: and G: volume and run the test for 30 minutes:

      MD c:\temp
      
      Test-SRTopology -SourceComputerName SR-SRV01 -SourceVolumeNames f: -SourceLogVolumeName g: -DestinationComputerName SR-SRV03 -DestinationVolumeNames f: -DestinationLogVolumeName g: -DurationInMinutes 30 -ResultPath c:\temp
      

      Important

      When using a test server with no write IO load on the specified source volume during the evaluation period, consider adding a workload or it will not generate a useful report. You should test with production-like workloads in order to see real numbers and recommended log sizes. Alternatively, simply copy some files into the source volume during the test or download and run DISKSPD to generate write IOs. For instance, a sample with a low write IO workload for five minutes to the D: volume:

      Diskspd.exe -c1g -d300 -W5 -C5 -b8k -t2 -o2 -r -w5 -h d:\test.dat

  11. Examine the TestSrTopologyReport.html report to ensure that you meet the Storage Replica requirements.
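
Step 8 above uses Services.msc, which has to be repeated on every node. As an alternative, the following sketch stops and disables the service on all four nodes remotely. It assumes the example node names from this walkthrough, that PowerShell remoting (WS-MAN, port 5985) is reachable, and that the Windows Search service uses its usual service name, WSearch.

```powershell
# Example node names from this walkthrough; substitute your own.
$Servers = 'SR-SRV01','SR-SRV02','SR-SRV03','SR-SRV04'

Invoke-Command -ComputerName $Servers -ScriptBlock {
    # WSearch is assumed to be the service name of Windows Search.
    Stop-Service -Name WSearch -Force
    Set-Service -Name WSearch -StartupType Disabled

    # Report the resulting state so each node can be verified.
    Get-Service -Name WSearch | Select-Object Name, Status, StartType
}
```

Each returned row includes a PSComputerName column, so you can confirm that every node reports the service as Stopped and Disabled.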

Configure two Scale-Out File Server Failover Clusters

You will now create two normal failover clusters. After configuration, validation, and testing, you will replicate them using Storage Replica. You can perform all of the steps below on the cluster nodes directly or from a remote management computer that contains the Windows Server 2016 Technical Preview RSAT management tools.

Graphical method

  1. Run cluadmin.msc against a node in each site.

  2. Validate the proposed cluster and analyze the results to ensure you can continue. The examples used below are SR-SRVCLUSA and SR-SRVCLUSB.

  3. Create the two clusters. Ensure that the cluster names are 15 characters or fewer.

  4. Configure a File Share Witness or Cloud Witness.

    Note

    Windows Server 2016 Technical Preview now includes an option for Cloud (Azure)-based Witness. You can choose this quorum option instead of the file share witness.

    Warning

    For more information about quorum configuration, see the Witness Configuration section in Configure and Manage the Quorum in a Windows Server 2012 Failover Cluster. For more information on the Set-ClusterQuorum cmdlet, see Set-ClusterQuorum.

  5. Add one disk in the Redmond site to the cluster CSV. To do so, right click a source disk in the Disks node of the Storage section, and then click Add to Cluster Shared Volumes.

  6. Create the clustered Scale-Out File Servers on both clusters using the instructions in Configure Scale-Out File Server.

Windows PowerShell method

  1. Test the proposed cluster and analyze the results to ensure you can continue:

    Test-Cluster SR-SRV01,SR-SRV02
    Test-Cluster SR-SRV03,SR-SRV04
    
  2. Create the clusters (you must specify your own static IP addresses for the clusters). Ensure that each cluster name is 15 characters or fewer:

    New-Cluster -Name SR-SRVCLUSA -Node SR-SRV01,SR-SRV02 -StaticAddress <your IP here>
    New-Cluster -Name SR-SRVCLUSB -Node SR-SRV03,SR-SRV04 -StaticAddress <your IP here>
    
  3. In each cluster, configure a File Share Witness that points to a share hosted on the domain controller or some other independent server, or configure a Cloud (Azure) witness. For example:

    Set-ClusterQuorum -FileShareWitness \\someserver\someshare
    

    Note

    Windows Server 2016 Technical Preview now includes an option for Cloud (Azure)-based Witness. You can choose this quorum option instead of the file share witness.

    Warning

    For more information about quorum configuration, see the Witness Configuration section in Configure and Manage the Quorum in a Windows Server 2012 Failover Cluster. For more information on the Set-ClusterQuorum cmdlet, see Set-ClusterQuorum.

  4. Create the clustered Scale-Out File Servers on both clusters using the instructions in Configure Scale-Out File Server.
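
The graphical method adds a disk to Cluster Shared Volumes, but the PowerShell steps above show no equivalent command. The following is a hedged sketch of how that could be done with the failover clustering cmdlets. The resource name 'Cluster Disk 1' is a placeholder, not a value from this guide; list the cluster resources first and substitute the name of your data disk. Adding a CSV on both clusters is an assumption based on the later replication example, which uses a C:\ClusterStorage path on each side.

```powershell
# List the resources in each cluster; look for the Physical Disk resource that holds your data volume.
Get-ClusterResource -Cluster SR-SRVCLUSA
Get-ClusterResource -Cluster SR-SRVCLUSB

# 'Cluster Disk 1' is a hypothetical resource name; replace it with the name found above.
Add-ClusterSharedVolume -Cluster SR-SRVCLUSA -Name 'Cluster Disk 1'
Add-ClusterSharedVolume -Cluster SR-SRVCLUSB -Name 'Cluster Disk 1'

# Confirm the volumes now appear as Cluster Shared Volumes.
Get-ClusterSharedVolume -Cluster SR-SRVCLUSA
Get-ClusterSharedVolume -Cluster SR-SRVCLUSB
```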

Configure Cluster to Cluster Replication using Windows PowerShell

Now you will configure cluster-to-cluster replication using Windows PowerShell. You can perform all of the steps below on the nodes directly or from a remote management computer that contains the Windows Server 2016 Technical Preview RSAT management tools.

  1. Grant the first cluster full access to the other cluster by running the Grant-SRAccess cmdlet on any node in the first cluster, or remotely.

    Grant-SRAccess -ComputerName SR-SRV01 -Cluster SR-SRVCLUSB
    
  2. Grant the second cluster full access to the other cluster by running the Grant-SRAccess cmdlet on any node in the second cluster, or remotely.

    Grant-SRAccess -ComputerName SR-SRV03 -Cluster SR-SRVCLUSA
    
  3. Configure the cluster-to-cluster replication, specifying the source and destination disks, the source and destination logs, the source and destination cluster names, and the log size. You can perform this command locally on the server or using a remote management computer.

    New-SRPartnership -SourceComputerName SR-SRVCLUSA -SourceRGName rg01 -SourceVolumeName c:\ClusterStorage\Volume2 -SourceLogVolumeName f: -DestinationComputerName SR-SRVCLUSB -DestinationRGName rg02 -DestinationVolumeName c:\ClusterStorage\Volume2 -DestinationLogVolumeName f: 
    

    Warning

    The default log size is 8GB. Depending on the results of the Test-SRTopology cmdlet, you may decide to use -LogSizeInBytes with a higher or lower value.

  4. To get replication source and destination state, use Get-SRGroup and Get-SRPartnership as follows:

    Get-SRGroup
    Get-SRPartnership
    (Get-SRGroup).replicas
    
  5. Determine the replication progress as follows:

    1. On the source server, run the following command and examine events 5015, 5002, 5004, 1237, 5001, and 2200:

      Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica -MaxEvents 20
      
    2. On the destination server, run the following command to see the Storage Replica event that shows creation of the partnership. This event states the number of copied bytes and the time taken. Example:

      Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica | Where-Object {$_.ID -eq 1215} | Select-Object -First 1 | Format-List
      
      Log Name:      Microsoft-Windows-StorageReplica/Operational
      Source:        Microsoft-Windows-StorageReplica
      Date:          4/13/2015 6:00:13 PM
      Event ID:      1215
      Task Category: (1)
      Level:         Information
      Keywords:      (1)
      User:          SYSTEM
      Computer:      sr-srv03.corp.contoso.com
      Description:
      Bitmap recovery completed successfully for replica.
      
      ReplicationGroupName: Replication 2
      ReplicationGroupId: {9d4a9a2a-747a-487f-a68b-5be13c6d7542}
      ReplicaName: \\?\Volume{67d83739-b33e-46bd-8007-516804946d09}\
      ReplicaId: {d9f2f467-e7e5-401f-8892-6f4d04cfbdbf}
      End LSN in bitmap: 
      LogGeneration: {00000000-0000-0000-0000-000000000000}
      LogFileId: 0
      CLSFLsn: 0xFFFFFFFF
      Number of Bytes Recovered: 10701766656
      Elapsed Time (ms): 32306
      
    3. Alternatively, the destination server group for the replica states the number of bytes remaining to copy at all times, and can be queried through PowerShell. For example:

      (Get-SRGroup).Replicas | Select-Object numofbytesremaining
      

      As a progress sample (that will not terminate):

      while($true) {
      
       $v = (Get-SRGroup -Name "Replication 2").replicas | Select-Object numofbytesremaining
       [System.Console]::Write("Number of bytes remaining: {0}`r", $v.numofbytesremaining)
       Start-Sleep -s 5
      }
      
    4. On the destination server in the destination cluster, run the following command and examine events 5009, 1237, 5001, 5015, 5005, and 2200 to understand the processing progress. There should be no warnings or errors in this sequence. There will be many 1237 events; these indicate progress.

      Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica | FL
      

      Note

      The destination cluster disk will always show as Online (No Access) when replicated.
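
The progress sample in step 5 loops forever. A variant that exits once no bytes remain could look like the sketch below. It assumes it is run on a node in the destination cluster and that the destination replication group is named rg02, as in the New-SRPartnership example; adjust the name if your group differs.

```powershell
# Assumption: the destination replication group is named rg02, as in the New-SRPartnership example.
$GroupName = 'rg02'

# Fail early if the group cannot be found, rather than reporting zero bytes remaining.
$Group = Get-SRGroup -Name $GroupName -ErrorAction Stop
if (-not $Group) { throw "Replication group '$GroupName' was not found." }

do {
    # Sum the bytes remaining across every replica in the group.
    $Remaining = ((Get-SRGroup -Name $GroupName).Replicas |
        Measure-Object -Property NumOfBytesRemaining -Sum).Sum

    Write-Host ("Number of bytes remaining: {0}" -f $Remaining)

    # Wait between polls only while there is still data to copy.
    if ($Remaining -gt 0) { Start-Sleep -Seconds 5 }
} while ($Remaining -gt 0)

Write-Host 'No bytes remaining reported.'
```

A zero count only reflects what the group reports at that moment, so confirm completion against the events described in step 5 before switching replication direction.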

Manage replication

Now you will manage and operate your cluster-to-cluster replication. You can perform all of the steps below on the cluster nodes directly or from a remote management computer that contains the Windows Server 2016 Technical Preview RSAT management tools.

  1. Use Get-ClusterGroup or Failover Cluster Manager to determine the current source and destination of replication and their status.

  2. To measure replication performance, use the Get-Counter cmdlet on both the source and destination nodes. The counter names are:

    • \Storage Replica Partition I/O Statistics(*)\Number of times flush paused

    • \Storage Replica Partition I/O Statistics(*)\Number of pending flush I/O

    • \Storage Replica Partition I/O Statistics(*)\Number of requests for last log write

    • \Storage Replica Partition I/O Statistics(*)\Avg. Flush Queue Length

    • \Storage Replica Partition I/O Statistics(*)\Current Flush Queue Length

    • \Storage Replica Partition I/O Statistics(*)\Number of Application Write Requests

    • \Storage Replica Partition I/O Statistics(*)\Avg. Number of requests per log write

    • \Storage Replica Partition I/O Statistics(*)\Avg. App Write Latency

    • \Storage Replica Partition I/O Statistics(*)\Avg. App Read Latency

    • \Storage Replica Statistics(*)\Target RPO

    • \Storage Replica Statistics(*)\Current RPO

    • \Storage Replica Statistics(*)\Avg. Log Queue Length

    • \Storage Replica Statistics(*)\Current Log Queue Length

    • \Storage Replica Statistics(*)\Total Bytes Received

    • \Storage Replica Statistics(*)\Total Bytes Sent

    • \Storage Replica Statistics(*)\Avg. Network Send Latency

    • \Storage Replica Statistics(*)\Replication State

    • \Storage Replica Statistics(*)\Avg. Message Round Trip Latency

    • \Storage Replica Statistics(*)\Last Recovery Elapsed Time

    • \Storage Replica Statistics(*)\Number of Flushed Recovery Transactions

    • \Storage Replica Statistics(*)\Number of Recovery Transactions

    • \Storage Replica Statistics(*)\Number of Flushed Replication Transactions

    • \Storage Replica Statistics(*)\Number of Replication Transactions

    • \Storage Replica Statistics(*)\Max Log Sequence Number

    • \Storage Replica Statistics(*)\Number of Messages Received

    • \Storage Replica Statistics(*)\Number of Messages Sent

    For more information on performance counters in Windows PowerShell, see Get-Counter.

  3. To reverse the replication direction from one site to the other, use the Set-SRPartnership cmdlet.

    Set-SRPartnership -NewSourceComputerName SR-SRVCLUSB -SourceRGName rg02 -DestinationComputerName SR-SRVCLUSA -DestinationRGName rg01
    

    Note

    Windows Server 2016 Technical Preview does not prevent role switching when initial sync is ongoing, which can lead to data loss if you attempt to switch before allowing initial replication to complete. Do not switch directions until initial sync is complete.

    Check the event logs to see the replication direction change, recovery mode occur, and the replicas then reconcile. Write IOs can then write to the storage owned by the new source server. Changing the replication direction will block write IOs on the previous source computer.

    Note

    The destination cluster disk will always show as Online (No Access) when replicated.

  4. To change the log size from the default 8GB in Windows Server 2016 Technical Preview, use Set-SRGroup on both the source and destination Storage Replica groups.

    Important

    The default log size is 8GB. Depending on the results of the Test-SRTopology cmdlet, you may decide to use -LogSizeInBytes with a higher or lower value.

  5. To remove replication, use Get-SRGroup, Get-SRPartnership, Remove-SRGroup, and Remove-SRPartnership on each cluster.

    Get-SRPartnership | Remove-SRPartnership
    Get-SRGroup | Remove-SRGroup
    

    Note

    Storage Replica dismounts the destination volumes and their drive letters or mount points. This is by design.
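
Steps 2 and 4 above name the cmdlets to use but show no commands. While replication is still configured, they might be run along the lines of the following sketch. The two counters are picked from the list in step 2 purely as examples. The group names rg01 and rg02 come from the earlier New-SRPartnership example, the 16GB value is an arbitrary illustration rather than a recommendation, and passing the cluster name to -ComputerName is an assumption modeled on that same example.

```powershell
# Two counters chosen from the list in step 2; add or swap others as needed.
$Counters = @(
    '\Storage Replica Statistics(*)\Current RPO',
    '\Storage Replica Statistics(*)\Total Bytes Sent'
)

# Take three samples, five seconds apart, on the node where this is run.
Get-Counter -Counter $Counters -SampleInterval 5 -MaxSamples 3

# Change the log size on both the source and destination groups.
# 16GB is an illustrative value only; base the real size on the Test-SRTopology report.
Set-SRGroup -ComputerName SR-SRVCLUSA -Name rg01 -LogSizeInBytes 16GB
Set-SRGroup -ComputerName SR-SRVCLUSB -Name rg02 -LogSizeInBytes 16GB
```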

See Also