Set up disaster recovery for a multi-tier SharePoint application using Azure Site Recovery
This article describes in detail how to protect a SharePoint application using Azure Site Recovery.
Overview
Microsoft SharePoint is a powerful application that can help a group or department organize, collaborate, and share information. SharePoint can provide intranet portals, document and file management, collaboration, social networks, extranets, websites, enterprise search, and business intelligence. It also has system integration, process integration, and workflow automation capabilities. Typically, organizations consider it as a Tier-1 application sensitive to downtime and data loss.
Today, Microsoft SharePoint does not provide any out-of-the-box disaster recovery capabilities. Regardless of the type and scale of a disaster, recovery involves the use of a standby data center that you can recover the farm to. Standby data centers are required for scenarios where local redundant systems and backups can't recover from the outage at the primary data center.
A good disaster recovery solution should allow modeling of recovery plans around complex application architectures such as SharePoint. It should also support customized steps that handle application mappings between the various tiers, providing a single-click failover with a lower RTO in the event of a disaster.
This article covers the best practices for replicating a three-tier SharePoint application to Azure, how you can do a disaster recovery drill, and how you can fail over the application to Azure.
Prerequisites
Before you start, make sure you understand the following:
- Replicating a virtual machine to Azure
- How to design a recovery network
- Doing a test failover to Azure
- Doing a failover to Azure
- How to replicate a domain controller
- How to replicate SQL Server
SharePoint architecture
SharePoint can be deployed on one or more servers using tiered topologies and server roles to implement a farm design that meets specific goals and objectives. A typical large, high-demand SharePoint server farm that supports a high number of concurrent users and a large number of content items uses service grouping as part of its scalability strategy. This approach involves running services on dedicated servers, grouping these services together, and then scaling out the servers as a group. The following topology illustrates the service and server grouping for a three-tier SharePoint server farm. Refer to SharePoint documentation and product line architectures for detailed guidance on different SharePoint topologies. You can find more details about SharePoint 2013 deployment in this document.
Site Recovery support
Site Recovery is application-agnostic and should work with any version of SharePoint running on a supported machine. For this article, VMware virtual machines with Windows Server 2012 R2 Enterprise were used, together with SharePoint 2013 Enterprise edition and SQL Server 2014 Enterprise edition.
Source and target
Scenario | To a secondary site | To Azure |
---|---|---|
Hyper-V | Yes | Yes |
VMware | Yes | Yes |
Physical server | Yes | Yes |
Azure | NA | Yes |
Things to keep in mind
If you're using a shared disk-based cluster as any tier in your application, then you won't be able to use Site Recovery replication to replicate those virtual machines. Instead, you can use the native replication provided by the application and then use a recovery plan to fail over all tiers.
Replicating virtual machines
Follow this guidance to start replicating the virtual machine to Azure.
Once the replication is complete, go to each virtual machine of each tier and select the same availability set in Replicated item > Settings > Properties > Compute and Network. For example, if your web tier has 3 virtual machines, ensure that all 3 virtual machines are configured to be part of the same availability set in Azure.
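The availability set requirement above can be checked against an exported inventory before a drill. The following is a minimal sketch, not a Site Recovery API call: the VM names, tier labels, availability set names, and the `find_mismatched_tiers` helper are all hypothetical and only illustrate the rule that every virtual machine of a tier should target the same availability set.

```python
from collections import defaultdict

# Hypothetical inventory: (vm_name, tier, target_availability_set).
# None of these names come from Azure or from this article.
REPLICATED_VMS = [
    ("sp-web-01", "web", "avset-web"),
    ("sp-web-02", "web", "avset-web"),
    ("sp-web-03", "web", "avset-app"),  # misconfigured on purpose
    ("sp-app-01", "app", "avset-app"),
    ("sp-app-02", "app", "avset-app"),
]


def find_mismatched_tiers(vms):
    """Return {tier: sorted availability sets} for tiers that use more than one set."""
    sets_by_tier = defaultdict(set)
    for _name, tier, availability_set in vms:
        sets_by_tier[tier].add(availability_set)
    return {
        tier: sorted(sets)
        for tier, sets in sets_by_tier.items()
        if len(sets) > 1
    }


if __name__ == "__main__":
    for tier, sets in find_mismatched_tiers(REPLICATED_VMS).items():
        print(f"Tier '{tier}' spans several availability sets: {sets}")
```

An empty result means every tier in the inventory is consistent; any tier that is reported needs its Compute and Network settings corrected in the portal.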
For guidance on protecting Active Directory and DNS, refer to the Protect Active Directory and DNS document.
For guidance on protecting the database tier running on SQL Server, refer to the Protect SQL Server document.
Networking configuration
Network properties
For the App and Web tier virtual machines, configure the network settings in the Azure portal so that the virtual machines are attached to the right DR network after failover.
If you're using a static IP, specify the IP that you want the virtual machine to take in the Target IP field.
DNS and Traffic Routing
For internet facing sites, create a Traffic Manager profile of 'Priority' type in the Azure subscription, and then configure your DNS and Traffic Manager profile in the following manner.
Where | Source | Target |
---|---|---|
Public DNS | Public DNS for SharePoint sites. Example: sharepoint.contoso.com | Traffic Manager: contososharepoint.trafficmanager.net |
On-premises DNS | sharepointonprem.contoso.com | Public IP on the on-premises farm |
In the Traffic Manager profile, create the primary and recovery endpoints. Use an external endpoint for the on-premises endpoint and a public IP for the Azure endpoint. Ensure that the on-premises endpoint has the higher priority (in Traffic Manager, a lower priority value means a higher priority).
Host a test page on a specific port (for example, 800) in the SharePoint web tier in order for Traffic Manager to automatically detect availability post failover. This is a workaround in case you cannot enable anonymous authentication on any of your SharePoint sites.
Configure the Traffic Manager profile with the following settings:
- Routing method - 'Priority'
- DNS time to live (TTL) - '30 seconds'
- Endpoint monitor settings - If you can enable anonymous authentication, you can give a specific website endpoint. Or, you can use a test page on a specific port (for example, 800).
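To make the effect of the 'Priority' routing method concrete, here is a small model of how traffic moves from the on-premises endpoint to the Azure endpoint once the monitor marks the primary as unhealthy. It is a simplified sketch, not Traffic Manager code: the `Endpoint` class, the `pick_endpoint` function, and the `azure-public-ip` name are hypothetical, and the model does not try to reproduce what Traffic Manager does when every endpoint is unhealthy.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int  # lower value = preferred endpoint
    healthy: bool  # outcome of the endpoint monitor probe


def pick_endpoint(endpoints):
    """Return the healthy endpoint with the lowest priority value, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


# Hypothetical endpoints: on-premises is the primary, Azure is the recovery site.
on_prem = Endpoint("sharepointonprem.contoso.com", priority=1, healthy=True)
azure = Endpoint("azure-public-ip", priority=2, healthy=False)

# Normal operation: the on-premises farm answers the probe.
print(pick_endpoint([on_prem, azure]).name)  # prints sharepointonprem.contoso.com

# After failover: the on-premises probe fails and the Azure test page responds.
on_prem.healthy = False
azure.healthy = True
print(pick_endpoint([on_prem, azure]).name)  # prints azure-public-ip
```

Because the probe result drives the switch, the short DNS TTL and a reachable monitor endpoint (a site with anonymous access or the test page) are what keep the failover hands-off for internet facing sites.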
Creating a recovery plan
A recovery plan lets you sequence the failover of the various tiers in a multi-tier application, thereby maintaining application consistency. Follow these steps when creating a recovery plan for a multi-tier web application. Learn more about creating a recovery plan.
Adding virtual machines to failover groups
Create a recovery plan by adding the App and Web tier virtual machines.
Click on 'Customize' to group the virtual machines. By default, all virtual machines are part of 'Group 1'.
Create another Group (Group 2) and move the Web tier virtual machines into the new group. Your App tier virtual machines should be part of 'Group 1' and Web tier virtual machines should be part of 'Group 2'. This is to ensure that the App tier virtual machines boot up first followed by Web tier virtual machines.
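The grouping above amounts to an ordering constraint: everything in Group 1 starts before anything in Group 2. The sketch below models only that ordering. The VM names and the `boot_order` helper are hypothetical, and the dictionary is not the format Site Recovery uses to store a recovery plan.

```python
# Hypothetical VM names; the group numbers follow the steps above.
RECOVERY_PLAN = {
    2: ["sp-web-01", "sp-web-02"],  # Group 2: Web tier
    1: ["sp-app-01", "sp-app-02"],  # Group 1: App tier
}


def boot_order(plan):
    """Yield (group, vm) pairs with the lowest group number first."""
    for group in sorted(plan):
        for vm in plan[group]:
            yield group, vm


if __name__ == "__main__":
    for group, vm in boot_order(RECOVERY_PLAN):
        print(f"Group {group}: start {vm}")
```

Listing the groups out of order in the dictionary is deliberate: the order is decided by the group number, not by the order in which virtual machines were added to the plan.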
Adding scripts to the recovery plan
You can deploy the most commonly used Azure Site Recovery scripts into your Automation account by clicking the 'Deploy to Azure' button. When you use any published script, ensure you follow the guidance in the script.
Add a pre-action script to 'Group 1' to fail over the SQL Availability group. Use the ASR-SQL-FailoverAG script published in the sample scripts. Ensure you follow the guidance in the script and make the required changes in the script appropriately.
Add a post-action script to attach a load balancer on the failed-over virtual machines of the Web tier (Group 2). Use the ASR-AddSingleLoadBalancer script published in the sample scripts. Ensure you follow the guidance in the script and make the required changes in the script appropriately.
Add a manual step to update the DNS records to point to the new farm in Azure.
For internet facing sites, no DNS updates are required post failover. Follow the steps described in the 'Networking configuration' section to configure Traffic Manager. If the Traffic Manager profile has been set up as described in that section, add a script to open the dummy port (800 in the example) on the Azure virtual machine.
For internal facing sites, add a manual step to update the DNS record to point to the new Web tier virtual machine’s load balancer IP.
Add a manual step to restore search application from a backup or start a new search service.
For restoring Search service application from a backup, follow these steps:
- This method assumes that a backup of the Search Service Application was performed before the catastrophic event and that the backup is available at the DR site.
- This can be achieved by scheduling the backup (for example, once daily) and using a copy procedure to place the backup at the DR site. Copy procedures could include scripted programs such as AzCopy (Azure Copy) or setting up DFSR (Distributed File System Replication).
- Now that the SharePoint farm is running, navigate to Central Administration, open 'Backup and Restore', and select Restore. The restore interrogates the backup location specified (you may need to update the value). Select the Search Service Application backup you would like to restore.
- Search is restored. Keep in mind that the restore expects to find the same topology (same number of servers) and same hard drive letters assigned to those servers. For more information, see 'Restore Search service application in SharePoint 2013' document.
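The scheduled copy mentioned in the steps above can be as simple as "find the newest backup folder and copy it to a location reachable from the DR site". The following sketch shows that idea with the Python standard library only. It is an illustration rather than a replacement for AzCopy or DFSR, and the function name and the assumption that each backup lives in its own sub-folder are hypothetical.

```python
import shutil
from pathlib import Path


def copy_latest_backup(backup_root, dr_root):
    """Copy the most recently modified backup folder to the DR location.

    Assumes (hypothetically) that every backup run writes its own
    sub-folder under backup_root. Returns the destination path, or
    None when there is no backup folder to copy.
    """
    backup_root, dr_root = Path(backup_root), Path(dr_root)
    candidates = [p for p in backup_root.iterdir() if p.is_dir()]
    if not candidates:
        return None
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    destination = dr_root / latest.name
    # dirs_exist_ok lets a rerun refresh a partially copied folder (Python 3.8+).
    shutil.copytree(latest, destination, dirs_exist_ok=True)
    return destination
```

A daily scheduled task could call `copy_latest_backup` with the farm's backup share and a DR share as arguments; both paths are deployment-specific and are not defined by this article.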
For starting with a new Search service application, follow these steps:
- This method assumes that a backup of the Search Administration database is available at the DR site.
- Since the other Search Service Application databases aren't replicated, they need to be re-created. To do so, navigate to Central Administration and delete the Search Service Application. On any servers which host the Search Index, delete the index files.
- Re-create the Search Service Application, which re-creates the databases. It is recommended to have a prepared script that re-creates this service application, since it's not possible to perform all actions via the GUI. For example, setting the index drive location and configuring the search topology are only possible by using SharePoint PowerShell cmdlets. Use the Windows PowerShell cmdlet Restore-SPEnterpriseSearchServiceApplication and specify the log-shipped and replicated Search Administration database, Search_Service__DB. This cmdlet restores the search configuration, schema, managed properties, rules, and sources, and creates a default set of the other components.
- Once the Search Service Application has been re-created, you must start a full crawl for each content source to restore the Search Service. You lose some analytics information from the on-premises farm, such as search recommendations.
Once all the steps are completed, save the recovery plan. The final recovery plan looks like the following:
Doing a test failover
Follow this guidance to do a test failover.
- Go to the Azure portal and select your Recovery Services vault.
- Select the recovery plan created for the SharePoint application.
- Select Test Failover.
- Select the recovery point and the Azure virtual network to start the test failover process.
- Once the secondary environment is up, you can perform your validations.
- Once the validations are complete, click 'Cleanup test failover' on the recovery plan to clean up the test failover environment.
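One basic validation for the secondary environment is to confirm that the expected ports accept connections, for example the test page port on the Web tier. The sketch below does only that TCP-level check. The helper names are hypothetical, the hosts and ports to probe depend on your test network, and a successful connection does not prove that SharePoint itself is serving content.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(targets, timeout=3.0):
    """Return the (host, port) pairs from targets that did not accept a connection."""
    return [
        (host, port)
        for host, port in targets
        if not is_port_open(host, port, timeout)
    ]
```

For instance, `failed_checks([("10.0.1.4", 800), ("10.0.1.5", 800)])` would list the Web tier machines whose test page port is not reachable; the addresses are placeholders for the IPs assigned in the test failover network.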
For guidance on doing a test failover for AD and DNS, refer to the Test failover considerations for AD and DNS document.
For guidance on doing a test failover for SQL Always On availability groups, refer to the Performing Application DR with Azure Site Recovery and doing Test failover document.
Doing a failover
Follow this guidance for doing a failover.
- Go to the Azure portal and select your Recovery Services vault.
- Click on the recovery plan created for the SharePoint application.
- Click on 'Failover'.
- Select the recovery point to start the failover process.
Next steps
You can learn more about replicating other applications using Site Recovery.