Inside SharePoint: Protect SharePoint Data
Pav Cherny
Contents
SharePoint Data Protection Levels
Adopting a Componentized Approach
Building a High-Availability Front-End Subsystem
Back-End Subsystem Data Protection Options
Recovery Toolset
Best Practices
Protecting data through backup, restore and disaster recovery strategies is not a new approach; yet SharePoint presents unique challenges that require looking beyond typical disk backup, load balancing and redundancy approaches.
The data protection challenges in SharePoint stem from the way the SharePoint architecture stores content and configuration data on front-end and back-end servers. For example, most content resides in SQL Server or Shared Service Provider (SSP) databases, and most configuration data resides in the configuration database. However, SharePoint also depends on front-end server components such as binary files, Registry hives and customizations.
To address the SharePoint data protection challenges in a way that fits your environment and needs, you need to consider the acceptable downtime (also called the recovery time objective, or RTO), and the amount of data loss that can occur between the last restorable backup and the point of data loss (also called the recovery point objective, or RPO). Keep in mind that as the RTO and RPO decrease, costs tend to increase. If some of the data is not mission-critical or is more static in nature, you can even define multiple RTO and RPO targets based on service-level agreements (SLAs). You also need to consider the dependencies that contribute to RTO and RPO, such as datacenter distribution and WAN connectivity.
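To make the RTO and RPO definitions concrete, here is a minimal Python sketch that checks a candidate protection strategy against per-SLA targets. The class names, the sample targets and the "nightly full backup" figures are all hypothetical, chosen only for illustration; the one assumption the check relies on is that, in the worst case, the data at risk equals the full interval between restorable backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Target:
    """Recovery targets agreed for one class of data (for example, one SLA)."""
    name: str
    rto: timedelta  # acceptable downtime
    rpo: timedelta  # acceptable window of lost data


@dataclass(frozen=True)
class Strategy:
    """A candidate protection strategy, described by two estimates."""
    name: str
    backup_interval: timedelta  # time between restorable backups
    restore_time: timedelta     # estimated time to bring service back


def meets(strategy: Strategy, target: Target) -> bool:
    # Worst case, the failure happens just before the next backup,
    # so the data at risk equals the full backup interval.
    return (strategy.backup_interval <= target.rpo
            and strategy.restore_time <= target.rto)


# Hypothetical targets and strategy, for illustration only.
targets = [
    Target("team sites", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    Target("archive", rto=timedelta(days=2), rpo=timedelta(days=1)),
]
nightly = Strategy("nightly full backup",
                   backup_interval=timedelta(hours=24),
                   restore_time=timedelta(hours=8))

for t in targets:
    print(f"{nightly.name} meets {t.name} targets: {meets(nightly, t)}")
```

With these sample numbers, the nightly backup satisfies the archive targets but not the team-site targets, which is the kind of gap that pushes a design toward the lower-RPO options discussed later.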
In this article, we will consider the data-protection strategies and options available with SharePoint. They range from low-cost, high-RTO and RPO approaches such as built-in recycle bin functionality, to more advanced high-availability options such as mirroring, clustering and Data Protection Manager (DPM), which can help you achieve low RTO and RPO targets. It is not possible to fit in all potential scenarios and architectures in this article, so I will focus on the most common issues and how to resolve them.
SharePoint Data Protection Levels
The traditional approach in existing documentation that deals with data protection is to separate SharePoint into three logical tiers, based on how often data is typically recovered at each one. The first tier is for data whose recovery is often user driven, such as accidental deletions or corruption of documents, lists and sites. Site administrators or end users perform the maintenance tasks to restore this data.
The second tier is for more advanced backup and restore tasks done by administrators at the farm level, including data restoration and business continuity in case of hardware failure, migrations or database operations.
The third tier, commonly referred to as disaster recovery, entails designing the infrastructure in such a way as to ensure high availability by using redundancies and eliminating single points of failure. Figure 1 shows the three tiers in more detail.
Figure 1 Traditional Data Protection Tiers
Although this way of looking at SharePoint data protection through tiers is helpful in designing a plan for backup, restore and high availability, as well as matching incidents to resolution steps for support technicians, it does not necessarily provide a best practices-based set of tools and topologies.
Admittedly, it's difficult to accommodate every single environment with a few topologies; but if flexible enough, they will accommodate most configurations. I prefer to build in redundancy and high availability as much as possible, and will give a few high availability-based topology options later.
Using an approach that starts with server and topology design emphasizing front-end and back-end subsystems as the logical tiers, and incorporates high-availability best practices, you can build an environment with data protection built in.
In other words, instead of looking at the data in three tiers and responding to disasters or recovery requests, you can mitigate the risk of data loss by creating a fitting design that eliminates single points of failure and includes standardized server configurations. I consider both front-end IIS servers and SharePoint application servers (such as crawl/indexing) as part of the front-end subsystem.
Adopting a Componentized Approach
When looking at data-protection strategies for SharePoint, I always try to look to Microsoft for guidance and best practices. I don't mean just the product documentation, but also the internal Microsoft IT organization that handles design, deployment and operations.
Microsoft IT has an ongoing initiative to simplify all elements of its infrastructure and standardize operations practices to the point where more than 80 percent of issues can be resolved by front-line support technicians and desk-side support. This initiative includes SharePoint infrastructure and data protection.
One of the key ideas in Microsoft IT's initiative was to create standard configuration blocks, consisting of physical and logical components that could be rapidly deployed to scale up and out. Figure 2 shows an example of these configuration blocks.
Figure 2 SharePoint deployment and infrastructure blocks.
Another approach I learned from Microsoft IT is to componentize data protection according to where the data resides. Practically, this entails thinking of front-end blocks and back-end blocks as distinct elements, building in high availability to the degree that budget and resources allow, and eliminating backup/restore processes through high availability. For example, by using load-balanced front-end servers and storing all custom data in the content and configuration databases, you eliminate the need to back up the binary and system state data of front-end servers. After all, they're part of replaceable building blocks.
Building a High-Availability Front-End Subsystem
At first glance, the idea of high availability may seem expensive or excessive, especially when a traditional "backup to tape" approach has worked in the past. Yet, at a basic level, it is not costly or difficult to implement redundancy for front-end and application servers. It's much more costly when tape fails or when there is extended downtime.
At the very basic level, a building block for the front-end subsystem consists of four servers. Two run IIS in a load-balanced configuration, and two run application roles such as Excel Calculation Services, Project Server or indexing. If you prefer, you can think of the load-balanced IIS servers as one block and application servers as another block.
In this configuration, the need to back up data is not eliminated by default. If you have customizations to web.config or other files, you must keep a copy of them. You must also keep documentation of configuration settings, including those for IIS and for SharePoint, such as Alternate Access Mappings (AAM). You can back up IIS metabase, XML master and configuration files and use a documented process for reconfiguring a standard deployment, including installing SSL certificates. The good news is that after documenting and organizing customizations, you can deploy replacement blocks rapidly and plug them in.
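As one way to keep copies of customized files, the following Python sketch copies every web.config under a source folder to a backup location and writes a manifest of SHA-256 digests, so you can later confirm that a rebuilt server carries the same customizations. The function name, the manifest file name and any folder paths you pass in are assumptions for illustration, not part of SharePoint or IIS; the sketch also assumes the destination folder is not inside the source folder.

```python
import hashlib
import json
import shutil
from pathlib import Path


def archive_customizations(source_root: Path, dest_root: Path,
                           pattern: str = "web.config") -> dict:
    """Copy every file named `pattern` under source_root into dest_root,
    preserving relative paths, and return {relative path: SHA-256 digest}."""
    manifest = {}
    for src in sorted(source_root.rglob(pattern)):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = dest_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        manifest[rel.as_posix()] = hashlib.sha256(src.read_bytes()).hexdigest()
    # Write the manifest last so it lists only files that were copied.
    dest_root.mkdir(parents=True, exist_ok=True)
    (dest_root / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

You would call it with the folder that holds your Web application directories and a backup share, then store the result alongside your configuration documentation.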
Back-End Subsystem Data Protection Options
As already mentioned, SharePoint holds most of its content and configuration details in SQL Server databases, and it is possible to eliminate the need for front-end backups by using a componentized approach with standardized configurations. If you eliminate the need to back up front-end servers, you can focus on the SQL Server databases.
Because SharePoint relies so heavily on SQL Server, the backup considerations move away from what is more directly SharePoint-driven to the options that you have for data protection with SQL Server.
At the SharePoint level, the options for protecting data stored in SQL Server are the backup and restore operations in the UI; at the SQL Server level, you start with design considerations and then consider how they affect backup and restore procedures.
The first option for data protection in SQL Server is a snapshot, which is a read-only, static version at a point in time of a database. Snapshots may be useful for reverting to a previous source database version, in case of user or administrator error, for example. For more information about snapshots, see msdn.microsoft.com/en-us/library/ms187054%28SQL.90%29.aspx.
The second option is log shipping, which relies on transaction log files transferred to a standby server from an active, primary server. If the primary server fails, you can manually fail over operations to the standby server and restore shipped logs.
It is important to note that there is no automatic failover with log shipping, and the recovery time may be lengthy due to the steps involved to restore the logs and bring up the standby server. For more information, see msdn.microsoft.com/en-us/library/ms187103.aspx.
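The manual restore step is where much of that recovery time goes. As a rough sketch, the following Python function generates the sequence of RESTORE LOG statements for bringing a standby database online: every log backup but the last is applied WITH NORECOVERY so that further logs can follow, and the last is applied WITH RECOVERY to make the database usable. The database name and file paths are hypothetical, and the sketch assumes the standby was initialized from a full backup restored WITH NORECOVERY and that the log backups are passed in the order they were taken.

```python
from typing import List, Sequence


def restore_log_statements(database: str,
                           log_backups: Sequence[str]) -> List[str]:
    """Return T-SQL that applies the given log backups in order."""
    if not log_backups:
        raise ValueError("at least one log backup is required")
    db = database.replace("]", "]]")
    statements = []
    last = len(log_backups) - 1
    for i, path in enumerate(log_backups):
        # Only the final restore recovers the database; earlier ones
        # must leave it in the restoring state.
        option = "RECOVERY" if i == last else "NORECOVERY"
        disk = path.replace("'", "''")
        statements.append(
            f"RESTORE LOG [{db}] FROM DISK = N'{disk}' WITH {option};")
    return statements


# Hypothetical database name and shipped log files, for illustration only.
for stmt in restore_log_statements(
        "WSS_Content",
        [r"E:\Shipped\WSS_Content_0100.trn",
         r"E:\Shipped\WSS_Content_0115.trn"]):
    print(stmt)
```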
The third option is mirroring, which is available with SQL Server 2005 and later. Like log shipping, mirroring relies on the transaction log, but it sends log records to the standby (mirror) server continuously instead of shipping log backup files. Mirroring works with databases that use the full recovery model and supports quick failover to a hot standby server. There are three modes: high safety, high performance and high availability. The most common mode for SharePoint environments is high availability. This mode supports automatic failover through a witness server. Failover with mirroring is faster than with log shipping because you do not have to manually restore logs. For more information, see go.microsoft.com/fwlink/?LinkId=83725&clcid=0x409.
The fourth and fifth options are replication and clustering. Replication is useful when you have multiple geographically distributed datacenters because it enables you to copy data to a centralized main datacenter. This provides some data protection, but is more often used for performance gains in remote locations.
Failover clustering takes the idea of componentization further by separating the storage subsystem from the OS. With clusters, multiple physical servers rely on a shared storage system such as a storage-area network (SAN). (It is important to note that clustering does not eliminate the storage subsystem as a single point of failure.)
The most common high-availability solutions are clustering and mirroring. Log shipping is used when downtime is acceptable or when there are budget constraints. The main tradeoff between mirroring and clustering is that clustering typically depends on a shared storage subsystem, whereas mirroring protects only the databases that you explicitly mirror. For more information, see technet.microsoft.com/en-us/library/ms179410.aspx.
Recovery Toolset
Even with a solid high-availability design for the front-end and back-end subsystems, you will face scenarios that require data recovery. For example, users or administrators may accidentally delete lists, documents or sites, or even make changes at the farm level that affect the availability of the entire farm. And an unforeseeable event, such as a natural disaster, may make restoration of service impossible. Many tools are available to help with these types of operations, including:
- Versioning: This is a feature of document libraries that works on documents and lists. It's included as part of the SharePoint data-protection functionality, but disabled by default. Versioning does not support folders, webs or sites. If you enable versioning, users can access previous versions without your direct intervention.
- Recycle Bin: Users can use this capability from a site to recover lists, documents and document libraries, folders and list items. You can also access the recycle bin as a site administrator at the site-collection level. This feature is configured at the Web application level.
- Microsoft IT Site Delete Capture: This tool uses the SPWebEventReceiver.WebDeleting method to provide you with a recycle bin-like feature at the site level. The SharePoint object model includes the WebDeleting method to enable you to create custom backup capabilities for sites. For more information, see msdn2.microsoft.com/en-us/library/microsoft.sharepoint.spwebeventreceiver.webdeleting.aspx.
- Stsadm.exe: This command-line tool gives you a lot of flexibility for restoring content. You can back up and restore site collections, databases, Web applications or the entire farm. For more information about the commands and context, see technet.microsoft.com/en-us/library/cc263441.aspx.
- Central Administration Backup: This is a GUI tool included in SharePoint that you can use to back up the farm, content database or Web application.
- Data Protection Manager (DPM): Part of the Microsoft System Center Suite, DPM protects SharePoint data by making backups every 15 minutes, and enables you to restore to any granularity, such as farm, site collection, site, document library and so on. One of the greatest benefits DPM provides is the ability to restore items directly without pre-staging or mounting a backed-up version of a database.
- Third-Party Tools: Using third-party tools can be tricky if you expect them to perform flawlessly in all scenarios. You should consider how well the tools integrate with SharePoint and Windows Server Volume Shadow Copy Service functionality, as well as any post-backup/restore operations required. Many are available from such companies as Quest, AvePoint and Commvault.
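As an example of the Stsadm.exe operations mentioned in the list above, the following Python sketch assembles the argument lists for a site-collection backup, a site-collection restore and a farm-level backup. It only builds the commands; it does not run them. The URL, file name and UNC share are hypothetical, and the sketch assumes stsadm.exe is on the PATH of the server where you would run it.

```python
from typing import List

STSADM = "stsadm"  # assumption: stsadm.exe is on PATH on the farm server


def site_collection_backup(url: str, filename: str,
                           overwrite: bool = False) -> List[str]:
    """Arguments for backing up one site collection to a file."""
    args = [STSADM, "-o", "backup", "-url", url, "-filename", filename]
    if overwrite:
        args.append("-overwrite")
    return args


def site_collection_restore(url: str, filename: str,
                            overwrite: bool = False) -> List[str]:
    """Arguments for restoring one site collection from a file."""
    args = [STSADM, "-o", "restore", "-url", url, "-filename", filename]
    if overwrite:
        args.append("-overwrite")
    return args


def farm_backup(directory: str, method: str = "full") -> List[str]:
    """Arguments for a farm-level backup to a shared directory."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    return [STSADM, "-o", "backup",
            "-directory", directory, "-backupmethod", method]


# Hypothetical URL and paths, for illustration only.
print(" ".join(site_collection_backup(
    "http://intranet/sites/hr", r"D:\Backups\hr.bak", overwrite=True)))
print(" ".join(farm_backup(r"\\backupsrv\sharepoint", "differential")))
```

Keeping the commands as argument lists makes it straightforward to hand them to a scheduler or to subprocess.run on a farm server once you have tested them by hand.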
Best Practices
With a componentized approach to protecting SharePoint data, you can combine design elements with backup and recovery tools to help meet RTO and RPO objectives. For example, it is possible to eliminate the need to back up front-end servers entirely by using a single standardized configuration for all servers. Other best practices include:
- Have separate backup and recovery tools and procedures. Although some tools provide both backup and restore functionality, you should consider them separate activities. This is because high-availability design options and backup granularity do not necessarily correspond to the types of restorations you will do. Also, backup and restore tasks have overhead and take time to complete. By planning for these separately, you have a greater ability to control the environment.
- Document. At the very least, you should document the configurations of your servers, topology, SharePoint settings, IIS settings, and procedures for backup and restore operations. Place this documentation in a non-SharePoint location.
- Test and practice. Ensure that your plans and procedures work by practicing recovery tasks and verifying that data is recoverable.
- Simplify and centralize. Working with componentized elements is one way to simplify and standardize the configuration. In addition, explore more options, such as centralizing management, and simplifying datacenter topologies.
Pav Cherny is an IT expert and author specializing in Microsoft technologies for collaboration and unified communication. His publications include white papers, product manuals, and books with a focus on IT operations and system administration. Pav is President of Biblioso Corp., a company that specializes in managed documentation and localization services.