Case Study: How Microsoft Deploys Disaster Recovery for FIM 2010
Applies To: Forefront Identity Manager 2010
Disaster recovery involves restoring your systems and data in the event of partial or complete failure of computers due to natural or technical causes. Backing up the critical data in your Forefront Identity Manager (FIM) 2010 deployment is a necessary operational task for all organizations.
As an example, this document describes how Microsoft IT (MSIT) deploys FIM 2010 internally, how it designs for disaster recovery, and how it recovers from hardware failures.
MSIT Environment and Topology
MSIT Hardware Specifications
Supported Resources
Planning for Disaster Recovery
Recovering From Hardware Failures
Backup, Copy and Restore
MSIT Environment and Topology
The following illustration shows the current topology used by MSIT to deploy FIM 2010.
As illustrated above, MSIT uses two servers, each hosting an instance of the FIM Portal and the FIM Service. To help with load balancing, one server is dedicated to responding to client requests (for example, employee requests to create or join groups) and the other is dedicated to administration. The FIM Service database and the FIM Synchronization database are deployed to separate servers. For more information about topology planning for FIM 2010, see the Pre-Planning and Topology Configuration Guide.
MSIT Hardware Specifications
The following table lists the hardware that MSIT uses for its FIM 2010 deployment. For more information about hardware planning for FIM 2010, see the Capacity Planning Guide.
Server Role | Details |
---|---|
FIM Portal and FIM Service (Production) | 8 CPU cores, 8 GB RAM |
FIM Portal and FIM Service for Administration and Migration (Production) | 8 CPU cores, 8 GB RAM |
FIM Database (Production) | 24 CPU cores, 64 GB RAM |
FIM Synchronization Database and FIM Service (Production) | 24 CPU cores, 64 GB RAM |
FIM Portal and Service, FIM Database (Disaster Recovery) | 24 CPU cores, 64 GB RAM |
FIM Synchronization Database and FIM Service (Disaster Recovery) | 24 CPU cores, 64 GB RAM |
Supported Resources
The following table displays the resources that are currently supported by the MSIT FIM 2010 deployment.
Resource | Approximate Size or Count |
---|---|
Number of users | 200,000 |
Number of groups | 465,000 (includes both distribution and security groups) |
Number of distribution groups | 275,000 |
FIM Service database | 380,540 MB (Note: this is the database size, not the backed-up file size.) |
FIM Synchronization database | 83,968 MB (Note: this is the database size, not the backed-up file size.) |
Number of management agents | 10 |
Planning for Disaster Recovery
Each organization defines its own business requirements for recovery in a service level agreement (SLA). MSIT has a business requirement to recover FIM 2010 within 24 hours in the event of a disaster. To meet this requirement, the following steps are taken on a nightly basis:
Full Backups are performed.
Full Backups are copied to the disaster recovery site.
Full Backups are restored on the disaster recovery site.
In addition to these steps, it may be necessary to perform one or more Full Synchronizations (depending on which data source is authoritative).
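The three nightly steps above can be sketched as an ordered plan. The source does not describe the tooling MSIT uses, so this is only an illustration under stated assumptions: the database names, the folder paths, and the helper functions are hypothetical placeholders, and the T-SQL strings use only the basic `BACKUP DATABASE ... TO DISK` and `RESTORE DATABASE ... FROM DISK` forms. The script builds the plan but does not execute anything.

```python
# Hypothetical placeholders: replace with the names and paths of the real deployment.
DATABASES = ["FIMService", "FIMSynchronizationService"]
BACKUP_DIR = r"D:\Backups"       # assumed local backup folder on the production SQL server
DR_DIR = r"E:\RestoreStaging"    # assumed staging folder on the disaster recovery SQL server


def backup_sql(db: str, path: str) -> str:
    """T-SQL for a full backup that overwrites the previous night's file."""
    return f"BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH INIT;"


def restore_sql(db: str, path: str) -> str:
    """T-SQL that restores over the existing copy at the recovery site."""
    return f"RESTORE DATABASE [{db}] FROM DISK = N'{path}' WITH REPLACE;"


def nightly_plan(databases=DATABASES):
    """Return (step, action) pairs in the order: all backups, all copies, all restores."""
    plan = []
    for db in databases:
        plan.append(("backup", backup_sql(db, f"{BACKUP_DIR}\\{db}.bak")))
    for db in databases:
        plan.append(("copy", f"{BACKUP_DIR}\\{db}.bak -> {DR_DIR}\\{db}.bak"))
    for db in databases:
        plan.append(("restore", restore_sql(db, f"{DR_DIR}\\{db}.bak")))
    return plan


if __name__ == "__main__":
    for step, action in nightly_plan():
        print(f"{step:8} {action}")
```

Keeping the plan as data, separate from whatever runs it, makes the nightly sequence easy to review and log before it is handed to a scheduler.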
Recovering From Hardware Failures
Disaster recovery does not always involve a failure of the entire deployment. Hardware failures can affect individual servers in the production environment. MSIT has a number of tools for dealing with individual hardware failures, such as server imaging tools. However, because MSIT currently participates in the internal testing of FIM 2010 and receives new updates on an ongoing basis, it simply reinstalls the affected FIM 2010 components when necessary. The following table lists the average time to reinstall a FIM 2010 component.
FIM Component | Time to reinstall or update |
---|---|
FIM Portal and FIM Service | 20 minutes (Note: assumes that user and group accounts already exist.) |
FIM Synchronization Service | 20 minutes (Note: assumes that user and group accounts already exist.) |
Backup, Copy and Restore
The following table describes the time it takes to copy and restore the MSIT FIM 2010 databases from the recovery site, and to perform a full synchronization of a typical management agent.
Operation | Average time to perform |
---|---|
Copy from backup location | One hour |
Time to restore | FIM Synchronization: 15 minutes; FIM Service: 30 minutes |
Full synchronization of a management agent with a large number of affected resources (approximately 500,000 – 750,000) | 20 hours |
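To see how these averages relate to the 24-hour recovery requirement, the figures in the table can be added up. This is a rough sketch, not MSIT's own calculation: it assumes that copy, restore, and full synchronization run one after another, and the function name and its parameters are illustrative.

```python
from datetime import timedelta

# Average durations taken from the table above.
COPY = timedelta(hours=1)
RESTORE = {
    "FIM Synchronization": timedelta(minutes=15),
    "FIM Service": timedelta(minutes=30),
}
FULL_SYNC = timedelta(hours=20)
RECOVERY_TARGET = timedelta(hours=24)  # MSIT business requirement


def recovery_estimate(full_syncs: int = 1, parallel_restore: bool = False) -> timedelta:
    """Estimate end-to-end recovery time.

    Assumption: the phases run sequentially. If parallel_restore is True,
    both databases restore at the same time, so only the longer one counts.
    """
    restores = list(RESTORE.values())
    restore_time = max(restores) if parallel_restore else sum(restores, timedelta())
    return COPY + restore_time + full_syncs * FULL_SYNC


if __name__ == "__main__":
    estimate = recovery_estimate()
    print("Estimated recovery time:", estimate)                # 21:45:00
    print("Slack against target:", RECOVERY_TARGET - estimate)  # 2:15:00
```

Under these assumptions a single full synchronization leaves a little over two hours of slack, while a second sequential full synchronization of similar size would exceed the 24-hour target.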