Exchange 2010 Tested Solutions: 9000 Mailboxes in Two Sites Running Hyper-V on Dell M610 Servers, Dell EqualLogic Storage, and F5 Load Balancing Solutions
Rob Simpson, Program Manager, Microsoft Exchange Server; Akshai Parthasarathy, Systems Engineer, Dell; Casey Birch, Product Marketing Manager for Exchange Solutions, Dell
December 2010
Summary
In Exchange 2010 Tested Solutions, Microsoft and participating server, storage, and network partners examine common customer scenarios and key design decision points facing customers who plan to deploy Microsoft Exchange Server 2010. Through this series of white papers, we provide examples of well-designed, cost-effective Exchange 2010 solutions deployed on hardware offered by some of our server, storage, and network partners.
You can download this document from the Microsoft Download Center.
Applies To
Microsoft Exchange Server 2010 release to manufacturing (RTM)
Microsoft Exchange Server 2010 with Service Pack 1 (SP1)
Windows Server 2008 R2
Windows Server 2008 R2 Hyper-V
Table of Contents
Introduction
Solution Summary
Customer Requirements
Mailbox Profile Requirements
Geographic Location Requirements
Server and Data Protection Requirements
Design Assumptions
Server Configuration Assumptions
Storage Configuration Assumptions
Solution Design
Determine High Availability Strategy
Estimate Mailbox Storage Capacity Requirements
Estimate Mailbox I/O Requirements
Determine Storage Type
Choose Storage Solution
Determine Number of EqualLogic Arrays Required
Estimate Mailbox Memory Requirements
Estimate Mailbox CPU Requirements
Determine Whether Server Virtualization Will Be Used
Determine Whether Client Access and Hub Transport Server Roles Will Be Deployed in Separate Virtual Machines
Determine Server Model for Hyper-V Root Server
Determine the CPU Capacity of the Virtual Machines
Determine Number of Mailbox Server Virtual Machines Required
Determine Number of Mailboxes per Mailbox Server
Determine Memory Required Per Mailbox Server
Determine Number of Client Access and Hub Transport Server Combo Virtual Machines Required
Determine Memory Required per Combined Client Access and Hub Transport Virtual Machines
Determine Virtual Machine Distribution
Determine Memory Required per Root Server
Determine Minimum Number of Databases Required
Identify Failure Domains Impacting Database Copy Layout
Design Database Copy Layout
Determine Storage Design
Determine Placement of the File Share Witness
Plan Namespaces
Determine Client Access Server Array and Load Balancing Strategy
Determine Hardware Load Balancing Solution
Determine Hardware Load Balancing Device Resiliency Strategy
Determine Hardware Load Balancing Methods
Solution Overview
Logical Solution Diagram
Physical Solution Diagram
Server Hardware Summary
Client Access and Hub Transport Server Configuration
Mailbox Server Configuration
Database Layout
Storage Hardware Summary
Storage Configuration
Network Switch Hardware Summary
Load Balancer Hardware Summary
Solution Validation Methodology
Storage Design Validation Methodology
Server Design Validation
Functional Validation Tests
Datacenter Switchover Validation
Primary Datacenter Service Restoration Validation
Storage Design Validation Results
Server Design Validation Results
Conclusion
Additional Information
Introduction
This document provides an example of how to design, test, and validate an Exchange Server 2010 solution for environments with 9,000 mailboxes deployed on Dell server and storage solutions and F5 load balancing solutions. One of the key challenges with designing Exchange 2010 environments is examining the current server and storage options available and making the right hardware choices that provide the best value over the anticipated life of the solution. Following the step-by-step methodology in this document, we walk through the important design decision points that help address these key challenges while ensuring that the customer's core business requirements are met. After we have determined the optimal solution for this customer, the solution undergoes a standard validation process to ensure that it holds up under simulated production workloads for normal operating, maintenance, and failure scenarios.
Solution Summary
The following tables summarize the key Exchange and hardware components of this solution.
Exchange components
Exchange component | Value or description |
---|---|
Target mailbox count | 9000 |
Target mailbox size | 750 megabytes (MB) |
Target message profile | 103 messages per day |
Database copy count | 3 |
Volume Shadow Copy Service (VSS) backup | None |
Site resiliency | Yes |
Virtualization | Hyper-V |
Exchange server count | 18 virtual machines (VMs) |
Physical server count | 9 |
Hardware components
Hardware component | Value or description |
---|---|
Server partner | Dell |
Server model | PowerEdge M610 |
Server type | Blade |
Processor | Intel Xeon X5550 |
Storage partner | Dell EqualLogic |
Storage type | Internet SCSI (iSCSI) storage area network (SAN) |
Disk type | 1 terabyte 7,200 RPM Serial ATA (SATA) 3.5" |
Customer Requirements
One of the most important first steps in Exchange solution design is to accurately summarize the business and technical requirements that are critical to making the correct design decisions. The following sections outline the customer requirements for this solution.
Mailbox Profile Requirements
Determine mailbox profile requirements as accurately as possible because these requirements may impact all other components of the design. If Exchange is new to you, you may have to make some educated guesses. If you have an existing Exchange environment, you can use the Microsoft Exchange Server Profile Analyzer tool to assist with gathering most of this information. The following tables summarize the mailbox profile requirements for this solution.
Mailbox count requirements
Mailbox count requirements | Value |
---|---|
Mailbox count (total number of mailboxes including resource mailboxes) | 9000 |
Projected growth percent (%) in mailbox count (projected increase in mailbox count over the life of the solution) | 0% |
Expected mailbox concurrency % (maximum number of active mailboxes at any time) | 100% |
Mailbox size requirements
Mailbox size requirements | Value |
---|---|
Average mailbox size in MB | 750 MB (calculated tier average: 742 MB) |
Tiered mailbox size | Yes: 450 @ 4 gigabytes (GB); 900 @ 1 GB; 7650 @ 512 MB |
Average mailbox archive size in MB | 0 |
Projected growth (%) in mailbox size in MB (projected increase in mailbox size over the life of the solution) | Included |
Target average mailbox size in MB | 750 MB |
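The 742 value shown in parentheses matches the mailbox-count-weighted average of the three quota tiers, which the design rounds up to a 750 MB target. A minimal Python sketch of that arithmetic, using only the tier counts and quotas from the table (the function name is illustrative):

```python
# Quota tiers from the mailbox size requirements table: (mailbox count, quota in MB)
QUOTA_TIERS = [(450, 4096), (900, 1024), (7650, 512)]


def weighted_average_quota(tiers):
    """Return the mailbox-count-weighted average quota in MB."""
    total_mailboxes = sum(count for count, _ in tiers)
    total_mb = sum(count * quota for count, quota in tiers)
    return total_mb / total_mailboxes


mailbox_count = sum(count for count, _ in QUOTA_TIERS)
average_quota = weighted_average_quota(QUOTA_TIERS)
print(f"{mailbox_count} mailboxes, average quota {average_quota:.1f} MB")
# prints "9000 mailboxes, average quota 742.4 MB"
```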
Mailbox profile requirements
Mailbox profile requirements | Value |
---|---|
Target message profile (average total number of messages sent plus received per user per day) | 103 messages per day |
Tiered message profile | Yes: 450 @ 150 messages per day; 8550 @ 100 messages per day |
Target average message size in KB | 75 |
% in MAPI cached mode | 100 |
% in MAPI online mode | 0 |
% in Outlook Anywhere cached mode | 0 |
% in Microsoft Office Outlook Web App (Outlook Web Access in Exchange 2007 and previous versions) | 0 |
% in Exchange ActiveSync | 0 |
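The 103 messages per day target is consistent with the weighted average of the two message-profile tiers (102.5), rounded up. A short sketch, assuming only the tier values shown in the table (the function name is illustrative):

```python
import math

# Message-profile tiers from the table above: (mailbox count, messages sent + received per day)
PROFILE_TIERS = [(450, 150), (8550, 100)]


def weighted_messages_per_day(tiers):
    """Return the mailbox-count-weighted average message profile."""
    total_mailboxes = sum(count for count, _ in tiers)
    return sum(count * messages for count, messages in tiers) / total_mailboxes


average_profile = weighted_messages_per_day(PROFILE_TIERS)
# math.ceil rounds 102.5 up; Python's round() would give 102 (round-half-to-even)
target_profile = math.ceil(average_profile)
print(f"{average_profile:.1f} messages per day, rounded up to {target_profile}")
```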
Geographic Location Requirements
Understanding the distribution of mailbox users and datacenters is important when making design decisions about high availability and site resiliency.
The following table outlines the geographic distribution of people who will be using the Exchange system.
Geographic distribution of people
Mailbox user site requirements | Value |
---|---|
Number of major sites containing mailbox users | 1 |
Number of mailbox users in site 1 | 9000 |
Number of mailbox users in site 2 | 0 |
The following table outlines the geographic distribution of datacenters that could potentially support the Exchange e-mail infrastructure.
Geographic distribution of datacenters
Datacenter site requirements | Value or description |
---|---|
Total number of datacenters | 2 |
Number of active mailboxes in proximity to datacenter 1 | 9000 |
Number of active mailboxes in proximity to datacenter 2 | 0 |
Requirement for Exchange to reside in more than one datacenter | Yes |
Server and Data Protection Requirements
It's also important to define server and data protection requirements for the environment because these requirements will support design decisions about high availability and site resiliency.
The following table identifies server protection requirements.
Server protection requirements
Server protection requirements | Value |
---|---|
Number of simultaneous server or VM failures within site | 1 |
Number of simultaneous server or VM failures during site failure | 0 |
The following table identifies data protection requirements.
Data protection requirements
Data protection requirement | Value or description |
---|---|
Requirement to maintain a backup of the Exchange databases outside of the Exchange environment (for example, third-party backup solution) | No |
Requirement to maintain copies of the Exchange databases within the Exchange environment (for example, Exchange native data protection) | Yes |
Requirement to maintain multiple copies of mailbox data in the primary datacenter | Yes |
Requirement to maintain multiple copies of mailbox data in a secondary datacenter | Yes |
Requirement to maintain a lagged copy of any Exchange databases | No |
Lagged copy period in days | Not applicable |
Target number of database copies | 3 |
Deleted Items folder retention window in days | 14 days |
Design Assumptions
This section includes information that isn't typically collected as part of customer requirements, but is critical to both the design and the approach to validating the design.
Server Configuration Assumptions
The following table describes the peak CPU utilization targets for normal operating conditions and for server failure or server maintenance conditions.
Server utilization targets
Target server CPU utilization design assumption | Value |
---|---|
Normal operating for Mailbox servers | <70% |
Normal operating for Client Access servers | <70% |
Normal operating for Hub Transport servers | <70% |
Normal operating for multiple server roles (Client Access, Hub Transport, and Mailbox servers) | <70% |
Normal operating for multiple server roles (Client Access and Hub Transport servers) | <70% |
Node failure for Mailbox servers | <80% |
Node failure for Client Access servers | <80% |
Node failure for Hub Transport servers | <80% |
Node failure for multiple server roles (Client Access, Hub Transport, and Mailbox servers) | <80% |
Node failure for multiple server roles (Client Access and Hub Transport servers) | <80% |
Storage Configuration Assumptions
The following tables summarize some data configuration and input/output (I/O) assumptions made when designing the storage configuration.
Data configuration assumptions
Data configuration assumption | Value or description |
---|---|
Data overhead factor | 20% |
Mailbox moves per week | 1% |
Dedicated maintenance or restore logical unit number (LUN) | No |
LUN free space | 20% |
Log shipping compression enabled | Yes |
Log shipping encryption enabled | Yes |
I/O configuration assumptions
I/O configuration assumption | Value or description |
---|---|
I/O overhead factor | 20% |
Additional I/O requirements | None |
Solution Design
The following section provides a step-by-step methodology used to design this solution. This methodology takes customer requirements and design assumptions and walks through the key design decision points that need to be made when designing an Exchange 2010 environment.
Determine High Availability Strategy
When designing an Exchange 2010 environment, many design decision points for high availability strategies impact other design components. We recommend that you determine your high availability strategy as the first step in the design process. We highly recommend that you review the following information prior to starting this step:
Step 1: Determine whether site resiliency is required
If you have more than one datacenter, you must decide whether to deploy Exchange infrastructure in a single datacenter or distribute it across two or more datacenters. The organization's recovery service level agreements (SLAs) should define what level of service is required following a primary datacenter failure. This information should form the basis for this decision.
*Design Decision Point*
In this example, there is a service level agreement that requires the ability to restore the messaging service within four hours in the event of a primary datacenter failure. Therefore, the customer must deploy Exchange infrastructure in a secondary datacenter for disaster recovery purposes.
Step 2: Determine relationship between mailbox user locations and datacenter locations
In this step, we look at whether all mailbox users are located primarily in one site or if they're distributed across many sites and whether those sites are associated with datacenters. If they're distributed across many sites and there are datacenters associated with those sites, you need to determine if there's a requirement to maintain affinity between mailbox users and the datacenter associated with that site.
*Design Decision Point*
In this example, all of the active users are located in one primary location. The primary location is in geographic proximity to the primary datacenter and therefore there's a desire for all active mailboxes to reside in the primary datacenter during normal operating conditions.
Step 3: Determine database distribution model
Because the customer has decided to deploy Exchange infrastructure in more than one physical location, the customer needs to determine which database distribution model best meets the needs of the organization. There are three database distribution models:
Active/Passive distribution: Active mailbox database copies are deployed in the primary datacenter and only passive database copies are deployed in a secondary datacenter. The secondary datacenter serves as a standby datacenter and no active mailboxes are hosted in the datacenter under normal operating conditions. In the event of an outage impacting the primary datacenter, a manual switchover to the secondary datacenter is performed and active databases are hosted there until the primary datacenter returns online.
Active/Passive distribution
Active/Active distribution (single DAG): Active mailbox databases are deployed in the primary and secondary datacenters. A corresponding passive copy is located in the alternate datacenter. All Mailbox servers are members of a single database availability group (DAG). In this model, the wide area network (WAN) connection between the two datacenters is potentially a single point of failure. Loss of the WAN connection results in Mailbox servers in one of the datacenters going into a failed state due to loss of quorum.
Active/Active distribution (single DAG)
Active/Active distribution (multiple DAGs): This model leverages multiple DAGs to remove WAN connectivity as a single point of failure. One DAG has active database copies in the first datacenter and its corresponding passive database copies in the second datacenter. The second DAG has active database copies in the second datacenter and its corresponding passive database copies in the first datacenter. In the event of loss of WAN connectivity, the active copies in each site continue to provide database availability to local mailbox users.
Active/Active distribution (multiple DAGs)
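The quorum loss described for the single-DAG Active/Active model can be illustrated with a small sketch. It assumes the majority rules of the Windows failover clustering that a DAG is built on: node majority when the DAG has an odd number of members, and node and file share majority (the file share witness casts a tie-breaking vote) when it has an even number. The site names, member counts, and witness location are hypothetical inputs.

```python
def sites_keeping_quorum(members_per_site, witness_site=None):
    """Return the sites that still hold a voting majority after all WAN links fail.

    members_per_site: mapping of site name -> number of DAG members in that site.
    witness_site: site hosting the file share witness (hypothetical input); its vote
    is counted only when the DAG has an even number of members.
    """
    members = sum(members_per_site.values())
    witness_votes = 1 if members % 2 == 0 else 0
    total_votes = members + witness_votes
    survivors = []
    for site, count in members_per_site.items():
        votes = count + (witness_votes if site == witness_site else 0)
        if votes > total_votes / 2:  # a strict majority is required
            survivors.append(site)
    return survivors


# Active/Passive building block: two members in the primary site, one in the secondary
print(sites_keeping_quorum({"primary": 2, "secondary": 1}))
# Active/Active with a single DAG split 2 + 2 and the witness in site A
print(sites_keeping_quorum({"siteA": 2, "siteB": 2}, witness_site="siteA"))
```

In the second case, the members in site B lose quorum even though they host active mailboxes, which is the WAN single point of failure that the multiple-DAG model avoids.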
*Design Decision Point*
In this example, active mailbox users are only in a single location and only the secondary datacenter will be used in the event that the primary datacenter fails. Therefore, an Active/Passive distribution model is the obvious choice.
Step 4: Determine backup and database resiliency strategy
Exchange 2010 includes several new features and core changes that, when deployed and configured correctly, can provide native data protection that eliminates the need to make traditional data backups. Backups are traditionally used for disaster recovery, recovery of accidentally deleted items, long term data storage, and point-in-time database recovery. Exchange 2010 can address all of these scenarios without the need for traditional backups:
Disaster recovery: In the event of a hardware or software failure, multiple database copies in a DAG enable high availability with fast failover and no data loss. DAGs can be extended to multiple sites and can provide resilience against datacenter failures.
Recovery of accidentally deleted items: With the new Recoverable Items folder in Exchange 2010 and the hold policy that can be applied to it, it's possible to retain all deleted and modified data for a specified period of time, so recovery of these items is easier and faster. For more information, see Messaging Policy and Compliance, Understanding Recoverable Items, and Understanding Retention Tags and Retention Policies.
Long-term data storage: Sometimes, backups also serve an archival purpose. Typically, tape is used to preserve point-in-time snapshots of data for extended periods of time as governed by compliance requirements. The new archiving, multiple-mailbox search, and message retention features in Exchange 2010 provide a mechanism to efficiently preserve data in an end-user accessible manner for extended periods of time. For more information, see Understanding Personal Archives, Understanding Multi-Mailbox Search, and Understanding Retention Tags and Retention Policies.
Point-in-time database snapshot: If a past point-in-time copy of mailbox data is a requirement for your organization, Exchange provides the ability to create a lagged copy in a DAG environment. This can be useful in the rare event that there's a logical corruption that replicates across the databases in the DAG, resulting in a need to return to a previous point in time. It may also be useful if an administrator accidentally deletes mailboxes or user data.
There are several technical considerations you should review before using the features built into Exchange 2010 as a replacement for traditional backups. Prior to making this decision, see Understanding Backup, Restore and Disaster Recovery.
*Design Decision Point*
In this example, maintaining tape backups has been difficult, and testing and validating restore procedures hasn't occurred on a regular basis. Therefore, using Exchange native data protection in place of traditional backups as the database resiliency strategy is preferred.
Step 5: Determine number of database copies required
There are a number of factors to consider when determining the number of database copies that you'll deploy. The first is whether you're using a third-party backup solution, which was decided in the previous step. We strongly recommend deploying a minimum of three copies of a mailbox database before eliminating traditional forms of protection for the database, such as Redundant Array of Independent Disks (RAID) or traditional VSS-based backups.
Prior to making this decision, see Understanding Mailbox Database Copies.
*Design Decision Point*
In the previous step, it was decided not to deploy a third-party backup solution. As a result, the design should have a minimum of three copies of each database. This ensures that both the recovery time objective and recovery point objective requirements are met.
Step 6: Determine database copy type
There are two types of database copies:
High availability database copy: This database copy is configured with a replay lag time of zero. As the name implies, high availability database copies are kept up-to-date by the system, can be automatically activated by the system, and are used to provide high availability for mailbox service and data.
Lagged database copy: This database copy is configured to delay transaction log replay for a period of time. Lagged database copies are designed to provide point-in-time protection, which can be used to recover from store logical corruptions, administrative errors (for example, deleting or purging a disconnected mailbox), and automation errors (for example, bulk purging of disconnected mailboxes).
*Design Decision Point*
In this example, all three mailbox database copies will be deployed as high availability database copies. The primary need for a lagged copy is to provide the ability to recover single deleted items. This requirement can be met using the deleted items retention feature.
Step 7: Determine number of database availability groups
A DAG is the base component of the high availability and site resilience framework built into Exchange 2010. A DAG is a group of up to 16 Mailbox servers that hosts a set of replicated databases and provides automatic database-level recovery from failures that affect individual servers or databases.
A DAG is a boundary for mailbox database replication, database and server switchovers and failovers, and for an internal component called Active Manager. Active Manager is an Exchange 2010 component that manages switchovers and failovers. Active Manager runs on every server in a DAG.
From a planning perspective, you should try to minimize the number of DAGs deployed. You should consider going with more than one DAG if:
You deploy more than 16 Mailbox servers.
You have active mailbox users in multiple sites (active/active site configuration).
You require separate DAG-level administrative boundaries.
You have Mailbox servers in separate domains. (DAG is domain bound.)
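The four conditions above can be expressed as a simple checklist. This is a hypothetical planning helper, not an Exchange tool; the 16-member limit comes from the DAG description above, and the inputs in the usage lines (including the server count and a single domain) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

MAX_DAG_MEMBERS = 16  # maximum number of Mailbox servers in one Exchange 2010 DAG


@dataclass
class DagPlanningInputs:
    mailbox_servers: int
    active_users_in_multiple_sites: bool  # active/active site configuration
    separate_admin_boundaries: bool
    domain_count: int  # a DAG is domain bound, so it can't span domains


def reasons_for_multiple_dags(inputs: DagPlanningInputs) -> List[str]:
    """Return the conditions (if any) that argue for more than one DAG."""
    reasons = []
    if inputs.mailbox_servers > MAX_DAG_MEMBERS:
        reasons.append("more than 16 Mailbox servers")
    if inputs.active_users_in_multiple_sites:
        reasons.append("active mailbox users in multiple sites")
    if inputs.separate_admin_boundaries:
        reasons.append("separate DAG-level administrative boundaries")
    if inputs.domain_count > 1:
        reasons.append("Mailbox servers in separate domains")
    return reasons


# Illustrative inputs: active/passive, an assumed single domain,
# and an assumed server count well under the 16-member limit
example = DagPlanningInputs(mailbox_servers=12, active_users_in_multiple_sites=False,
                            separate_admin_boundaries=False, domain_count=1)
print(reasons_for_multiple_dags(example) or "a single DAG is sufficient")
```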
*Design Decision Point*
In a previous step, it was decided that the database distribution model was going to be active/passive. This model doesn't require multiple DAGs to be deployed. This example isn't likely to require more than 16 Mailbox servers for 9,000 mailboxes, and there is no requirement for separate DAG-level administrative boundaries. Therefore, a single DAG will be used in this design.
Step 8: Determine Mailbox server resiliency strategy
Exchange 2010 has been re-engineered for mailbox resiliency. Automatic failover protection is now provided at the mailbox database level instead of at the server level. You can strategically distribute active and passive database copies to Mailbox servers within a DAG. Determining how many database copies you plan to activate on a per-server basis is a key aspect of Exchange 2010 capacity planning. There are different database distribution models that you can deploy, but generally we recommend one of the following:
Design for all copies activated: In this model, the Mailbox server role is sized to accommodate the activation of all database copies on the server. For example, a Mailbox server may host four database copies. During normal operating conditions, the server may have two active database copies and two passive database copies. During a failure or maintenance event, all four database copies would become active on the Mailbox server. This solution is usually deployed in pairs. For example, if deploying four servers, the first pair is servers MBX1 and MBX2, and the second pair is servers MBX3 and MBX4. In addition, when designing for this model, you will size each Mailbox server for no more than 40 percent of available resources during normal operating conditions. In a site resilient deployment with three database copies and six servers, this model can be deployed in sets of three servers, with the third server residing in the secondary datacenter. This model provides a three-server building block for solutions using an active/passive site resiliency model.
This model can be used in the following scenarios:
Active/Passive multisite configuration where failure domains (for example, racks, blade enclosures, and storage arrays) require easy isolation of database copies in the primary datacenter
Active/Passive multisite configuration where anticipated growth may warrant easy addition of logical units of scale
Configurations that aren't required to survive the simultaneous loss of any two Mailbox servers in the DAG
This model requires servers to be deployed in pairs for single site deployments and sets of three for multisite deployments. The following table illustrates a sample database layout for this model.
Design for all copies activated
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
C3 = passive copy (activation preference value of 3) during site failure event
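The C1/C2/C3 arrangement for the design-for-all-copies-activated model can be sketched for a three-server building block. This is a hypothetical illustration: server and database names such as MBX1 and DB1 are placeholders, and the failover helper simply activates the first surviving copy in activation-preference order, whereas Exchange's own best copy selection weighs more than preference alone.

```python
from collections import Counter


def all_copies_activated_layout(active_per_server, blocks=1):
    """Build a hypothetical C1/C2/C3 layout from three-server building blocks.

    In each block, two servers sit in the primary datacenter and one in the
    secondary. Each primary server hosts `active_per_server` active (C1) copies
    plus the C2 copies of its partner's databases; the secondary server hosts
    every C3 copy. Server and database names are placeholders.
    """
    layout = {}
    db_number = 0
    for block in range(blocks):
        first, second, secondary = (f"MBX{3 * block + n}" for n in (1, 2, 3))
        for active, partner in ((first, second), (second, first)):
            for _ in range(active_per_server):
                db_number += 1
                layout[f"DB{db_number}"] = {"C1": active, "C2": partner, "C3": secondary}
    return layout


def active_servers(layout, failed=()):
    """Count active databases per server, using the first surviving copy in preference order."""
    counts = Counter()
    for copies in layout.values():
        for preference in ("C1", "C2", "C3"):
            if copies[preference] not in failed:
                counts[copies[preference]] += 1
                break
    return counts


layout = all_copies_activated_layout(active_per_server=2)
for database, copies in layout.items():
    print(database, copies)
print("Normal:", dict(active_servers(layout)))
print("MBX1 down:", dict(active_servers(layout, failed={"MBX1"})))
```

With two active and two passive copies per primary server, losing MBX1 doubles the active databases on MBX2. One way to read the 40 percent sizing guidance is that this roughly doubled load should still stay under the 80 percent node failure target.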
Design for targeted failure scenarios: In this model, the Mailbox server role is designed to accommodate the activation of a subset of the database copies on the server. The number of database copies in the subset will depend on the specific failure scenario that you're designing for. The main goal of this design is to evenly distribute active database load across the remaining Mailbox servers in the DAG.
This model should be used in the following scenarios:
All single site configurations with three or more database copies
Configurations required to survive the simultaneous loss of any two Mailbox servers in the DAG
The DAG design for this model requires between 3 and 16 Mailbox servers. The following table illustrates a sample database layout for this model.
Design for targeted failure scenarios
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
C3 = passive copy (activation preference value of 3) during normal operations
*Design Decision Point*
In a previous step, it was decided to deploy an Active/Passive database distribution model with two high availability database copies in the primary datacenter and one high availability copy in the secondary datacenter. Because the two high availability copies in the primary datacenter are usually deployed in separate hardware failure domains, this model usually results in a Mailbox server resiliency strategy that designs for all copies being activated.
Step 9: Determine number of Mailbox servers and DAGs
The number of Mailbox servers required to support the workload and the minimum number of Mailbox servers required to support the DAG design may be different. In this step, a preliminary result is obtained. The final number of Mailbox servers will be determined in a later step.
*Design Decision Point*
This example uses three high availability database copies. To support three copies, a minimum of three Mailbox servers in the DAG is required. In an active/passive configuration, two of the servers will reside in the primary datacenter, and the third server will reside in the secondary datacenter. In this model, Mailbox servers should be added to the DAG in multiples of three. The following table outlines the possible configurations.
Number of Mailbox servers and DAGs
Primary datacenter | Secondary datacenter | Total Mailbox server count |
---|---|---|
2 | 1 | 3 |
4 | 2 | 6 |
6 | 3 | 9 |
8 | 4 | 12 |
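The table follows directly from the three-server building block: each block adds two servers in the primary datacenter and one in the secondary. A sketch that enumerates the combinations, assuming the 16-member DAG limit from Step 7 as the upper bound; under that assumption it also yields a 10 + 5 = 15 server option that the table doesn't list.

```python
MAX_DAG_MEMBERS = 16  # Exchange 2010 DAG member limit from Step 7


def mailbox_server_options(max_members=MAX_DAG_MEMBERS):
    """List (primary, secondary, total) server counts for three-server building blocks."""
    options = []
    blocks = 1
    while 3 * blocks <= max_members:
        options.append((2 * blocks, blocks, 3 * blocks))
        blocks += 1
    return options


for primary, secondary, total in mailbox_server_options():
    print(f"primary={primary} secondary={secondary} total={total}")
```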
Estimate Mailbox Storage Capacity Requirements
Many factors influence the storage capacity requirements for the Mailbox server role. For additional information, we recommend that you review Understanding Mailbox Database and Log Capacity Factors.
The following steps outline how to calculate mailbox capacity requirements. These requirements will then be used to make decisions about which storage solution options meet the capacity requirements. A later section covers additional calculations required to properly design the storage layout on the chosen storage platform.
Microsoft has created a Mailbox Server Role Requirements Calculator that will do most of this work for you. To download the calculator, see E2010 Mailbox Server Role Requirements Calculator. For additional information about using the calculator, see Exchange 2010 Mailbox Server Role Requirements Calculator.
Step 1: Calculate mailbox size on disk
Before attempting to determine what your total storage requirements are, you should know what the mailbox size on disk will be. A full mailbox with a 1-GB quota requires more than 1 GB of disk space because you have to account for the prohibit send/receive limit, the number of messages the user sends or receives per day, the Deleted Items folder retention window (with or without calendar version logging and single item recovery enabled), and the average database daily variations per mailbox. The Mailbox Server Role Requirements Calculator does these calculations for you. You can also use the following information to do the calculations manually.
The following calculations are used to determine the mailbox size on disk for the three mailbox tiers in this solution:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Whitespace = (100 messages per day × 75 KB) ÷ 1024 = 7.3 MB
Dumpster = [(100 messages per day × 75 KB) ÷ 1024 × 14 days] + (512 MB × 0.012) + (512 MB × 0.058) = 138 MB
Mailbox size on disk = mailbox limit + whitespace + dumpster
= 512 MB + 7.3 MB + 138 MB
= 657 MB
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Whitespace = (100 messages per day × 75 KB) ÷ 1024 = 7.3 MB
Dumpster = [(100 messages per day × 75 KB) ÷ 1024 × 14 days] + (1024 MB × 0.012) + (1024 MB × 0.058) = 174 MB
Mailbox size on disk = mailbox limit + whitespace + dumpster
= 1024 MB + 7.3 MB + 174 MB
= 1205 MB
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Whitespace = (150 messages per day × 75 KB) ÷ 1024 = 11 MB
Dumpster = [(150 messages per day × 75 KB) ÷ 1024 × 14 days] + (4096 MB × 0.012) + (4096 MB × 0.058) = 441 MB
Mailbox size on disk = mailbox limit + whitespace + dumpster
= 4096 MB + 11 MB + 441 MB
= 4548 MB
Average size on disk = [(657 × 7650) + (1205 × 900) + (4548 × 450)] ÷ 9000
= 907 MB
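The three tier calculations and the weighted average can be reproduced with a short Python sketch. It uses only the formulas and inputs above (75 KB messages, a 14-day retention window, and the 0.012 and 0.058 quota factors). Because it carries unrounded intermediate values, individual tiers print fractionally different values from the hand-rounded figures (for example, 657.7 MB rather than 657 MB for Tier 1), but the weighted average still comes to 907 MB.

```python
KB_PER_MB = 1024
RETENTION_DAYS = 14  # Deleted Items folder retention window


def mailbox_size_on_disk(quota_mb, messages_per_day, message_kb=75):
    """Return (whitespace, dumpster, size on disk) in MB per the formulas above."""
    daily_mb = messages_per_day * message_kb / KB_PER_MB
    whitespace = daily_mb
    # Dumpster: retained daily mail plus the 0.012 and 0.058 quota terms
    dumpster = daily_mb * RETENTION_DAYS + quota_mb * 0.012 + quota_mb * 0.058
    return whitespace, dumpster, quota_mb + whitespace + dumpster


# (mailbox count, quota MB, messages per day) for the three tiers
MAILBOX_TIERS = [(7650, 512, 100), (900, 1024, 100), (450, 4096, 150)]

total_mailboxes = sum(count for count, _, _ in MAILBOX_TIERS)
weighted_mb = 0.0
for count, quota, messages in MAILBOX_TIERS:
    whitespace, dumpster, on_disk = mailbox_size_on_disk(quota, messages)
    weighted_mb += count * on_disk
    print(f"{quota} MB tier: whitespace {whitespace:.1f} MB, "
          f"dumpster {dumpster:.1f} MB, on disk {on_disk:.1f} MB")

average_on_disk = weighted_mb / total_mailboxes
print(f"Average size on disk: {average_on_disk:.0f} MB")
```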
Step 2: Calculate database storage capacity requirements
In this step, the high level storage capacity required for all mailbox databases is determined. The calculated capacity includes database size, catalog index size, and 20 percent free space.
To determine the storage capacity required for all databases, use the following formulas:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Database size = (number of mailboxes × mailbox size on disk × database overhead growth factor) × (20% data overhead)
= (7650 × 657 × 1) × 1.2
= 6031260 MB
= 5890 GB
Database index size = 10% of database size
= 589 GB
Total database capacity = (database size + index size) ÷ 0.80 to add 20% volume free space
= (5890 + 589) ÷ 0.8
= 8099 GB
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Database size = (number of mailboxes × mailbox size on disk × database overhead growth factor) × (20% data overhead)
= (900 × 1205 × 1) × 1.2
= 1301400 MB
= 1271 GB
Database index size = 10% of database size
= 127 GB
Total database capacity = (database size + index size) ÷ 0.80 to add 20% volume free space
= (1271 + 127) ÷ 0.8
= 1747 GB
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Database size = (number of mailboxes × mailbox size on disk × database overhead growth factor) × (20% data overhead)
= (450 × 4548 × 1) × 1.2
= 2455920 MB
= 2400 GB
Database index size = 10% of database size
= 240 GB
Total database capacity = (database size + index size) ÷ 0.80 to add 20% volume free space
= (2400 + 240) ÷ 0.8
= 3301 GB
Total database capacity (all tiers) = 8099 + 1747 + 3301
= 13147 GB
= 12.8 terabytes
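The database capacity steps can be checked with a sketch that applies the same formulas to the Step 1 sizes on disk. It assumes the inputs used above: a growth factor of 1, 20 percent data overhead, an index at 10 percent of database size, and 20 percent volume free space. Because it doesn't round intermediate values, it totals about 13,144 GB; the hand calculation above rounds some values along the way (for example, the Tier 3 database size of 2,455,920 MB is about 2,398 GB and is shown as 2,400 GB), so its total of 13,147 GB is a few GB higher. Either way, the result is about 12.8 TB.

```python
MB_PER_GB = 1024


def database_capacity_gb(mailboxes, size_on_disk_mb, growth_factor=1.0,
                         data_overhead=0.20, index_ratio=0.10, free_space=0.20):
    """Return (database GB, index GB, total GB) per the Step 2 formulas."""
    database_gb = mailboxes * size_on_disk_mb * growth_factor * (1 + data_overhead) / MB_PER_GB
    index_gb = database_gb * index_ratio
    total_gb = (database_gb + index_gb) / (1 - free_space)
    return database_gb, index_gb, total_gb


# (mailbox count, mailbox size on disk in MB) from Step 1
DB_TIERS = [(7650, 657), (900, 1205), (450, 4548)]

total_capacity_gb = 0.0
for count, on_disk in DB_TIERS:
    database_gb, index_gb, tier_total_gb = database_capacity_gb(count, on_disk)
    total_capacity_gb += tier_total_gb
    print(f"{count} mailboxes: database {database_gb:.0f} GB, index {index_gb:.0f} GB, "
          f"total {tier_total_gb:.0f} GB")

print(f"All tiers: {total_capacity_gb:.0f} GB ({total_capacity_gb / MB_PER_GB:.1f} TB)")
```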
Step 3: Calculate transaction log storage capacity requirements
To ensure that the Mailbox server doesn't sustain any outages as a result of space allocation issues, the transaction logs also need to be sized to accommodate all of the logs that will be generated during the backup set. Because this architecture uses the mailbox resiliency and single item recovery features as the backup architecture, the log capacity should accommodate three times the daily log generation rate in case a failed copy isn't repaired for three days. (Any failed copy prevents log truncation from occurring.) If the server isn't back online within three days, you would want to temporarily remove the copy to allow truncation to occur.
To determine the storage capacity required for all transaction logs, use the following formulas:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Log files size = (log file size × number of logs per mailbox per day × number of days required to replace failed infrastructure × number of mailbox users) + (1% mailbox move overhead)
= (1 MB × 20 × 3 × 7650) + (7650 × 0.01 × 512)
= 498168 MB
= 487 GB
Total log capacity = log files size ÷ 0.80 to add 20% volume free space
= (487) ÷ 0.80
= 608 GB
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Log files size = (log file size × number of logs per mailbox per day × number of days required to replace failed infrastructure × number of mailbox users) + (1% mailbox move overhead)
= (1 MB × 20 × 3 × 900) + (900 × 0.01 × 1024)
= 63216 MB
= 62 GB
Total log capacity = log files size ÷ 0.80 to add 20% volume free space
= (62) ÷ 0.80
= 77 GB
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Log files size = (log file size × number of logs per mailbox per day × number of days required to replace failed infrastructure × number of mailbox users) + (1% mailbox move overhead)
= (1 MB × 30 × 3 × 450) + (450 × 0.01 × 4096)
= 58932 MB
= 58 GB
Total log capacity = log files size ÷ 0.80 to add 20% volume free space
= (58) ÷ 0.80
= 72 GB
Total log capacity (all tiers) = 608 + 77 + 72
= 757 GB
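The log sizing can be sketched the same way. This Python example assumes 1 MB log files (the Exchange 2010 log file size) and takes the logs-per-day values, the three-day repair window, the 1% move overhead applied to the mailbox quota, and the 20% free space from the text; the names are illustrative.

```python
# Sketch of the transaction log capacity formula above (values from the text).
LOG_FILE_MB = 1       # Exchange 2010 transaction log files are 1 MB
DAYS_TO_REPAIR = 3    # tolerate a failed copy (no log truncation) for 3 days
MOVE_OVERHEAD = 0.01  # 1% mailbox move overhead, applied to the mailbox quota
VOLUME_FREE = 0.80    # keep 20% free space on the volume

TIERS = {
    # tier: (number of mailboxes, logs per mailbox per day, mailbox quota in MB)
    "Tier 1": (7650, 20, 512),
    "Tier 2": (900, 20, 1024),
    "Tier 3": (450, 30, 4096),
}


def log_capacity_gb(mailboxes: int, logs_per_day: int, quota_mb: int) -> float:
    """Return the total log volume capacity (GB) for one tier."""
    logs_mb = LOG_FILE_MB * logs_per_day * DAYS_TO_REPAIR * mailboxes
    move_mb = mailboxes * MOVE_OVERHEAD * quota_mb
    return (logs_mb + move_mb) / 1024 / VOLUME_FREE


per_tier = {name: log_capacity_gb(*p) for name, p in TIERS.items()}
total_gb = sum(per_tier.values())
print({name: round(gb) for name, gb in per_tier.items()}, round(total_gb))
```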
Step 4: Determine total storage capacity requirements
The following table summarizes the high-level storage capacity requirements for this solution. In a later step, you will use this information to decide which storage solution to deploy and then take a closer look at specific storage requirements.
Summary of storage capacity requirements
Disk space requirements | Value |
---|---|
Average mailbox size on disk (MB) | 907 |
Database space required (GB) | 13147 |
Log space required (GB) | 757 |
Total space required (GB) | 13904 |
Total space required for three database copies (GB) | 41712 |
Total space required for three database copies (terabytes) | 41 |
Return to top
Estimate Mailbox I/O Requirements
When designing an Exchange environment, you need an understanding of database and log performance factors. We recommend that you review Understanding Database and Log Performance Factors.
Calculate total mailbox I/O requirements
Because database IOPS is one of the key transactional I/O metrics needed for adequately sizing storage, you should understand the amount of database I/O per second (IOPS) consumed by each mailbox user. Purely sequential I/O operations, such as background database maintenance, log transactional I/O, and log replication I/O, aren't factored into the IOPS per Mailbox server calculation, because storage subsystems handle sequential I/O much more efficiently than random I/O. In this step, you calculate the total IOPS required to support all mailbox users, using the following:
Note
To determine the IOPS profile for a different message profile, see the table "Database cache and estimated IOPS per mailbox based on message activity" in Understanding Database and Log Performance Factors.
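Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Total required IOPS = IOPS per mailbox user × number of mailboxes × I/O overhead factor
= 0.10 × 7650 × 1.2
= 918
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Total required IOPS = IOPS per mailbox user × number of mailboxes × I/O overhead factor
= 0.10 × 900 × 1.2
= 108
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Total required IOPS = IOPS per mailbox user × number of mailboxes × I/O overhead factor
= 0.15 × 450 × 1.2
= 81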
Total required IOPS (all tiers) = 1107
Average IOPS per mailbox = 1107 ÷ 9000 = 0.123
The high level storage IOPS requirements are approximately 1107. When choosing a storage solution, ensure that the solution meets this requirement.
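A minimal Python sketch of the IOPS estimate follows. The 0.15 IOPS per mailbox for the 150-message profile is shown in the text; the 0.10 value for the 100-message profile is an inference, chosen because it is the value that reproduces the stated all-tier total of 1,107.

```python
# Sketch of the IOPS estimate. 0.10 IOPS/mailbox for the 100-message profile is
# inferred from the stated total (1,107); 0.15 for 150 messages is from the text.
IO_OVERHEAD = 1.2  # 20% I/O overhead factor

TIERS = {
    # tier: (number of mailboxes, estimated IOPS per mailbox)
    "Tier 1": (7650, 0.10),
    "Tier 2": (900, 0.10),
    "Tier 3": (450, 0.15),
}

iops = {name: n * per_mbx * IO_OVERHEAD for name, (n, per_mbx) in TIERS.items()}
total_iops = sum(iops.values())
mailboxes = sum(n for n, _ in TIERS.values())
print(round(total_iops), round(total_iops / mailboxes, 3))  # → 1107 0.123
```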
Return to top
Determine Storage Type
Exchange 2010 includes improvements in performance, reliability, and high availability that enable organizations to run Exchange on a wide range of storage options.
When examining the storage options available, being able to balance the performance, capacity, manageability, and cost requirements is essential to achieving a successful storage solution for Exchange.
For more information about choosing a storage solution for Exchange 2010, see Mailbox Server Storage Design.
Determine whether you prefer an internal or external storage solution
A number of server models on the market today support from 8 through 16 internal disks. These servers are a fit for some Exchange deployments and provide a solid solution at a low price point. If your storage capacity and I/O requirements are met with internal storage and you don't have a specific requirement to use external storage, you should consider using server models with internal disks for Exchange deployments. If your storage and I/O requirements are higher or your organization has an existing investment in SANs, you should examine larger external direct-attached storage (DAS) or SAN solutions.
*Design Decision Point*
In this example, the external storage solution is selected.
Return to top
Choose Storage Solution
Use the following steps to choose a storage solution.
Step 1: Identify preferred storage vendor
In this solution, the preferred storage vendor is Dell.
Dell, Inc. is a leading IT infrastructure and services company with a broad portfolio of servers, storage, networking products, and comprehensive service offerings. Dell also provides testing, best practices, and architecture guidance specifically for Exchange 2010 and other Microsoft-based solutions in the unified communications and collaboration stack such as Microsoft Office SharePoint Server and Office Communications Server.
Dell offers a wide variety of storage solutions from Dell EqualLogic, Dell PowerVault, and Dell/EMC. Dell storage technologies help you minimize cost and complexity, increase performance and reliability, simplify storage management, and plan for future growth.
Step 2: Review available options from preferred vendor
There are a number of storage options that would be a good fit for this solution. The following options were considered:
Option 1: Dell EqualLogic PS 6000 Series iSCSI SAN Array
The Dell EqualLogic PS Series is fundamentally changing the way enterprises think about purchasing and managing storage. Built on breakthrough virtualized peer storage architecture, the EqualLogic PS Series simplifies the deployment and administration of consolidated storage environments. Its all-inclusive, intelligent feature set streamlines purchasing and delivers rapid SAN deployment, easy storage management, comprehensive data protection, enterprise-class performance and reliability, and seamless pay-as-you-grow expansion. The PS6000 is a 3U chassis that contains sixteen 3.5-inch hard disk drives, two iSCSI controllers, and four 1-Gb Ethernet ports per controller. Up to 16 arrays can be included in a single managed unit known as a group.
Option 2: Dell EqualLogic PS 6500 Series iSCSI SAN Array
The Dell EqualLogic PS Series 6500 arrays provide the same ease of use and intelligent features. However, this array was built with maximum density in mind. This 4U chassis holds up to 48 3.5-inch hard disk drives, making it very space efficient. It also contains four 1-Gb Ethernet ports per controller. The PS6500 can be mixed with other PS Series arrays in the same group.
Dell EqualLogic PS6000 and PS6500 series arrays
Components | Dell EqualLogic PS6000E, X, XV, and XVS | Dell EqualLogic PS6500E and X |
---|---|---|
Storage controllers | Dual controllers with a total of 4 GB of battery-backed memory. Battery-backed memory provides up to 72 hours of data protection. | Dual controllers with a total of 4 GB of battery-backed memory. Battery-backed memory provides up to 72 hours of data protection. |
Hard disk drives | 16x SATA, SAS, or SSD | 48x SATA or SAS |
Volumes | Up to 1024 | Up to 1024 |
RAID support | RAID-5, RAID-6, RAID-10, and RAID-50 | RAID-5, RAID-6, RAID-10, and RAID-50 |
Network interfaces | 4 copper per controller | 4 copper per controller |
Reliability | Redundant, hot-swappable controllers, power supplies, cooling fans, and disks. Individual disk drive slot power control. | Redundant, hot-swappable controllers, power supplies, cooling fans, and disks. Individual disk drive slot power control. |
Option 3: Dell PowerVault MD3200i iSCSI SAN Array
The PowerVault MD3200i is a high-performance iSCSI SAN designed to deliver storage consolidation and data management capabilities in an easy-to-use, cost-effective solution. Shared storage is required to enable VM mobility, which is a key benefit of a virtual environment. The PowerVault MD3200i is a networked shared storage solution, providing the high availability, expandability, and ease of management desired in virtual environments. The PowerVault MD3200i uses existing IP networks and offers small and medium businesses an easy-to-use iSCSI SAN without the need for extensive training or expensive new infrastructure.
Step 3: Select an array
The shortlisted arrays were the PS6000E and the PS6500E. PS6500E enclosures accommodate a total of 48 drives (46 data drives plus 2 hot spares) and are the densest storage solution offered. Therefore, the cost per gigabyte of deploying a PS6500E solution would be lower than that of a PS6000E solution. The PS6500E array is also an intelligent solution that offers SAN configuration and monitoring features, auto-build of RAID sets, network sensing mechanisms, and continuous health monitoring. The MD3200i is a less expensive solution but lacks some of the management and deployment features of the PS Series arrays.
In this example, the PS6500 series is selected because this storage enclosure suits a comprehensive datacenter consolidation solution spread across multiple sites, as opposed to a small and medium business (SMB) or branch-office storage need.
Step 4: Select a disk type
Exchange 2010 is optimized to use more sequential I/O and less random I/O, especially with larger mailboxes. As a result, it generates less disk-intensive activity than Exchange 2007, even during peak usage hours. Therefore, high-capacity SATA disks are used to save cost.
For a list of supported disk types, see "Physical Disk Types" in Understanding Storage Configuration.
To help determine which disk type to choose, see "Factors to Consider When Choosing Disk Types" in Understanding Storage Configuration.
Return to top
Determine Number of EqualLogic Arrays Required
In a previous step, it was determined to deploy three copies of each database. One of the three copies will be located in the secondary datacenter. Therefore, to meet the site resiliency requirements, a minimum of one PS6500E in the primary datacenter and one PS6500E in the secondary datacenter is needed.
Consider IOPS requirements. In a previous step, it was determined that 1,107 IOPS were required to support the 9,000 mailboxes. For a RAID-10 configuration of SATA disks, this IOPS requirement can be met by a single PS6500E array. In a failure event, a single PS6500E would have to support 100 percent of the IOPS requirement. Therefore, to meet the IOPS requirements, a minimum of one PS6500E in the primary datacenter and one PS6500E in the secondary datacenter is needed.
Consider storage capacity requirements. In a previous step, it was determined that approximately 26 terabytes were required to support two copies of each database in the primary datacenter and approximately 13 terabytes to support one copy of each database in the secondary datacenter. A single PS6500E configured with two spares and the remaining 46 disks in a RAID-10 disk group provides approximately 20 terabytes. Therefore, two PS6500E arrays in the primary datacenter and one PS6500E array in the secondary datacenter are required to support the capacity requirements.
Three PS6500E arrays will be deployed to support the capacity requirements of this solution.
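The capacity part of the array count reduces to a ceiling division per site, with a floor of one array per datacenter for site resiliency. This Python sketch uses the approximate figures from the text (about 26 TB and 13 TB required, about 20 TB usable per PS6500E); it doesn't model IOPS, which the text evaluates separately, and the dictionary labels are illustrative.

```python
import math

# Sketch: arrays per site = ceil(required capacity / usable capacity per array),
# with at least one array in each datacenter. Figures are approximate, from the text.
USABLE_TB_PER_ARRAY = 20  # one PS6500E: 46 disks in RAID-10 plus 2 hot spares

required_tb = {
    # site: approximate capacity required (TB)
    "primary (two copies)": 26,
    "secondary (one copy)": 13,
}

arrays = {
    site: max(1, math.ceil(tb / USABLE_TB_PER_ARRAY))
    for site, tb in required_tb.items()
}
print(arrays, sum(arrays.values()))
```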
Return to top
Estimate Mailbox Memory Requirements
Sizing memory correctly is an important step in designing a healthy Exchange environment. We recommend that you review Understanding Memory Configurations and Exchange Performance and Understanding the Mailbox Database Cache.
Calculate required database cache
The Extensible Storage Engine (ESE) uses database cache to reduce I/O operations. In general, the more database cache available, the less I/O generated on an Exchange 2010 Mailbox server. However, there's a point where adding additional database cache no longer results in a significant reduction in IOPS. Therefore, adding large amounts of physical memory to your Exchange server without determining the optimal amount of database cache required may result in higher costs with minimal performance benefit.
The IOPS estimates that you completed in a previous step assume a minimum amount of database cache per mailbox. These minimum amounts are summarized in the table "Estimated IOPS per mailbox based on message activity and mailbox database cache" in Understanding the Mailbox Database Cache.
The following table outlines the database cache per user for various message profiles.
Database cache per user
Messages sent or received per mailbox per day (about 75 KB average message size) | Database cache per user (MB) |
---|---|
50 | 3 |
100 | 6 |
150 | 9 |
200 | 12 |
In this step, you determine high level memory requirements for the entire environment. In a later step, you use this result to determine the amount of physical memory needed for each Mailbox server. Use the following information:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Database cache = profile specific database cache × number of mailbox users
= 6 MB × 7650
= 45900 MB
= 45 GB
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Database cache = profile specific database cache × number of mailbox users
= 6 MB × 900
= 5400 MB
= 5 GB
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Database cache = profile specific database cache × number of mailbox users
= 9 MB × 450
= 4050 MB
= 4 GB
Total database cache (all tiers) = 45900 MB + 5400 MB + 4050 MB
= 55350 MB
= 54 GB
Average per active mailbox = 55350 MB ÷ 9000 = 6.15 MB
The total database cache requirement for the environment is approximately 54 GB, or about 6.2 MB per mailbox user.
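The cache estimate is a table lookup followed by a sum. This Python sketch encodes the database cache table above and the tier definitions from the text, summing in megabytes before converting to gigabytes; the names are illustrative.

```python
# Sketch: database cache = per-profile cache (table above) × number of mailboxes.
CACHE_MB_PER_PROFILE = {50: 3, 100: 6, 150: 9, 200: 12}  # messages/day -> MB

# tier: (number of mailboxes, messages sent or received per day)
TIERS = {"Tier 1": (7650, 100), "Tier 2": (900, 100), "Tier 3": (450, 150)}

cache_mb = {name: n * CACHE_MB_PER_PROFILE[msgs] for name, (n, msgs) in TIERS.items()}
total_mb = sum(cache_mb.values())
mailboxes = sum(n for n, _ in TIERS.values())
print(total_mb, round(total_mb / 1024, 1), round(total_mb / mailboxes, 2))
```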
Return to top
Estimate Mailbox CPU Requirements
Mailbox server capacity planning has changed significantly from previous versions of Exchange due to the new mailbox database resiliency model provided in Exchange 2010. For additional information, see Mailbox Server Processor Capacity Planning.
In the following steps, you calculate the high level megacycle requirements for active and passive database copies. These requirements will be used in a later step to determine the number of Mailbox servers needed to support the workload. Note that the number of Mailbox servers required also depends on the Mailbox server resiliency model and database copy layout.
Using megacycle requirements to determine the number of mailbox users that an Exchange Mailbox server can support isn't an exact science. A number of factors can result in unexpected megacycle results in test and production environments. Megacycles should only be used to approximate the number of mailbox users that an Exchange Mailbox server can support. It's always better to be conservative rather than aggressive during the capacity planning portion of the design process.
The following calculations are based on published megacycle estimates as summarized in the following table.
Megacycle estimates
Messages sent or received per mailbox per day | Megacycles per mailbox for active mailbox database | Megacycles per mailbox for remote passive mailbox database | Megacycles per mailbox for local passive mailbox |
---|---|---|---|
50 | 1 | 0.1 | 0.15 |
100 | 2 | 0.2 | 0.3 |
150 | 3 | 0.3 | 0.45 |
200 | 4 | 0.4 | 0.6 |
Step 1: Calculate active mailbox CPU requirements
In this step, you calculate the megacycles required to support the active database copies, using the following:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Active mailbox megacycles required = profile specific megacycles × number of mailbox users
= 2 × 7650
= 15300
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Active mailbox megacycles required = profile specific megacycles × number of mailbox users
= 2 × 900
= 1800
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Active mailbox megacycles required = profile specific megacycles × number of mailbox users
= 3 × 450
= 1350
Total active mailbox megacycles required (all tiers) = 18450 megacycles
Step 2: Calculate active mailbox remote database copy CPU requirements
In a design with three copies of each database, there is processor overhead associated with shipping logs required to maintain database copies on the remote servers. This overhead is typically 10 percent of the active mailbox megacycles for each remote copy being serviced. Calculate the requirements, using the following:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Remote copy megacycles required = profile specific megacycles × number of mailbox users × number of remote copies
= (0.2) × (7650) × 2
= 3060
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Remote copy megacycles required = profile specific megacycles × number of mailbox users × number of remote copies
= (0.2) × (900) × 2
= 360
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Remote copy megacycles required = profile specific megacycles × number of mailbox users × number of remote copies
= (0.3) × (450) × 2
= 270
Total remote copy megacycles required (all tiers) = 3690
Step 3: Calculate local passive mailbox CPU requirements
In a design with three copies of each database, there is processor overhead associated with maintaining the local passive copies of each database. In this step, the high level megacycles required to support local passive database copies will be calculated. These numbers will be refined in a later step so that they match the server resiliency strategy and database copy layout. Calculate the requirements, using the following:
Tier 1 (512 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Passive mailbox megacycles required = profile specific megacycles × number of mailbox users × number of passive copies
= 0.3 × 7650 × 2
= 4590
Tier 2 (1024 MB mailbox quota, 100 messages per day message profile, 75 KB average message size)
Passive mailbox megacycles required = profile specific megacycles × number of mailbox users × number of passive copies
= 0.3 × 900 × 2
= 540
Tier 3 (4096 MB mailbox quota, 150 messages per day message profile, 75 KB average message size)
Passive mailbox megacycles required = profile specific megacycles × number of mailbox users × number of passive copies
= 0.45 × 450 × 2
= 405
Total passive mailbox megacycles required (all tiers) = 5535
Step 4: Calculate total CPU requirements
Calculate the total requirements, using the following:
Total megacycles required = active mailbox + remote passive copies + local passive copies
= 18450 + 3690 + 5535
= 27675
Total megacycles per mailbox = 27675 ÷ 9000 ≈ 3.08
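Steps 1 through 4 can be combined into one loop. This Python sketch uses the megacycle table above and assumes, as the text does, two remote copies receiving shipped logs and two passive copies per database; the constant names are illustrative.

```python
# Sketch of the megacycle estimate: active + remote passive + local passive.
# messages/day -> (active, remote passive, local passive) megacycles per mailbox
MEGACYCLES = {
    50: (1, 0.1, 0.15),
    100: (2, 0.2, 0.3),
    150: (3, 0.3, 0.45),
    200: (4, 0.4, 0.6),
}
REMOTE_COPIES = 2   # copies the active database ships logs to
PASSIVE_COPIES = 2  # passive copies of each database

# tier: (number of mailboxes, messages sent or received per day)
TIERS = {"Tier 1": (7650, 100), "Tier 2": (900, 100), "Tier 3": (450, 150)}

active = remote = passive = 0.0
for n, msgs in TIERS.values():
    a, r, p = MEGACYCLES[msgs]
    active += a * n
    remote += r * n * REMOTE_COPIES
    passive += p * n * PASSIVE_COPIES

total = active + remote + passive
mailboxes = sum(n for n, _ in TIERS.values())
print(round(active), round(remote), round(passive), round(total), f"{total / mailboxes:.3f}")
```

The per-mailbox result is 3.075 megacycles, which the text rounds to 3.08.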
Return to top
Determine Whether Server Virtualization Will Be Used
Several factors are important when considering server virtualization for Exchange. For more information about supported configurations for virtualization, see Exchange 2010 System Requirements.
The main reasons customers use virtualization with Exchange are as follows:
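If you expect server capacity to be underutilized, virtualization can improve utilization, so you may be able to purchase fewer servers.
Virtualization lets you use Windows Network Load Balancing (WNLB) when deploying the Client Access, Hub Transport, and Mailbox server roles on the same physical server. WNLB can't run on a Mailbox server that is a DAG member, because WNLB is incompatible with Windows failover clustering, but it can run in a separate Client Access VM on the same host.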
If your organization is using virtualization in all server infrastructure, you may want to use virtualization with Exchange, to be in alignment with corporate standard policy.
*Design Decision Point*
In this solution, the customer doesn't want to deploy additional physical hardware for Client Access and Hub Transport servers. An active/passive site resiliency design requires several Mailbox servers to support the DAG design and database copy layout, which may leave unused capacity on the Mailbox servers. Virtualization will be used to better utilize capacity across server roles.
Return to top
Determine Whether Client Access and Hub Transport Server Roles Will Be Deployed in Separate Virtual Machines
When using virtualization for the Client Access and Hub Transport server roles, you may consider deploying both roles on the same VM. This approach reduces the number of VMs to manage, the number of server operating systems to update, and the number of Windows and Exchange licenses you need to purchase. Combining the Client Access and Hub Transport server roles also simplifies the design process. When deploying roles in isolation, we recommend that you deploy one Hub Transport server logical processor for every four Mailbox server logical processors, and three Client Access server logical processors for every four Mailbox server logical processors. These ratios can be difficult to apply, especially when you have to provide sufficient Client Access and Hub Transport capacity during multiple VM or physical server failures or maintenance scenarios. When deploying Client Access, Hub Transport, and Mailbox servers on identically configured physical servers or VMs, you can deploy one server with the Client Access and Hub Transport server roles for every Mailbox server in the site.
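The logical processor ratios for separately deployed roles can be expressed as a small helper. In this Python sketch, `role_lps` is a hypothetical function, rounding up to whole logical processors is an assumption, and the 12-LP input is an arbitrary example rather than a figure from this solution.

```python
import math
from fractions import Fraction

# Sketch of the ratios for separately deployed roles: per 4 Mailbox server
# logical processors (LPs), 1 Hub Transport LP and 3 Client Access LPs.
HUB_PER_MBX = Fraction(1, 4)
CAS_PER_MBX = Fraction(3, 4)


def role_lps(mailbox_lps: int) -> tuple[int, int]:
    """Return (Hub Transport LPs, Client Access LPs), rounded up to whole LPs."""
    return math.ceil(mailbox_lps * HUB_PER_MBX), math.ceil(mailbox_lps * CAS_PER_MBX)


# Hypothetical example: a site with 12 Mailbox server logical processors.
print(role_lps(12))  # → (3, 9)
```

With combined Client Access and Hub Transport VMs on identically configured hosts, the rule collapses to a 1:1 ratio with Mailbox servers, which is one reason the combined layout is simpler to plan.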
*Design Decision Point*
In this solution, the Hub Transport and Client Access server roles are co-located in the same VM, and the Mailbox server role is deployed separately in a second VM. This reduces the number of VMs and operating systems to manage and simplifies planning for server resiliency.
Return to top
Determine Server Model for Hyper-V Root Server
Step 1: Identify preferred server vendor
In this solution, the preferred server vendor is Dell.
The Dell eleventh generation PowerEdge servers offer industry leading performance and efficiency. Innovations include increased memory capacity and faster I/O rates, which help deliver the performance required by today's most demanding applications.
Step 2: Review available options from preferred vendor
Dell's server portfolio includes several models that were considered for this implementation.
Option 1: Dell PowerEdge M610 Blade Server
The decision to use iSCSI attached storage provides the potential for taking advantage of Dell blades, based on the M1000e chassis. The M610 combines two sockets and twelve DIMMs in a half-height blade for a dense and power efficient server.
Dell PowerEdge M1000e blade chassis
Components | Description |
---|---|
Chassis/enclosure | Form factor: 10U modular enclosure holds up to sixteen half-height blade servers; 44.0 cm (17.3") height × 44.7 cm (17.6") width × 75.4 cm (29.7") depth; Weight: |
Power supplies | 3 (non-redundant) or 6 (redundant) 2,360 watt hot-plug power supplies |
Cooling fans | M1000e chassis comes standard with 9 hot-pluggable, redundant fan modules |
Input device | Front control panel with interactive graphical LCD: Two USB keyboard/mouse connections and one video connection (requires the optional Avocent iKVM switch to enable these ports) for local front crash cart console connections that can be switched between blades |
Enclosure I/O modules | Up to six total I/O modules for three fully redundant fabrics, featuring Ethernet FlexIO technology providing on-demand stacking and uplink scalability. Dell FlexIO technology delivers a level of I/O flexibility, bandwidth, investment protection, and capabilities unrivaled in the blade server market. FlexIO technologies include: |
Management | 1 (standard) or optional second (redundant) Chassis Management Controller (CMC); optional integrated Avocent keyboard, video, and mouse (iKVM) switch; Dell OpenManage systems management |
External storage options | Dell EqualLogic PS series, Dell/EMC AX series, Dell/EMC CX series, Dell/EMC NS series, Dell PowerVault MD series, Dell PowerVault NX series |
Dell PowerEdge M610 server
Components | Description |
---|---|
Processors (x2) | Latest quad-core or six-core Intel Xeon processors 5500 and 5600 series |
Form factor | Blade/modular: half-height slot in an M1000e blade chassis |
Memory | 12 DIMM slots; 1 GB/2 GB/4 GB/8 GB/16 GB ECC DDR3; support for up to 192 GB using 12 × 16 GB DIMMs |
Drives | Internal hot-swappable drives: Solid-state drives (SSD): |
I/O slots | For details, see previous M1000e blade chassis information |
Option 2: Dell PowerEdge M710 blade server
The M710 provides two sockets in a blade form factor but extends the number of DIMMs to eighteen, greatly expanding memory capacity. However, the M710 is a full-height blade. The extra RAM can make the M710 an attractive virtualization server.
Dell PowerEdge M710 server
Components | Description |
---|---|
Processors (x2) | Latest quad-core or six-core Intel Xeon processors 5500 and 5600 series |
Form factor | Blade/modular: full-height slot in an M1000e blade chassis |
Memory | 18 DIMM slots; 1 GB/2 GB/4 GB/8 GB/16 GB ECC DDR3; support for up to 192 GB using 12 × 16 GB DIMMs |
Drives | Internal hot-swappable drives: SSD: |
I/O slots | For details, see previous M1000e blade chassis information |
Option 3: Dell PowerEdge R710 rack mounted server
Another choice for this implementation could be the Dell PowerEdge R710. This Intel-based platform is a 2U rack-mounted server containing two sockets, eighteen DIMM slots, and the option of either eight 2.5" or six 3.5" internal hard disk drives. Although limited in internal disk capacity compared to the other server models presented, it scales beyond the R510 in memory (eighteen DIMMs compared to eight) and provides more I/O options. Storage capabilities may be expanded by using Dell PowerVault MD1200 or MD1220 direct-attached storage arrays. The MD1200 provides twelve 3.5" hard disk drives in a 2U rack-mounted form factor, while the MD1220 provides twenty-four 2.5" hard disk drives in the same 2U rack-mounted form factor. These 6 Gbps SAS-connected arrays can be daisy chained, up to four arrays per RAID controller, and also support redundant connections from the server. This storage option satisfies requirements for lower-cost storage and simplicity while giving each node the ability to scale in the number of supported mailboxes.
Dell PowerEdge R710 server
Components | Description |
---|---|
Processors (x2) | Latest quad-core or six-core Intel Xeon processors 5500 and 5600 series |
Form factor | 2U rack |
Memory | Up to 192 GB (18 DIMM slots): 1 GB/2 GB/4 GB/8 GB/16 GB DDR3, 800 megahertz (MHz), 1066 MHz, or 1333 MHz |
Drives | Eight 2.5" hard disk drive option or six 3.5" hard disk drive option with optional flex bay expansion to support half-height TBU; up to six 3.5" drives with optional flex bay or up to eight 2.5" SAS or SATA drives with optional flex bay; peripheral bay options include slim optical drive bay with choice of DVD-ROM, combo CD-RW/DVD-ROM, or DVD+RW |
I/O slots | 2 PCIe x8 + 2 PCIe x4 G2 or 1 x16 + 2 x4 G2 |
Option 4: Dell PowerEdge R810 rack mounted server
The R810 is a two- or four-socket platform in a 2U form factor. It contains Dell's patented FlexMem bridge technology, which allows the server to use all thirty-two DIMM slots even with only two processors installed. This makes the R810 a strong virtualization platform, providing great compute power in a dense package.
Dell PowerEdge R810 server
Components | Description |
---|---|
Processors (x4) | Up to eight-core Intel Xeon 7500 and 6500 series processors |
Form factor | 2U rack |
Memory | Up to 512 GB (32 DIMM slots); 1 GB/2 GB/4 GB/8 GB/16 GB DDR3 1066 MHz |
Drives | Hot-swap option available with up to six 2.5" SAS or SATA drives, including SATA SSD |
I/O slots | 6 PCIe G2 slots: |
Step 3: Select a server model
For this solution, Dell PowerEdge M610 blades are selected. The customer wants to standardize on blades in the datacenter to take advantage of their density and power efficiency. Although the M710 may be able to support more VMs per server than the M610, the half-height M610 saves more space in this deployment than the full-height M710.
In previous steps, megacycles required to support the number of active mailbox users were calculated. In the following steps, the number of available megacycles the selected server model and processor can support will be determined so that the number of active mailboxes each server can support can then be determined.
Step 4: Determine benchmark value for server and processor
Because the megacycle requirements are based on a baseline server and processor model, you need to adjust the available megacycles for the server against the baseline. To do this, independent performance benchmarks maintained by Standard Performance Evaluation Corporation (SPEC) are used. SPEC is a non-profit corporation formed to establish, maintain, and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers.
To help simplify the process of obtaining the benchmark value for your server and processor, we recommend that you use the Exchange Processor Query tool. This tool automates the manual steps to determine your planned processor's SPECint_rate2006 value. To run this tool, your computer must be connected to the Internet. The tool uses your planned processor model as input, and then runs a query against the Standard Performance Evaluation Corporation Web site, returning all test result data for that specific processor model. The tool also calculates an average SPECint_rate2006 value based on the number of processors planned to be used in each Mailbox server. Use the following calculations:
Processor and server platform = Intel X5550 2.6 gigahertz (GHz) in a Dell M610
SPECint_rate2006 value = 234
SPECint_rate2006 value per processor core = 234 ÷ 8
= 29.25
Step 5: Calculate adjusted megacycles
In previous steps, you calculated the required megacycles for the entire environment based on megacycle per mailbox estimates. Those estimates were measured on a baseline system (HP DL380 G5 X5470 3.33 GHz, 8 cores) that has a SPECint_rate2006 value of 150 (for an 8-core server), or 18.75 per core.
In this step, you need to adjust the available megacycles for the chosen server and processor against the baseline processor so that the required megacycles can be used for capacity planning.
To determine the megacycles of the Dell M610 Intel X5550 2.6 GHz platform, use the following formula:
Adjusted megacycles per core = (new platform per core value) × (hertz per core of baseline platform) ÷ (baseline per core value)
= (29.25 × 3330) ÷ 18.75
= 5194.8 (≈ 5195)
Adjusted megacycles per server = adjusted megacycles per core × number of cores
= 5194.8 × 8
= 41558
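The normalization scales the baseline clock speed by the ratio of per-core SPECint_rate2006 values. This Python sketch uses the benchmark figures from the text (234 for the 8-core M610 with Intel X5550 processors, 150 for the 8-core baseline at 3.33 GHz); the function name is illustrative.

```python
# Sketch of the megacycle normalization against the baseline platform.
BASELINE_SPECINT_PER_CORE = 150 / 8  # HP DL380 G5, X5470 3.33 GHz, 8 cores
BASELINE_MHZ_PER_CORE = 3330


def adjusted_megacycles_per_core(specint_rate: float, cores: int) -> float:
    """Scale the baseline clock by the new/baseline SPECint per-core ratio."""
    specint_per_core = specint_rate / cores
    return specint_per_core * BASELINE_MHZ_PER_CORE / BASELINE_SPECINT_PER_CORE


per_core = adjusted_megacycles_per_core(234, 8)  # Dell M610, 2 × Intel X5550
per_server = per_core * 8
print(f"{per_core:.1f} per core, {per_server:.0f} per server")
```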
Step 6: Adjust available megacycles for virtualization overhead
When deploying VMs on the root server, megacycles required to support the hypervisor and virtualization stack must be accounted for. This overhead varies from server to server and under different workloads. A conservative estimate of 10 percent of available megacycles will be used. Use the following calculation:
Adjusted available megacycles = usable megacycles × 0.90
= 41558 × 0.90
= 37403
So each server has a usable capacity for VMs of 37403 megacycles.
The usable capacity per logical processor is 4675 megacycles.
Return to top
Determine the CPU Capacity of the Virtual Machines
Now that we know the megacycles of the root server, we can calculate the megacycles of each VM. These values will be used to determine how many VMs are required and how many mailboxes will be hosted by each VM.
Step 1: Calculate available megacycles per virtual machine
In this step, you determine how many megacycles are available for each VM deployed on the root server. Because the server has eight logical processors, plan to deploy two VMs per server, each with four virtual processors. Use the following calculation:
Available megacycles per VM = adjusted available megacycles per server ÷ number of VMs
= 37403 ÷ 2
= 18701
Step 2: Determine the target available megacycles per virtual machine
Because the design assumptions state not to exceed 70 percent processor utilization, in this step, you adjust the available megacycles to reflect the 70 percent target. Use the following calculation:
Target available megacycles = available megacycles × target max processor utilization
= 18701 × 0.70
= 13091
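A minimal sketch of Steps 1 and 2, assuming two VMs per root server (one Mailbox VM and one Client Access and Hub Transport VM, per the distribution described later) and the 70 percent ceiling used in the calculation above. The document truncates 18701.5 to 18701, so `math.floor` is used to match it.

```python
import math

# Minimal sketch: split usable root-server capacity across VMs, then apply
# the 70 percent utilization ceiling from the design assumptions.
usable_per_server = 37403
vms_per_server = 2          # one Mailbox VM + one Client Access/Hub Transport VM
target_utilization = 0.70

per_vm = math.floor(usable_per_server / vms_per_server)  # truncated, as in the text
target_per_vm = per_vm * target_utilization

print(per_vm, round(target_per_vm))
```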
Determine Number of Mailbox Server Virtual Machines Required
You can use the following steps to determine the number of Mailbox server VMs required.
Step 1: Determine the maximum number of mailboxes supported by the MBX virtual machine
To determine the maximum number of mailboxes supported by the MBX VM, use the following calculation:
Number of active mailboxes = available megacycles ÷ megacycles per mailbox
= 13091 ÷ 3.08
= 4250
Step 2: Determine the minimum number of mailbox virtual machines required in the primary site
To determine the minimum number of mailbox VMs required in the primary site, use the following calculation:
Number of VMs required = total mailbox count in site ÷ active mailboxes per VM
= 9000 ÷ 4250
= 2.12
Based on processor capacity, a minimum of three Mailbox server VMs (2.12 rounded up) is required to support the anticipated peak workload during normal operating conditions.
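The same two steps as a Python sketch: the per-VM mailbox ceiling is rounded down (a partial mailbox can't be hosted) and the VM count is rounded up (a partial VM can't be deployed). Variable names are illustrative; the inputs are the values from this section.

```python
import math

# Minimal sketch: mailbox capacity per Mailbox VM and the VM count it implies.
target_mc_per_vm = 13091
megacycles_per_mailbox = 3.08
total_mailboxes = 9000

max_active_per_vm = math.floor(target_mc_per_vm / megacycles_per_mailbox)
ratio = total_mailboxes / max_active_per_vm
vms_required = math.ceil(ratio)   # a fractional VM must round up

print(max_active_per_vm, round(ratio, 2), vms_required)
```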
Step 3: Determine number of Mailbox server virtual machines required to support the mailbox resiliency strategy
In the previous step, you determined that a minimum of three Mailbox server VMs is needed to support the target workload. In an active/passive database distribution model, you also need a minimum of three Mailbox server VMs in the secondary datacenter to support the workload during a site failure event. The DAG design will have nine Mailbox server VMs: six in the primary site and three in the secondary site.
Datacenter vs. Mailbox server count
Primary datacenter | Secondary datacenter | Total Mailbox server count |
---|---|---|
2 | 1 | 3 |
4 | 2 | 6 |
6 | 3 | 9 |
8 | 4 | 12 |
Determine Number of Mailboxes per Mailbox Server
You can use the following steps to determine the number of mailboxes per Mailbox server.
Step 1: Determine number of active mailboxes per server during normal operation
To determine the number of active mailboxes per server during normal operation, use the following calculation:
Number of active mailboxes per server = total mailbox count ÷ server count
= 9000 ÷ 6
= 1500
Step 2: Determine number of active mailboxes per server during the worst case failure event
To determine the number of active mailboxes per server during the worst case failure event, use the following calculation:
Number of active mailboxes per server = total mailbox count ÷ server count
= 9000 ÷ 3
= 3000
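A short sketch of both steps, with one added check: the worst-case load per server is compared against the 4250-mailbox CPU ceiling calculated earlier. It assumes, as the memory step below does, that the worst case leaves three of the six primary-site servers carrying all active mailboxes.

```python
# Minimal sketch: active mailboxes per primary-site Mailbox server in the
# normal case (all six up) and the worst case (three of six servers left).
total_mailboxes = 9000
primary_servers = 6
surviving_servers_worst_case = 3
max_active_per_vm = 4250   # CPU-based ceiling from the previous section

normal = total_mailboxes // primary_servers
worst_case = total_mailboxes // surviving_servers_worst_case
fits = worst_case <= max_active_per_vm

print(normal, worst_case, fits)
```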
Determine Memory Required Per Mailbox Server
You can use the following steps to determine the memory required per Mailbox server.
Step 1: Determine database cache requirements per server for the worst case failure scenario
In a previous step, you determined that the database cache requirements for all mailboxes were 55 GB and the average cache required per active mailbox was 6.2 MB.
To design for the worst case failure scenario, you calculate based on active mailboxes residing on three of six Mailbox servers. Use the following calculation:
Memory required for database cache = number of active mailboxes × average cache per mailbox
= 3000 × 6.2 MB
= 18600 MB
= 18.2 GB
Step 2: Determine total memory requirements per mailbox virtual machine server for the worst case failure scenario
In this step, reference the following table to determine the recommended memory configuration.
Memory requirements
Server physical memory (RAM) | Database cache size (Mailbox role only) |
---|---|
24 GB | 17.6 GB |
32 GB | 24.4 GB |
48 GB | 39.2 GB |
The recommended memory configuration to support 18.2 GB of database cache for a mailbox role server is 32 GB.
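The cache calculation and table lookup can be sketched as follows. The lookup table copies the three rows above; the helper names are hypothetical, and MB are converted to GB by dividing by 1024, which reproduces the 18.2 GB figure.

```python
# Minimal sketch: worst-case database cache and the smallest listed RAM
# configuration that provides it. Rows copied from the memory table above.
CACHE_BY_RAM_GB = [(24, 17.6), (32, 24.4), (48, 39.2)]  # (RAM, DB cache)

def required_cache_gb(active_mailboxes: int, cache_mb_per_mailbox: float = 6.2) -> float:
    """Total database cache needed for the active mailboxes, in GB."""
    return active_mailboxes * cache_mb_per_mailbox / 1024

def smallest_ram_for_cache(cache_gb: float) -> int:
    """Return the first listed RAM size whose cache allowance is large enough."""
    for ram_gb, cache_available_gb in CACHE_BY_RAM_GB:
        if cache_available_gb >= cache_gb:
            return ram_gb
    raise ValueError(f"no listed configuration provides {cache_gb:.1f} GB of cache")

cache = required_cache_gb(3000)   # worst-case active mailboxes per server
print(round(cache, 1), smallest_ram_for_cache(cache))
```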
Determine Number of Client Access and Hub Transport Server Combo Virtual Machines Required
In a previous step, it was determined that nine Mailbox server VMs are required. We recommend that you deploy one Client Access and Hub Transport server combo VM for every MBX VM. Therefore, the design will have nine Client Access and Hub Transport server combo VMs.
Number of Client Access and Hub Transport server combo VMs required
Server role configuration | Recommended processor core ratio |
---|---|
Mailbox server role: Client Access and Hub Transport combined server role | 1:1 |
Determine Memory Required per Combined Client Access and Hub Transport Virtual Machines
To determine the memory configuration for the combined Client Access and Hub Transport server role VM, reference the following table.
Memory configurations for Exchange 2010 servers based on installed server roles
Exchange 2010 server role | Minimum supported | Recommended maximum |
---|---|---|
Hub Transport server role | 4 GB | 1 GB per core |
Client Access server role | 4 GB | 2 GB per core |
Client Access and Hub Transport combined server role (Client Access and Hub Transport server roles running on the same physical server) | 4 GB | 2 GB per core |
Based on the preceding table, each combined Client Access and Hub Transport server VM, which has four virtual processors, requires 8 GB of memory (2 GB per core × 4 virtual processors).
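A one-function sketch of the sizing rule, assuming the combined VM has four virtual processors (two four-processor VMs per eight-processor root server, as planned in the CPU section). The function name is hypothetical; it applies the 2 GB per core figure while never going below the 4 GB supported minimum.

```python
# Minimal sketch: memory for a combined Client Access/Hub Transport VM.

def cas_hub_vm_memory_gb(virtual_processors: int,
                         gb_per_core: int = 2,
                         minimum_gb: int = 4) -> int:
    """2 GB per core for the combined role, floored at the 4 GB minimum."""
    return max(minimum_gb, virtual_processors * gb_per_core)

print(cas_hub_vm_memory_gb(4))
```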
Determine Virtual Machine Distribution
When deciding which VMs to host on which root server, your main goal should be to eliminate single points of failure. Don't locate both Client Access and Hub Transport server role VMs on the same root server, and don't locate both Mailbox server role VMs on the same root server.
Virtual machine distribution (incorrect)
The correct distribution is one Client Access and Hub Transport server role VM on each of the physical host servers and one Mailbox server role VM on each of the physical host servers. So in this solution there will be nine Hyper-V root servers each supporting one Client Access and Hub Transport server role VM and one Mailbox server role VM.
Virtual machine distribution (correct)
Determine Memory Required per Root Server
To determine the memory required for each root server, use the following calculation:
Root server memory = Client Access and Hub Transport server role VM memory + Mailbox server role VM memory
= 8 GB + 32 GB
= 40 GB
The Hyper-V root server will require a minimum of 40 GB of memory.
Determine Minimum Number of Databases Required
To determine the optimal number of Exchange databases to deploy, use the Exchange 2010 Mailbox Role Calculator. Enter the appropriate information on the input tab and select Yes for Automatically Calculate Number of Unique Databases / DAG.
Database configuration
On the Role Requirements tab, the recommended number of databases appears.
Recommended number of databases
In this solution, a minimum of 12 databases will be used. The exact number of databases may be adjusted in future steps to accommodate the database copy layout.
Identify Failure Domains Impacting Database Copy Layout
Use the following steps to identify failure domains impacting database copy layout.
Step 1: Identify failure domains associated with storage
In a previous step, it was decided to deploy three Dell EqualLogic PS6500E arrays and to deploy three copies of each database. To provide maximum protection for each of those database copies, we recommend that no more than one copy of a single database be located on the same physical array. In this scenario, each PS6500E represents a failure domain that will impact the layout of database copies in the DAG.
Dell EqualLogic PS6500E arrays
Step 2: Identify failure domains associated with servers
In a previous step, it was determined that nine physical blade servers will be deployed. Six of those servers will be deployed in the primary datacenter and three in the secondary datacenter. Blades are associated with blade enclosures, so to support the site resiliency requirements, a minimum of two blade enclosures is required.
Failure domains associated with servers
In the previous step, it was determined that the three PS6500E arrays represent three failure domains. Consider a design in which all six blades in the primary datacenter are placed in a single enclosure and connected to the two PS6500E arrays in that datacenter. If an issue impacts that enclosure, no other servers remain in the primary datacenter, and you're forced to conduct a manual site switchover to the secondary datacenter. A better design is to deploy three blade enclosures, each with three of the nine server blades. Pair the servers in the first enclosure with the first PS6500E, the servers in the second enclosure with the second PS6500E, and the three servers in the secondary site with the PS6500E in the secondary site. By aligning the server and storage failure domains, the database copies are placed in a manner that protects against issues with either a storage array or an entire blade enclosure.
Failure domains associated with servers in two sites
Design Database Copy Layout
Use the following steps to design database copy layout.
Step 1: Determine number of database copies per Mailbox server
In a previous step, it was determined that the minimum number of unique databases that should be deployed is 12. In an active/passive configuration with three copies, we recommend that the number of databases equal the total number of Mailbox servers in the primary site multiplied by the number of Mailbox servers in a single failure domain and be greater than the minimum number of recommended databases. Use the following calculation:
Unique database count = total number of Mailbox servers in primary datacenter × number of Mailbox servers in failure domain
= 6 × 3
= 18
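A small sketch of the database count rule, with two derived values that later steps rely on: active databases per primary-site server and mailboxes per database. The function name is hypothetical, and it treats the recommended minimum as a floor (at least 12).

```python
# Minimal sketch: unique database count for the active/passive, three-copy design.

def unique_database_count(primary_servers: int,
                          servers_per_failure_domain: int,
                          minimum: int) -> int:
    """Primary-site servers x servers per failure domain, checked against the minimum."""
    count = primary_servers * servers_per_failure_domain
    if count < minimum:
        raise ValueError(f"{count} databases is below the recommended minimum of {minimum}")
    return count

databases = unique_database_count(6, 3, minimum=12)
active_per_server = databases // 6          # C1 copies per primary-site server
mailboxes_per_database = 9000 // databases

print(databases, active_per_server, mailboxes_per_database)
```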
Step 2: Determine database layout during normal operating conditions
Consider equally distributing the C1 database copies (or the copies with an activation preference value of 1) to the servers in the primary datacenter. These are the copies that will be active during normal operating conditions.
Database copy layout during normal operating conditions
DB | MBX1 | MBX2 | MBX3 | MBX4 | MBX5 | MBX6 |
---|---|---|---|---|---|---|
DB1 | C1 | | | | | |
DB2 | C1 | | | | | |
DB3 | C1 | | | | | |
DB4 | | C1 | | | | |
DB5 | | C1 | | | | |
DB6 | | C1 | | | | |
DB7 | | | C1 | | | |
DB8 | | | C1 | | | |
DB9 | | | C1 | | | |
DB10 | | | | C1 | | |
DB11 | | | | C1 | | |
DB12 | | | | C1 | | |
DB13 | | | | | C1 | |
DB14 | | | | | C1 | |
DB15 | | | | | C1 | |
DB16 | | | | | | C1 |
DB17 | | | | | | C1 |
DB18 | | | | | | C1 |
In the preceding table, the following applies:
- C1 = active copy (activation preference value of 1) during normal operations
Next, distribute the C2 database copies (or the copies with an activation preference value of 2) to the servers in the second failure domain. During the distribution, spread the C2 copies across as many servers in the alternate failure domain as possible to ensure that a single server failure has a minimal impact on the servers in the alternate failure domain.
Database copy layout with C2 database copies distributed
DB | MBX1 | MBX2 | MBX3 | MBX4 | MBX5 | MBX6 |
---|---|---|---|---|---|---|
DB1 | C1 | | | C2 | | |
DB2 | C1 | | | | C2 | |
DB3 | C1 | | | | | C2 |
DB4 | | C1 | | C2 | | |
DB5 | | C1 | | | C2 | |
DB6 | | C1 | | | | C2 |
DB7 | | | C1 | C2 | | |
DB8 | | | C1 | | C2 | |
DB9 | | | C1 | | | C2 |
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
Consider the opposite configuration for the other failure domain. Again, you distribute the C2 copies across as many servers in the alternate failure domain as possible to ensure that a single server failure has a minimal impact on the servers in the alternate failure domain.
Database copy layout with C2 database copies distributed in the opposite configuration
DB | MBX1 | MBX2 | MBX3 | MBX4 | MBX5 | MBX6 |
---|---|---|---|---|---|---|
DB10 | C2 | | | C1 | | |
DB11 | | C2 | | C1 | | |
DB12 | | | C2 | C1 | | |
DB13 | C2 | | | | C1 | |
DB14 | | C2 | | | C1 | |
DB15 | | | C2 | | C1 | |
DB16 | C2 | | | | | C1 |
DB17 | | C2 | | | | C1 |
DB18 | | | C2 | | | C1 |
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
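The C1/C2 placement in the preceding tables follows a regular pattern, sketched below in Python. The rotation rule is inferred from the tables rather than stated in the text, and `build_layout` is a hypothetical helper: C1 copies are placed three per home server, and each group's C2 copies rotate across every server in the partner failure domain.

```python
# Minimal sketch reproducing the C1/C2 tables above (rotation rule inferred).
from collections import Counter

DOMAIN_A = ["MBX1", "MBX2", "MBX3"]   # first failure domain
DOMAIN_B = ["MBX4", "MBX5", "MBX6"]   # second failure domain

def build_layout(home: list[str], partner: list[str],
                 first_db: int) -> dict[str, dict[str, str]]:
    """C1 copies in blocks of three per home server; C2 copies rotate
    across all servers in the partner failure domain."""
    n = len(home)
    layout = {}
    for i in range(n * n):
        layout[f"DB{first_db + i}"] = {"C1": home[i // n], "C2": partner[i % n]}
    return layout

layout = {**build_layout(DOMAIN_A, DOMAIN_B, 1),
          **build_layout(DOMAIN_B, DOMAIN_A, 10)}

c1_per_server = Counter(copies["C1"] for copies in layout.values())
c2_per_server = Counter(copies["C2"] for copies in layout.values())
print(layout["DB5"], dict(c1_per_server), dict(c2_per_server))
```

Every server ends up with three C1 and three C2 copies, which is the even spread the text asks for.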
Step 3: Determine database layout during server failure and maintenance conditions
Before considering the secondary datacenter and distributing the C3 copies, examine the following server failure scenario. In the following example, if server MBX1 fails, its active database copies (DB1, DB2, and DB3) automatically move to servers MBX4, MBX5, and MBX6. Notice that each of the three servers in the alternate failure domain is now running with four active databases, and the active databases are equally distributed across all three servers.
Database copy layout during server maintenance or failure
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
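The failure and maintenance scenarios can be checked with a simplified model. It assumes each database becomes active on its lowest-preference copy whose server is still running; Exchange's Active Manager also weighs copy health and queue lengths, so this is an approximation, and both helper names are hypothetical.

```python
# Simplified failover model over the C1/C2 layout from the tables above.
from collections import Counter

A = ["MBX1", "MBX2", "MBX3"]
B = ["MBX4", "MBX5", "MBX6"]

def build_layout() -> dict[str, list[str]]:
    """Copies per database in activation-preference order: [C1, C2]."""
    layout = {}
    for block, (home, partner) in enumerate([(A, B), (B, A)]):
        for i in range(9):
            layout[f"DB{block * 9 + i + 1}"] = [home[i // 3], partner[i % 3]]
    return layout

def active_copies(layout: dict[str, list[str]], failed: set[str]) -> Counter:
    """Count active databases per server, activating the first surviving copy."""
    active = Counter()
    for copies in layout.values():
        for server in copies:
            if server not in failed:
                active[server] += 1
                break
    return active

single_failure = active_copies(build_layout(), {"MBX1"})
maintenance = active_copies(build_layout(), {"MBX1", "MBX2", "MBX3"})
print(dict(single_failure), dict(maintenance))
```

With the whole first failure domain offline, each surviving server hosts six active databases, or 3000 mailboxes at 500 per database, which matches the worst-case figure used for memory sizing.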
In a maintenance scenario, you could move the active mailbox databases from the servers in the first failure domain (MBX1, MBX2, MBX3) to the servers in the second failure domain (MBX4, MBX5, MBX6), complete maintenance activities, and then move the active database copies back to the C1 copies on the servers in the first failure domain. You can conduct maintenance activities on all servers in the primary datacenter in two passes.
Database copy layout during server maintenance
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
Step 4: Add database copies to secondary datacenter to support site resiliency
The last step in the database copy layout is to add the C3 copies (or copies with an activation preference value of 3) to the servers in the secondary datacenter to provide site resiliency. As with the C2 copies, distribute the C3 copies across as many servers in the alternate failure domain as possible to ensure that any issue impacting multiple Mailbox servers in the primary datacenter has a minimal impact on the servers in the alternate failure domain in the secondary datacenter. In a full site failure scenario, you activate all C3 copies in the secondary datacenter using Datacenter Activation Coordination (DAC), so the distribution of database copies in relation to servers in the primary datacenter is less important.
Database copy layout to support site resiliency
DB | MBX1 | MBX2 | MBX3 | MBX4 | MBX5 | MBX6 | MBX7 | MBX8 | MBX9 |
---|---|---|---|---|---|---|---|---|---|
DB1 |
C1 |
C2 |
C3 |
||||||
DB2 |
C1 |
C2 |
C3 |
||||||
DB3 |
C1 |
C2 |
C3 |
||||||
DB4 |
C1 |
C2 |
C3 |
||||||
DB5 |
C1 |
C2 |
C3 |
||||||
DB6 |
C1 |
C2 |
C3 |
||||||
DB7 |
C1 |
C2 |
C3 |
||||||
DB8 |
C1 |
C2 |
C3 |
||||||
DB9 |
C1 |
C2 |
C3 |
||||||
DB10 |
C2 |
C1 |
C3 |
||||||
DB11 |
C2 |
C1 |
C3 |
||||||
DB12 |
C2 |
C1 |
C3 |
||||||
DB13 |
C2 |
C1 |
C3 |
||||||
DB14 |
C2 |
C1 |
C3 |
||||||
DB15 |
C2 |
C1 |
C3 |
||||||
DB16 |
C2 |
C1 |
C3 |
||||||
DB17 |
C2 |
C1 |
C3 |
||||||
DB18 |
C2 |
C1 |
C3 |
In the preceding table, the following applies:
C1 = active copy (activation preference value of 1) during normal operations
C2 = passive copy (activation preference value of 2) during normal operations
C3 = remote passive copy (activation preference value of 3) during normal operations
Determine Storage Design
A well designed storage solution is a critical aspect of a successful Exchange 2010 Mailbox server role deployment. For more information, see Mailbox Server Storage Design.
Step 1: Summarize storage requirements
The following table summarizes the storage requirements that have been calculated or determined in a previous design step.
Summary of disk space requirements
Disk space requirements | Value |
---|---|
Average mailbox size on disk (MB) | 907 |
Database space required (GB) | 13147 |
Log space required (GB) | 757 |
Total space required (GB) | 13904 |
Total space required for three database copies (GB) | 41712 |
Total space required for three database copies (terabytes) | 41 |
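The totals in the preceding table follow from the database and log figures; a quick Python check, assuming (as elsewhere in this document) 1024 GB per terabyte:

```python
# Minimal check of the storage summary totals.
database_gb = 13147
log_gb = 757
copies = 3

total_gb = database_gb + log_gb
all_copies_gb = total_gb * copies
all_copies_tb = all_copies_gb / 1024   # ~40.7, shown rounded in the table

print(total_gb, all_copies_gb, round(all_copies_tb))
```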
Step 2: Determine whether logs and databases will be co-located on the same LUN
In previous Exchange releases, it was a recommended best practice to separate database files and log files from the same mailbox database to different volumes backed by different physical disks for recoverability purposes. This is still a recommended best practice for stand-alone architectures and architectures using VSS-based backups. If you're using Exchange native data protection and have deployed a minimum of three database copies, isolation of logs and databases isn't necessary.
*Design Decision Point*
With the EqualLogic array, the RAID-10 set spans all 46 disks. Because this architecture doesn't offer spindle isolation, there is no reason to create separate LUNs for database and log files. Therefore, subsequent design decisions are based on a single LUN for each database and log set.
Step 3: Determine number of LUNs required per array
In a previous step, it was identified that each primary Mailbox server would support three active databases, three passive database copies, and three lagged database copies. Therefore there will be a total of nine LUNs for each primary datacenter Mailbox server.
Number of LUNs required per array
Databases | LUNs per server | LUNs per array |
---|---|---|
Active databases | 3 | 9 |
Passive databases | 3 | 9 |
Lagged databases | 3 | 9 |
Total LUNs | 9 | 27 |
Step 4: Determine required LUN size
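This step determines the size of the LUN required to support both the database and log capacity requirements. Each LUN holds one database copy, and each database hosts 500 mailboxes (9000 mailboxes ÷ 18 databases). Use the following calculations: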
Database capacity = [(number of mailbox users × average mailbox size on disk) + (20% data overhead factor)] + (10% content indexing overhead)
= [(500 × 907) + (90700)] + 54420
= 598620 MB
= 585 GB
Log capacity = (log size × number of logs per mailbox per day × number of days required to replace hardware × number of mailbox users) + (mailbox move percent overhead)
= (1 MB × 20.5 × 3 × 500) + (500 × 0.01 × 907 MB)
= 35285 MB
= 35 GB (rounded up)
LUN size = [(database capacity) + (log capacity)] ÷ 0.8 (to leave 20% volume free space)
= [(585) + (35)] ÷ 0.8
= 775 GB
The required LUN size is 775 GB.
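A Python sketch of the three LUN sizing formulas, using 500 mailboxes per database, 907 MB per mailbox, 20.5 logs per mailbox per day, three days to replace hardware, and a 1 percent mailbox move overhead, all taken from the calculation above. Function names are hypothetical. MB are converted to GB by dividing by 1024 and rounding up, which reproduces the 585 GB and 35 GB figures.

```python
import math

def database_capacity_mb(users: int, mailbox_mb: float,
                         data_overhead: float = 0.20,
                         index_overhead: float = 0.10) -> float:
    """Mailbox data plus 20% data overhead, then 10% for content indexing."""
    return users * mailbox_mb * (1 + data_overhead) * (1 + index_overhead)

def log_capacity_mb(users: int, logs_per_day: float, days: int, mailbox_mb: float,
                    move_fraction: float = 0.01, log_mb: float = 1.0) -> float:
    """Logs for the hardware-replacement window plus mailbox move overhead."""
    return log_mb * logs_per_day * days * users + users * move_fraction * mailbox_mb

def lun_size_gb(db_gb: int, log_gb: int, usable_fraction: float = 0.80) -> int:
    """Grow the combined size so 20% of the volume stays free; round up."""
    return math.ceil((db_gb + log_gb) / usable_fraction)

db_gb = math.ceil(database_capacity_mb(500, 907) / 1024)       # ~598620 MB
log_gb = math.ceil(log_capacity_mb(500, 20.5, 3, 907) / 1024)  # ~35285 MB
print(db_gb, log_gb, lun_size_gb(db_gb, log_gb))
```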
Step 5: Calculate actual LUN size
In a previous step, it was determined that the EqualLogic PS6500E has a usable capacity of 20.8 terabytes or 21299 GB when using RAID-10 and having two spares configured. Each array needs to have 27 LUNs.
- 21299 ÷ 27 = 789 GB
The actual LUN size will be 789 GB, which will support the required LUN size of 775 GB.
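The per-LUN figure can be checked the same way, assuming 1024 GB per terabyte (which gives the 21299 GB usable capacity quoted above).

```python
import math

usable_gb = 20.8 * 1024    # PS6500E, RAID-10 with two spares (~21299 GB)
luns_per_array = 27
required_lun_gb = 775

per_lun_gb = usable_gb / luns_per_array
print(round(usable_gb), round(per_lun_gb, 1), math.floor(per_lun_gb) >= required_lun_gb)
```

The exact quotient is about 788.9 GB. Rounding down to 788 GB keeps all 27 LUNs within the usable capacity (27 × 789 GB = 21303 GB slightly exceeds 21299 GB) and still clears the 775 GB requirement.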
Actual LUN size
Description | Value |
---|---|
Usable capacity | 21299 GB |
Number of LUNs required | 27 |
Required LUN size | 775 GB |
Actual LUN size | 789 GB |
Step 6: Determine volume layout on PS6500Es
The following table illustrates how the database copies are positioned on the PS6500E arrays.
Volume layout on PS6500Es
Database | Array1 | Database | Array2 | Database | Array3 |
---|---|---|---|---|---|
DB1 | C1 | DB1 | C2 | DB1 | C3 |
DB2 | C1 | DB2 | C2 | DB2 | C3 |
DB3 | C1 | DB3 | C2 | DB3 | C3 |
DB4 | C1 | DB4 | C2 | DB4 | C3 |
DB5 | C1 | DB5 | C2 | DB5 | C3 |
DB6 | C1 | DB6 | C2 | DB6 | C3 |
DB7 | C1 | DB7 | C2 | DB7 | C3 |
DB8 | C1 | DB8 | C2 | DB8 | C3 |
DB9 | C1 | DB9 | C2 | DB9 | C3 |
DB10 | C2 | DB10 | C1 | DB10 | C3 |
DB11 | C2 | DB11 | C1 | DB11 | C3 |
DB12 | C2 | DB12 | C1 | DB12 | C3 |
DB13 | C2 | DB13 | C1 | DB13 | C3 |
DB14 | C2 | DB14 | C1 | DB14 | C3 |
DB15 | C2 | DB15 | C1 | DB15 | C3 |
DB16 | C2 | DB16 | C1 | DB16 | C3 |
DB17 | C2 | DB17 | C1 | DB17 | C3 |
DB18 | C2 | DB18 | C1 | DB18 | C3 |
Determine Placement of the File Share Witness
In Exchange 2010, the DAG uses a minimal set of components from Windows failover clustering. One of those components is the quorum resource, which provides a means for arbitration when determining cluster state and making membership decisions. It's critical that each DAG member have a consistent view of how the DAG's underlying cluster is configured. The quorum acts as the definitive repository for all configuration information relating to the cluster. The quorum is also used as a tiebreaker to avoid split brain syndrome. Split brain syndrome is a condition that occurs when DAG members can't communicate with each other but are available and running. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG witness server) to be available and interacting for the DAG to be operational.
A witness server is a server outside of a DAG that hosts the file share witness, which is used to achieve and maintain quorum when the DAG has an even number of members. DAGs with an odd number of members don't use a witness server. Upon creation of a DAG, the file share witness is added by default to a Hub Transport server (that doesn't have the Mailbox server role installed) in the same site as the first member of the DAG. If your Hub Transport server is running in a VM that resides on the same root server as VMs running the Mailbox server role, we recommend that you move the location of the file share witness to another highly available server. You can move the file share witness to a domain controller, but because of security implications, do this only as a last resort.
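The majority rule described above can be sketched as a simple vote count. This is a simplified model of node majority (odd member count) and node and file share majority (even member count), not the full Windows failover clustering behavior; the function name is hypothetical.

```python
# Simplified quorum vote count for a DAG.

def has_quorum(total_members: int, members_up: int, witness_up: bool = False) -> bool:
    """Odd DAGs count only members; even DAGs add the witness as a tiebreaking vote."""
    uses_witness = total_members % 2 == 0
    total_votes = total_members + (1 if uses_witness else 0)
    votes_up = members_up + (1 if uses_witness and witness_up else 0)
    return votes_up > total_votes // 2

# Nine-member DAG in this design: six members in the primary site, three in the secondary.
print(has_quorum(9, 6), has_quorum(9, 3))
```

For this nine-member DAG, losing the secondary site leaves six of nine votes and the DAG stays up, while losing the primary site leaves only three, which is why a primary site failure requires a manual switchover to the secondary datacenter.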
*Design Decision Point*
The file and print server is reasonably stable and is managed by the same administrator who supports the Exchange servers, so it's a good choice for the location of the file share witness.
Plan Namespaces
When you plan your Exchange 2010 organization, one of the most important decisions that you must make is how to arrange your organization's external namespace. A namespace is a logical structure usually represented by a domain name in Domain Name System (DNS). When you define your namespace, you must consider the different locations of your clients and the servers that house their mailboxes. In addition to the physical locations of clients, you must evaluate how they connect to Exchange 2010. The answers to these questions will determine how many namespaces you must have. Your namespaces will typically align with your DNS configuration. We recommend that each Active Directory site in a region that has one or more Internet-facing Client Access servers have a unique namespace. This is usually represented in DNS by an A record, for example, mail.contoso.com or mail.europe.contoso.com.
For more information, see Understanding Client Access Server Namespaces.
There are a number of different ways to arrange your external namespaces, but usually your requirements can be met with one of the following namespace models:
Consolidated datacenter model This model consists of a single physical site. All servers are located within the site, and there is a single namespace, for example, mail.contoso.com.
Single namespace with proxy sites This model consists of multiple physical sites. Only one site contains an Internet-facing Client Access server. The other sites aren't exposed to the Internet. There is only one namespace for the sites in this model, for example, mail.contoso.com.
Single namespace and multiple sites This model consists of multiple physical sites. Each site can have an Internet-facing Client Access server. Alternatively, there may be only a single site that contains Internet-facing Client Access servers. There is only one namespace for the sites in this model, for example, mail.contoso.com.
Regional namespaces This model consists of multiple physical sites and multiple namespaces. For example, a site located in New York City would have the namespace mail.usa.contoso.com, a site located in Toronto would have the namespace mail.canada.contoso.com, and a site located in London would have the namespace mail.europe.contoso.com.
Multiple forests This model consists of multiple forests that have multiple namespaces. An organization that uses this model could be made up of two partner companies, for example, Contoso and Fabrikam. Namespaces might include mail.usa.contoso.com, mail.europe.contoso.com, mail.asia.fabrikam.com, and mail.europe.fabrikam.com.
*Design Decision Point*
Because this solution is deploying an active/passive site resiliency model and doesn't have any active mailbox users in the secondary site, the best option is the single namespace with multiple sites model.
Determine Client Access Server Array and Load Balancing Strategy
In Exchange 2010, the RPC Client Access service and the Exchange Address Book service were introduced on the Client Access server role to improve the mailbox users' experience when the active mailbox database copy is moved to another Mailbox server (for example, during mailbox database failures and maintenance events). The connection endpoints for mailbox access from Microsoft Outlook and other MAPI clients have been moved from the Mailbox server role to the Client Access server role. Therefore, both internal and external Outlook connections must now be load balanced across all Client Access servers in the site to achieve fault tolerance. To associate the MAPI endpoint with a group of Client Access servers rather than a specific Client Access server, you can define a Client Access server array. You can only configure one array per Active Directory site, and an array can't span more than one Active Directory site. For more information, see Understanding RPC Client Access and Understanding Load Balancing in Exchange 2010.
*Design Decision Point*
In a previous step, it was determined that Client Access servers would be deployed in two physical locations in two Active Directory sites. Therefore, you need to deploy two Client Access server arrays. A single namespace will be load balanced across the Client Access servers in the primary active Client Access server array using redundant hardware load balancers. In a site failure, the namespace will be load balanced across the Client Access servers in the secondary Client Access server array.
Determine Hardware Load Balancing Solution
Use the following steps to determine a hardware load balancing solution.
Step 1: Identify preferred load balancing vendor
The preferred vendor for application load balancing is F5.
The F5 comprehensive Application Ready infrastructure for Exchange Server allows organizations to easily provide additional performance, security, and availability to ensure maximum return on investment for Exchange deployments.
Step 2: Review available options from preferred vendor
F5 offers a suite of appliance-based networking technologies designed to optimize networks for applications such as Exchange 2010:
BIG-IP Local Traffic Manager (LTM) BIG-IP LTM is designed to monitor and manage traffic to Client Access, Hub Transport, Edge Transport, and Unified Messaging servers, while ensuring that users are always sent to the best performing resource. Whether your users are connecting via MAPI, Outlook Web App, ActiveSync, or Outlook Anywhere, BIG-IP LTM will load balance the connections appropriately, allowing you to seamlessly scale to any size deployment. BIG-IP LTM now offers several modules that also provide significant value in an Exchange environment, which include:
Access Policy Manager (APM) Designed to secure access to Exchange resources, APM can authenticate users before they attach to your Exchange Client Access servers, providing a strong perimeter security.
BIG-IP WebAccelerator Targeting customers with large Outlook Web App constituencies, WebAccelerator can drive down bandwidth usage and server utilization while accelerating content to end users.
WAN Optimization Module (WOM) Focused on network optimization for WANs, WOM has proven capable of accelerating DAG replication between datacenters by more than five times.
BIG-IP Global Traffic Manager (GTM) BIG-IP GTM can provide wide area resiliency, providing disaster recovery and load balancing for those with multiple datacenter Exchange deployments.
BIG-IP Application Security Manager (ASM) A fully featured Layer 7 firewall, ASM thwarts HTTP, XML, and SMTP based attacks. By combining a negative and positive security model, ASM provides protection against all L7 attacks, both known and unknown.
For more information about these technologies, see F5 Solutions for Exchange Server.
Sizing the appropriate F5 hardware model for your Exchange 2010 deployment is an exercise best done with the guidance of your local F5 team. F5 offers production hardware-based and software-based BIG-IP platforms that range from supporting up to 200 megabits per second (Mbps) all the way up to 80 Gbps. To learn more about the specifications for each of the F5 BIG-IP LTM hardware platforms, see BIG-IP System Hardware Datasheet.
Option 1: BIG-IP 1600 series
The BIG-IP 1600 offers all the functionality of TMOS in a cost-effective, entry-level platform for intelligent application delivery.
BIG-IP 1600 appliance-based networking technologies
Components | Value or description |
---|---|
Traffic throughput | 1 Gbps |
Hardware Secure Sockets Layer (SSL) | Included: 500 transactions per second; Maximum: 5,000 transactions per second; 1 Gbps bulk encryption |
Software compression | Included: 50 Mbps; Maximum: 1 Gbps |
Processor | Dual core CPU |
Memory | 4 GB |
Gigabit Ethernet CU ports | 4 |
Gigabit fiber ports (small form-factor pluggable transceiver) | 2 optional LX, SX, or copper |
Power supply | One 300 watt included with a dual power option |
Typical consumption | 150 watt (110 volt input) |
Option 2: BIG-IP 3900 series
With a quad-core processor that enables support for multiple BIG-IP modules, the BIG-IP 3900 unifies application delivery in a 1U, cost-effective platform.
BIG-IP 3900 appliance-based networking technologies
Components | Value or description |
---|---|
Traffic throughput | 4 Gbps |
Hardware SSL | Included: 500 transactions per second; Maximum: 15,000 transactions per second; 2.4 Gbps bulk encryption |
Software compression | Included: 50 Mbps; Maximum: 3.8 Gbps |
Processor | Quad core CPU |
Memory | 8 GB |
Gigabit Ethernet CU ports | 8 |
Gigabit fiber ports (small form-factor pluggable transceiver) | 4 optional LX, SX, or copper |
Power supply | One 300 watt included with a dual power option |
Typical consumption | 175 watt (110 volt input) |
Option 3: BIG-IP 6900 series
With two dual-core processors as well as hardware SSL and compression, the BIG-IP 6900 has the performance to provide an integrated platform for application delivery. The BIG-IP 6900 can process up to 6 Gbps of throughput to handle the most demanding applications.
BIG-IP 6900 appliance-based networking technologies
Components | Value or description |
---|---|
Traffic throughput | 6 Gbps |
Hardware SSL | Included: 500 transactions per second; Maximum: 25,000 transactions per second; 4 Gbps bulk encryption |
FIPS SSL | FIPS 140-2 Level 2 (option); 20,000 transactions per second |
Software compression | Included: 50 Mbps; Maximum: 5 Gbps |
Processor | Two dual core CPUs (4 cores) |
Memory | 8 GB |
Gigabit Ethernet CU ports | 16 |
Gigabit fiber ports (small form-factor pluggable transceiver) | 8 optional LX, SX, or copper |
Power supply | Dual 850 watt included |
Typical consumption | 300 watt (110 volt input) |
Step 3: Select a hardware load balancing solution model
When it comes time to determine which application delivery controller is suitable, consider the following:
Purpose of the application delivery controllers:
Simple load balancing
Security
Acceleration
How users are connecting:
IMAP
Outlook Web App
Outlook Anywhere
Hardware benefits of an appliance-based BIG-IP vs. flexibility of a software-based BIG-IP
Desired scale and number of concurrent users
Percentage of local users vs. remote users
Average user expectations, such as the number of messages per day and the average e-mail message size
Use this information to select the right BIG-IP LTM platform.
*Design Decision Point*
The BIG-IP 3900 is selected for this solution. Its 4 Gbps throughput capacity and connection count limits are enough to cover normal usage as well as unexpected traffic spikes for 9,000 active mailboxes with a 50 message per day profile. The quad-core CPU is also sufficient for the processing required by connection and persistence handling.
Determine Hardware Load Balancing Device Resiliency Strategy
When deploying BIG-IP LTM, make every effort to design for fault tolerance. BIG-IP LTM is designed so that application server outages don't affect end users, and so that failures of a BIG-IP LTM itself are recovered from in a controlled and seamless manner.
Customers typically deploy BIG-IP LTM in redundant pairs. Connected by a dedicated network and serial channel, the two BIG-IP LTMs coordinate network responsibilities, ensuring failure of one device is automatically detected and recovered from by its peer. BIG-IP LTM excels in this area by offering unique functionality such as:
Connection mirroring: The connection table in each BIG-IP LTM is mirrored to its peer. If a BIG-IP LTM fails, no connections are dropped, because its failover partner is already aware of the previously established connections and assumes responsibility for the network.
Network-based outage detection: BIG-IP LTM treats a network outage as seriously as a server outage and takes appropriate remediation steps to attempt to remedy the situation.
Software-based and hardware-based watchdog functionality: This ensures proper failover when a BIG-IP LTM isn't functioning properly.
Besides deploying BIG-IP LTMs in redundant pairs, customers often build redundancy into the architecture by deploying across multiple datacenters. BIG-IP GTM is designed to add datacenter load balancing so that resiliency is also achieved across the wide area. For more information about GTM, see Global Load Balancing Solutions.
Determine Hardware Load Balancing Methods
Exchange protocols and client access services have different load balancing requirements. Some Exchange protocols and client access services require client-to-Client Access server affinity. Others work without it but perform better with it. Still others don't require affinity, and their performance doesn't decrease without it. For additional information, see Load Balancing Requirements of Exchange Protocols and Understanding Load Balancing in Exchange 2010.
For more information about configuring F5 BIG-IP LTMs, see Deploying F5 with Microsoft Exchange Server 2010.
Solution Overview
The previous section provided information about the design decisions that were made when considering an Exchange 2010 solution. The following section provides an overview of the solution.
Logical Solution Diagram
This solution consists of a total of 18 Exchange 2010 servers, all running as Hyper-V virtual machines, deployed in a multisite topology. Nine of the 18 servers run both the Client Access and Hub Transport server roles. The other nine run the Mailbox server role. The primary namespace is load balanced across six Client Access and Hub Transport servers in a Client Access server array in the primary site. A second Client Access server array in the secondary site contains three Client Access and Hub Transport servers. All nine Mailbox servers are members of a single DAG: six are located in the primary site and three in the secondary site. The site resiliency model is active/passive.
Logical solution
Physical Solution Diagram
This solution consists of nine Dell PowerEdge M610 blade servers in three PowerEdge M1000e modular blade enclosures, attached to three EqualLogic PS6500E iSCSI storage arrays through six redundant modular PowerConnect M6220 switches (two per enclosure). The hardware in this solution is provisioned as three failure domains. A failure domain is a set of components that share a single point of failure. Database copies in the DAG are laid out so that the loss of any component in one failure domain doesn't take down every copy of a database. Each failure domain consists of one blade enclosure holding three blade servers and two modular switches, connected to a single PS6500E storage array.
Physical solution
Server Hardware Summary
The following table summarizes the physical server hardware used in this solution.
Server hardware summary
Component | Description |
---|---|
Server vendor | Dell |
Server model | PowerEdge M610 blade server |
Processor | 2 x Intel Xeon CPU X5550 2.66 GHz |
Chipset | Intel 5520/5500/X58 |
Memory | 48 GB |
Operating system | Windows Server 2008 R2 |
Virtualization | Microsoft Hyper-V |
Internal disk | 2 x 300 GB SAS 15k |
Operating system disk configuration | RAID-1 |
RAID controller | Dell SAS 6/iR integrated blades controller |
Network interface | Broadcom NetXtreme II C-NIC GigE |
For more information, see PowerEdge M610 Blade Server.
Client Access and Hub Transport Server Configuration
The following table summarizes the Client Access and Hub Transport server configuration used in this solution.
Client Access and Hub Transport server configuration
Component | Description |
---|---|
Physical or virtual | Hyper-V VM |
Virtual processors | 4 |
Memory | 8 GB |
Storage | Virtual hard disk on root server operating system volume |
Operating system | Windows Server 2008 R2 |
Exchange version | Exchange Server 2010 Standard Edition |
Exchange patch level | Exchange 2010 Update Rollup 3 |
Mailbox Server Configuration
The following table summarizes the Mailbox server configuration used in this solution.
Mailbox server configuration
Component | Description |
---|---|
Physical or virtual | Hyper-V VM |
Virtual processors | 4 |
Memory | 32 GB |
Storage | Virtual hard disk on root server operating system volume |
Pass-through storage | 9 volumes × 789 GB |
Operating system | Windows Server 2008 R2 |
Exchange version | Exchange Server 2010 Enterprise Edition |
Exchange patch level | Exchange 2010 Update Rollup 2 |
Third-party software | None |
Database Layout
The following diagram illustrates the database layout across the primary and secondary datacenters.
Database layout
Storage Hardware Summary
The following table summarizes the storage hardware used in this solution.
Storage hardware summary
Component | Description |
---|---|
Storage vendor | Dell |
Storage model | EqualLogic PS6500E |
Category | iSCSI |
Disks | 48 × 1 terabyte 7200 rpm SATA |
Active disks | 46 |
Spares | 2 |
RAID level | 10 |
Usable capacity | 20.8 terabytes |
For more information, see Dell EqualLogic PS6500E iSCSI SAN.
Storage Configuration
Each of the Dell EqualLogic PS6500E storage arrays used in the solution was configured as shown in the following table.
Storage configuration
Component | Description |
---|---|
Storage enclosures | 3 |
LUNs per enclosure | 27 |
LUNs per server | 9 |
LUN size | 789 GB |
RAID level | RAID-10 |
The following table illustrates how the available storage was designed and allocated between the three PS6500E storage arrays.
PS6500 storage array design and allocation
Database | Array1 | Array2 | Array3 |
---|---|---|---|
DB1 | C1 | C2 | C3 |
DB2 | C1 | C2 | C3 |
DB3 | C1 | C2 | C3 |
DB4 | C1 | C2 | C3 |
DB5 | C1 | C2 | C3 |
DB6 | C1 | C2 | C3 |
DB7 | C1 | C2 | C3 |
DB8 | C1 | C2 | C3 |
DB9 | C1 | C2 | C3 |
DB10 | C2 | C1 | C3 |
DB11 | C2 | C1 | C3 |
DB12 | C2 | C1 | C3 |
DB13 | C2 | C1 | C3 |
DB14 | C2 | C1 | C3 |
DB15 | C2 | C1 | C3 |
DB16 | C2 | C1 | C3 |
DB17 | C2 | C1 | C3 |
DB18 | C2 | C1 | C3 |
Network Switch Hardware Summary
The following table summarizes the network switch hardware used in this solution.
Network switch hardware summary
Component | Description |
---|---|
Vendor | Dell |
Model | PowerConnect M6220 Ethernet switch |
Ports | 20 (16 internal, 4 external) |
Port bandwidth | 10/100/1000 BASE-T auto-sensing |
Switch fabric capacity | 128 Gbps |
Number per blade enclosure | 2 |
For more information, download a .pdf file about the PowerConnect M6220 Ethernet Switch.
Load Balancer Hardware Summary
The following table summarizes the load balancer hardware used in this solution.
Load balancer hardware summary
Component | Description |
---|---|
Vendor | F5 |
Model | BIG-IP 3900 |
Traffic throughput | 4 Gbps |
Hardware SSL | Included: 500 transactions per second Maximum: 15,000 transactions per second 2.4 Gbps bulk encryption |
Software compression | Included: 50 Mbps Maximum: 3.8 Gbps |
Processor | Quad core CPU |
Memory | 8 GB |
Gigabit Ethernet CU ports | 8 |
Gigabit fiber ports (small form-factor pluggable transceiver) | 4 optional LX, SX, or copper |
Power supply | One 300 watt included with a dual power option |
Typical consumption | 175 watt (110 volt input) |
Solution Validation Methodology
Prior to deploying an Exchange solution in a production environment, validate that the solution was designed, sized, and configured properly. This validation must include functional testing to ensure that the system is operating as desired as well as performance testing to ensure that the system can handle the desired user load. This section describes the approach and test methodology used to validate server and storage design for this solution. In particular, the following tests will be defined in detail:
Performance tests
Storage performance validation (Jetstress)
Server performance validation (Loadgen)
Functional tests
Database switchover validation
Server switchover validation
Server failover validation
Datacenter switchover validation
Storage Design Validation Methodology
The level of performance and reliability of the storage subsystem connected to the Exchange Mailbox server role has a significant impact on the overall health of the Exchange deployment. Additionally, poor storage performance will result in high transaction latency, primarily reflected in poor client experience when accessing the Exchange system. To ensure the best possible client experience, validate storage sizing and configuration via the method described in this section.
Tool Set
For validating Exchange storage sizing and configuration, we recommend the Microsoft Exchange Server Jetstress tool. Jetstress is designed to simulate an Exchange I/O workload at the database level by interacting directly with the Extensible Storage Engine (ESE), also known as Jet. ESE is the database technology that Exchange uses to store messaging data on the Mailbox server role. Jetstress can be configured to test the maximum I/O throughput available to your storage subsystem within the required performance constraints of Exchange. Alternatively, Jetstress can accept a target profile of user count and per-user IOPS and validate that the storage subsystem can maintain an acceptable level of performance with that profile. Test duration is adjustable: a test can run for a short period to validate adequate performance, or for an extended period to also validate storage subsystem reliability.
The Jetstress tool can be obtained from the Microsoft Download Center.
The documentation included with the Jetstress installer describes how to configure and execute a Jetstress validation test on your server hardware.
Approach to Storage Validation
There are two main types of storage configurations:
Direct-attached storage (DAS) or internal disk scenarios
Storage area network (SAN) scenarios
With DAS or internal disk scenarios, there's only one server accessing the disk subsystem, so the performance capabilities of the storage subsystem can be validated in isolation.
In SAN scenarios, the storage utilized by the solution may be shared by many servers and the infrastructure that connects the servers to the storage may also be a shared dependency. This requires additional testing, as the impact of other servers on the shared infrastructure must be adequately simulated to validate performance and functionality.
Test Cases for Storage Validation
The following storage validation test cases were executed against the solution and should be considered as a starting point for storage validation. Specific deployments may have other validation requirements that can be met with additional testing, so this list isn't intended to be exhaustive:
Validation of worst case database switchover scenario: This test case validates that the storage subsystem can service the level of I/O expected in a worst case switchover scenario (the largest possible number of active copies on the fewest servers). Depending on whether the storage subsystem is DAS or SAN, this test may need to run on multiple hosts to ensure that the storage subsystem can sustain the end-to-end solution load.
Validation of storage performance under storage failure and recovery scenario (for example, failed disk replacement and rebuild): This test case evaluates the performance of the storage subsystem during a failure and rebuild scenario to ensure that the necessary level of performance is maintained for an optimal Exchange client experience. The same caveat applies for a DAS vs. SAN deployment: if multiple hosts depend on a shared storage subsystem, the test must include load from these hosts to simulate the entire effect of the failure and rebuild.
Analyzing the Results
The Jetstress tool produces a report file after each test is completed. To help you analyze the report, use the guidelines in Reading Jetstress 2010 Test Reports.
Specifically, you should use the guidelines in the following table when you examine data in the Test Results table of the report.
Jetstress results analysis
Performance counter instance | Guidelines for performance test |
---|---|
I/O Database Reads Average Latency (msec) | The average value should be less than 20 milliseconds (msec) (0.020 seconds), and the maximum values should be less than 50 msec. |
I/O Log Writes Average Latency (msec) | Log disk writes are sequential, so average write latencies should be less than 10 msec, with a maximum of no more than 50 msec. |
%Processor Time | Average should be less than 80%, and the maximum should be less than 90%. |
Transition Pages Repurposed/sec (Windows Server 2003, Windows Server 2008, Windows Server 2008 R2) | Average should be less than 100. |
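The guidelines in this table map directly to pass/fail checks. The following Python sketch assumes the averages and maximums have already been copied by hand from the Jetstress report. The `JetstressSummary` fields are hypothetical names, not fields that Jetstress itself exposes. Each check uses the same "less than" or "no more than" wording as the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JetstressSummary:
    """Values copied from a Jetstress report (hypothetical field names)."""
    db_read_avg_ms: float
    db_read_max_ms: float
    log_write_avg_ms: float
    log_write_max_ms: float
    cpu_avg_pct: float
    cpu_max_pct: float
    transition_pages_repurposed_avg: float


def check_jetstress(s: JetstressSummary) -> List[str]:
    """Return the guideline violations; an empty list means every check passed."""
    problems: List[str] = []
    if not s.db_read_avg_ms < 20:
        problems.append("database read average latency must be < 20 msec")
    if not s.db_read_max_ms < 50:
        problems.append("database read maximum latency must be < 50 msec")
    if not s.log_write_avg_ms < 10:
        problems.append("log write average latency must be < 10 msec")
    if s.log_write_max_ms > 50:  # "no more than 50 msec"
        problems.append("log write maximum latency must be <= 50 msec")
    if not s.cpu_avg_pct < 80:
        problems.append("average %Processor Time must be < 80%")
    if not s.cpu_max_pct < 90:
        problems.append("maximum %Processor Time must be < 90%")
    if not s.transition_pages_repurposed_avg < 100:
        problems.append("average Transition Pages Repurposed/sec must be < 100")
    return problems


# Illustrative (made-up) report values.
print(check_jetstress(JetstressSummary(12.5, 41.0, 1.2, 18.0, 35.0, 60.0, 0.0)))  # []
```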
The report file shows various categories of I/O performed by the Exchange system:
Transactional I/O Performance: This table reports I/O that represents user activity against the database (for example, Outlook-generated I/O). This data is generated by subtracting background maintenance I/O and log replication I/O from the total I/O measured during the test. It provides the actual database IOPS generated, along with the I/O latency measurements required to determine whether a Jetstress performance test passed or failed.
Background Database Maintenance I/O Performance: This table reports the I/O generated by ongoing ESE database background maintenance.
Log Replication I/O Performance: This table reports the I/O generated by simulated log replication.
Total I/O Performance: This table reports the total I/O generated during the Jetstress test.
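The relationship between the report categories is simple subtraction: transactional I/O equals total I/O minus background maintenance I/O minus log replication I/O. The following Python sketch shows that arithmetic. The function name and the sample IOPS figures are illustrative only.

```python
def transactional_iops(total: float, background_maintenance: float,
                       log_replication: float) -> float:
    """Transactional (user) IOPS = total - background maintenance - log replication."""
    for name, value in (("total", total),
                        ("background_maintenance", background_maintenance),
                        ("log_replication", log_replication)):
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
    result = total - background_maintenance - log_replication
    if result < 0:
        raise ValueError("background and replication I/O exceed total I/O")
    return result


print(transactional_iops(1000.0, 150.0, 50.0))  # 800.0
```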
Server Design Validation
After the performance and reliability of the storage subsystem is validated, ensure that all of the components in the messaging system are validated together for functionality, performance, and scalability. This means moving up in the stack to validate client software interaction with the Exchange product as well as any server-side products that interact with Exchange. To ensure that the end-to-end client experience is acceptable and that the entire solution can sustain the desired user load, the method described in this section can be applied for server design validation.
Tool Set
For validation of end-to-end solution performance and scalability, we recommend the Microsoft Exchange Server Load Generator tool (Loadgen). Loadgen is designed to produce a simulated client workload against an Exchange deployment. This workload can be used to evaluate the performance of the Exchange system, and can also be used to evaluate the effect of various configuration changes on the overall solution while the system is under load. Loadgen is capable of simulating Microsoft Office Outlook 2007 (online and cached), Office Outlook 2003 (online and cached), POP3, IMAP4, SMTP, ActiveSync, and Outlook Web App (known in Exchange 2007 and earlier versions as Outlook Web Access) client activity. It can be used to generate a single protocol workload, or these client protocols can be combined to generate a multiple protocol workload.
You can get the Loadgen tool from the Microsoft Download Center.
The documentation included with the Loadgen installer describes how to configure and execute a Loadgen test against an Exchange deployment.
Approach to Server Validation
When validating your server design, test the worst case scenario under anticipated peak workload. Based on a number of data sets from Microsoft IT and other customers, peak load is generally equal to 2x the average workload throughout the remainder of the work day. This is referred to as the peak-to-average workload ratio.
Peak load
In this Performance Monitor snapshot, which displays various counters that represent the amount of Exchange work being performed over time on a production Mailbox server, the average value for RPC operations per second (the highlighted line) is about 2,386 when averaged across the entire day. The average for this counter during the peak period from 10:00 through 11:00 is about 4,971, giving a peak-to-average ratio of 2.08.
To ensure that the Exchange solution is capable of sustaining the workload generated during the peak average, modify Loadgen settings to generate a constant amount of load at the peak average level, rather than spreading out the workload over the entire simulated work day. Loadgen task-based simulation modules (like the Outlook simulation modules) utilize a task profile that defines the number of times each task will occur for an average user within a simulated day.
The total number of tasks that need to run during a simulated day is calculated as the number of users multiplied by the sum of task counts in the configured task profile. Loadgen then determines the rate at which it should run tasks for the configured set of users by dividing the total number of tasks to run in the simulated day by the simulated day length. For example, if Loadgen needs to run 1,000,000 tasks in a simulated day, and a simulated day is equal to 8 hours (28,800 seconds), Loadgen must run 1,000,000 ÷ 28,800 = 34.72 tasks per second to meet the required workload definition. To increase the amount of load to the desired peak average, divide the default simulated day length (8 hours) by the peak-to-average ratio (2) and use this as the new simulated day length.
Using the task rate example again, 1,000,000 ÷ 14,400 = 69.44 tasks per second. This reduces the simulated day length by half, which results in doubling the actual workload run against the server and achieving our goal of a peak average workload. You don't adjust the run length duration of the test in the Loadgen configuration. The run length duration specifies the duration of the test and doesn't affect the rate at which tasks will be run against the Exchange server.
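The Loadgen arithmetic above can be reproduced in a few lines of Python. The sketch computes the peak-to-average ratio from the Performance Monitor example, the total task count (users multiplied by the sum of task counts in the profile), and the task rate for the default and the shortened simulated day. The helper names are hypothetical and aren't Loadgen settings. The only values taken from the text are 2,386, 4,971, 1,000,000 tasks, and the 8-hour day.

```python
from typing import Sequence

SECONDS_PER_HOUR = 3600
DEFAULT_SIMULATED_DAY_SECONDS = 8 * SECONDS_PER_HOUR  # 28,800 seconds


def peak_to_average_ratio(peak_period_average: float, daily_average: float) -> float:
    """Ratio of a counter's peak-period average to its all-day average."""
    return peak_period_average / daily_average


def total_tasks(users: int, task_counts: Sequence[int]) -> int:
    """Users multiplied by the sum of task counts in the task profile."""
    return users * sum(task_counts)


def task_rate(tasks: float, simulated_day_seconds: float) -> float:
    """Tasks per second needed to finish `tasks` within one simulated day."""
    return tasks / simulated_day_seconds


def peak_simulated_day_seconds(ratio: float,
                               default_seconds: float = DEFAULT_SIMULATED_DAY_SECONDS) -> float:
    """Divide the default simulated day by the peak-to-average ratio."""
    return default_seconds / ratio


print(round(peak_to_average_ratio(4971, 2386), 2))                     # 2.08
print(round(task_rate(1_000_000, DEFAULT_SIMULATED_DAY_SECONDS), 2))   # 34.72
day = peak_simulated_day_seconds(2)                                    # 14400.0
print(round(task_rate(1_000_000, day), 2))                             # 69.44
```

Only the simulated day length changes. As the text notes, the run length of the test stays the same.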
Test Cases for Server Design Validation
The following server design validation test cases were executed against the solution and should be considered as a starting point for server design validation. Specific deployments may have other validation requirements that can be met with additional testing, so this list isn't intended to be exhaustive:
Normal operating conditions: In this test case, the basic design of the solution is validated with all components in their normal operating state (no failures simulated). The desired workload is generated against the solution, and the overall performance of the solution is validated against the metrics that follow.
Single server failure or single server maintenance (in site): In this test case, a single server is taken down to simulate either an unexpected failure of the server or a planned maintenance operation for the server. The workload that would normally be handled by the unavailable server is now handled by other servers in the solution topology, and the overall performance of the solution is validated.
Test Execution and Data Collection
Exchange performance data has some natural variation within test runs and among test runs. We recommend that you take the average of multiple runs to smooth out this variation. For Exchange tested solutions, a minimum of three separate eight-hour test runs were completed. Performance data was collected for the full eight-hour duration of each test. Performance summary data was taken from a three- to four-hour stable period (excluding the first two hours and the last hour of the test). For each Exchange server role, performance summary data was averaged between servers for each test run, providing a single average value for each data point. The values for each run were then averaged, providing a single data point for all servers of a like server role across all test runs.
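The two-stage averaging described above (first across servers of one role within a run, then across runs) can be sketched in Python. `stable_window` keeps only samples outside the excluded first two hours and last hour of an eight-hour run. Its bounds are parameters, so you can narrow them to the three- to four-hour stable period actually chosen. The server names and counter values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def stable_window(samples: Sequence[Tuple[float, float]], test_hours: float = 8.0,
                  skip_start_hours: float = 2.0, skip_end_hours: float = 1.0) -> List[float]:
    """Keep values whose elapsed time (in hours) lies inside the stable window."""
    end = test_hours - skip_end_hours
    return [value for elapsed, value in samples if skip_start_hours <= elapsed < end]


def role_average(runs: List[Dict[str, float]]) -> float:
    """Average servers of one role within each run, then average the per-run values."""
    per_run = [mean(per_server.values()) for per_server in runs]
    return mean(per_run)


# Hypothetical stable-window averages for two Mailbox servers over three runs.
runs = [
    {"MBX1": 40.0, "MBX2": 44.0},
    {"MBX1": 42.0, "MBX2": 46.0},
    {"MBX1": 41.0, "MBX2": 45.0},
]
print(role_average(runs))  # 43.0
```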
Validation of Expected Load
Before you look at any performance counters or start your performance validation analysis, verify that the workload you expected to run matched the workload that you actually ran. Although there are many ways to determine whether the simulated workload matched the expected workload, the easiest and most consistent way is to look at the message delivery rate.
Calculating Expected Peak Message Delivery Rate
Every message profile consists of the sum of the average number of messages sent per day and the average number of messages received per day. To calculate the message delivery rate, select the average number of messages received per day from the following table.
Peak message delivery rate
Message profile | Messages sent per day | Messages received per day |
---|---|---|
50 | 10 | 40 |
100 | 20 | 80 |
150 | 30 | 120 |
200 | 40 | 160 |
The following example assumes that each Mailbox server has 5,000 active mailboxes with a 150 messages per day profile (30 messages sent and 120 messages received per day).
Peak message delivery rate for 5,000 active mailboxes
Description | Calculation | Value |
---|---|---|
Message profile | Number of messages received per day | 120 |
Number of active mailboxes per Mailbox server | Not applicable | 5,000 |
Total messages received per day per Mailbox server | 5,000 × 120 | 600,000 |
Total messages received per second per Mailbox server | 600,000 ÷ 28,800 | 20.83 |
Total messages adjusted for peak load | 20.83 × 2 | 41.67 |
You expect 41.67 messages per second delivered on each Mailbox server running 5,000 active mailboxes with a message profile of 150 messages per day during peak load.
Measuring Actual Message Delivery Rate
The actual message delivery rate can be measured using the following counter on each Mailbox server: MSExchangeIS Mailbox(_Total)\Messages Delivered/sec. If the measured message delivery rate is within one or two messages per second of the target message delivery rate, you can be confident that the desired load profile was run successfully.
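The expected-rate calculation and the "within one or two messages per second" check can be combined in one Python sketch. The received-per-day values come from the message profile table. The 28,800-second day and the peak ratio of 2 match the worked example. The 2.0 tolerance reflects the upper end of the stated range. The function names and the 40.5 measured value are illustrative.

```python
# Messages received per day for each message profile (from the table above).
RECEIVED_PER_DAY = {50: 40, 100: 80, 150: 120, 200: 160}


def expected_peak_delivery_rate(active_mailboxes: int, message_profile: int,
                                workday_seconds: int = 28_800,
                                peak_ratio: float = 2.0) -> float:
    """Messages per second each Mailbox server should deliver at peak load."""
    received_per_day = active_mailboxes * RECEIVED_PER_DAY[message_profile]
    per_second = received_per_day / workday_seconds
    return per_second * peak_ratio


def load_profile_matches(measured: float, expected: float, tolerance: float = 2.0) -> bool:
    """Compare the measured Messages Delivered/sec average to the target."""
    return abs(measured - expected) <= tolerance


expected = expected_peak_delivery_rate(5000, 150)
print(round(expected, 2))                    # 41.67
print(load_profile_matches(40.5, expected))  # True
```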
Server Validation: Performance and Health Criteria
This section describes the Performance Monitor counters and thresholds used to determine whether the Exchange environment was sized properly and is able to run in a healthy state during extended periods of peak workload. For more information about counters relevant to Exchange performance, see Performance and Scalability Counters and Thresholds.
Hyper-V Root Servers
To validate the performance and health criteria of a Hyper-V root server and the applications running within VMs, you should have a basic understanding of the Hyper-V architecture and how that impacts performance monitoring.
Hyper-V has three main components: the virtualization stack, the hypervisor, and devices. The virtualization stack handles emulated devices, manages VMs, and services I/O. The hypervisor schedules virtual processors, manages interrupts, services timers, and controls other chip-level functions. The hypervisor doesn't handle devices or I/O (that is, there are no hypervisor drivers). The devices are part of the root server or installed in guest servers as part of integration services. Because the root server has a full view of the system and controls the VMs, it also provides monitoring information via Windows Management Instrumentation (WMI) and performance counters.
Processor
When validating physical processor utilization on the root server (or within the guest VM), the standard Processor\% Processor Time counter isn't very useful.
Instead, you can examine the Hyper-V Hypervisor Logical Processor\% Total Run Time counter. This counter shows the percentage of processor time spent in guest and hypervisor runtime and should be used to measure the total processor utilization for the hypervisor and all VMs running on the root server. This counter shouldn't exceed 80 percent, or whatever maximum utilization target you designed for.
Counter | Target |
---|---|
Hyper-V Hypervisor Logical Processor\% Total Run Time | <80% |
If you're interested in what percentage of processor time is spent servicing the guest VMs, you can examine the Hyper-V Hypervisor Logical Processor\% Guest Run Time counter. If you're interested in what percentage of processor time is spent in the hypervisor, you can look at the Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time counter. This counter should be below 5 percent. The Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time counter shows the percentage of processor time spent in the virtualization stack. This counter should also be below 5 percent. These two counters can be used to determine what percentage of your available physical processor time is being used to support virtualization.
Counter | Target |
---|---|
Hyper-V Hypervisor Logical Processor\% Guest Run Time | <80% |
Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time | <5% |
Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time | <5% |
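The root server processor targets in the two tables above can be kept in one lookup and checked together. The following Python sketch assumes you've already collected an average value for each counter (for example, from a Performance Monitor log). As in the text, counter paths are written without instance names. Counters that weren't collected are skipped rather than reported, and the sample values are hypothetical.

```python
from typing import Dict, List

# Targets from the tables above: each average should stay below its limit (percent).
ROOT_SERVER_TARGETS: Dict[str, float] = {
    r"Hyper-V Hypervisor Logical Processor\% Total Run Time": 80.0,
    r"Hyper-V Hypervisor Logical Processor\% Guest Run Time": 80.0,
    r"Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time": 5.0,
    r"Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time": 5.0,
}


def counters_over_target(averages: Dict[str, float]) -> List[str]:
    """Return counters whose average isn't below its target; missing counters are skipped."""
    return [
        counter
        for counter, limit in ROOT_SERVER_TARGETS.items()
        if counter in averages and not averages[counter] < limit
    ]


sample = {
    r"Hyper-V Hypervisor Logical Processor\% Total Run Time": 62.0,
    r"Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time": 6.5,
}
# Flags only the Hypervisor Run Time counter (6.5 is not below 5).
print(counters_over_target(sample))
```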
Memory
You need to ensure that your Hyper-V root server has enough memory to support the memory allocated to VMs. Hyper-V automatically reserves 512 MB (this may vary with different Hyper-V releases) for the root operating system. If you don't have enough memory, Hyper-V will prevent the last VM from starting. In general, don't worry about validating the memory on a Hyper-V root server. Be more concerned with ensuring that sufficient memory is allocated to the VMs to support the Exchange roles.
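A quick memory budget check follows from this paragraph. This Python sketch subtracts the root reserve (512 MB, which the text notes may vary by Hyper-V release) and the memory allocated to each VM from the host's physical memory. It assumes that each 48 GB blade hosts one 32 GB Mailbox VM and one 8 GB Client Access/Hub Transport VM, which is consistent with 18 VMs on nine blades. It also ignores any additional per-VM overhead, so treat the result as an upper bound on headroom.

```python
from typing import Sequence

ROOT_RESERVE_GB = 0.5  # 512 MB root reserve, per the text; may vary by Hyper-V release


def memory_headroom_gb(host_gb: float, vm_gb: Sequence[float],
                       root_reserve_gb: float = ROOT_RESERVE_GB) -> float:
    """Physical memory left after the root reserve and all VM allocations."""
    return host_gb - root_reserve_gb - sum(vm_gb)


def all_vms_can_start(host_gb: float, vm_gb: Sequence[float],
                      root_reserve_gb: float = ROOT_RESERVE_GB) -> bool:
    """False means Hyper-V would refuse to start the last VM."""
    return memory_headroom_gb(host_gb, vm_gb, root_reserve_gb) >= 0


# Assumed layout: one Mailbox VM (32 GB) and one Client Access/Hub Transport VM (8 GB).
print(memory_headroom_gb(48, [32, 8]))  # 7.5
print(all_vms_can_start(48, [32, 8]))   # True
```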
Application Health
An easy way to determine whether all the VMs are in a healthy state is to look at the Hyper-V Virtual Machine Health Summary counters.
Counter | Target |
---|---|
Hyper-V Virtual Machine Health Summary\Health OK | 1 |
Hyper-V Virtual Machine Health Summary\Health Critical | 0 |
Mailbox Servers
When validating whether a Mailbox server was properly sized, focus on processor, memory, storage, and Exchange application health. This section describes the approach to validating each of these components.
Processor
During the design process, you calculated the adjusted megacycle capacity of the server or processor platform. You then determined the maximum number of active mailboxes that could be supported by the server without exceeding 80 percent of the available megacycle capacity. You also determined what the projected CPU utilization should be during normal operating conditions and during various server maintenance or failure scenarios.
During the validation process, verify that the worst case scenario workload doesn't exceed 80 percent of the available megacycles. Also, verify that actual CPU utilization is close to the expected CPU utilization during normal operating conditions and during various server maintenance or failure scenarios.
For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter and verify that this counter is less than 80 percent on average.
Counter | Target |
---|---|
Processor(_Total)\% Processor Time | <80% |
For virtual Exchange deployments, the Processor(_Total)\% Processor Time counter is measured within the VM. In this case, the counter isn't measuring the physical CPU utilization. It's measuring the utilization of the virtual CPU provided by the hypervisor. Therefore, it doesn't provide an accurate reading of the physical processor and shouldn't be used for design validation purposes. For more information, see Hyper-V: Clocks lie... which performance counters can you trust.
For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor Virtual Processor\% Guest Run Time counter. This provides a more accurate value for the amount of physical CPU being utilized by the guest operating system. This counter should be less than 80 percent on average.
Counter | Target |
---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% |
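The choice of processor counter depends on whether the Mailbox server is physical or runs on Hyper-V. The following Python sketch encodes that choice and the 80 percent target. It deliberately ignores the in-guest Processor(_Total) value for virtual deployments, for the reason given above. The function names and sample averages are hypothetical, and counter paths omit instance names.

```python
from typing import Dict


def cpu_validation_counter(runs_on_hyper_v: bool) -> str:
    """Counter to use for Mailbox server CPU validation."""
    if runs_on_hyper_v:
        return r"Hyper-V Hypervisor Virtual Processor\% Guest Run Time"
    return r"Processor(_Total)\% Processor Time"


def validate_mailbox_cpu(runs_on_hyper_v: bool, averages: Dict[str, float],
                         target_pct: float = 80.0) -> bool:
    """True if the appropriate counter's average is below the target."""
    counter = cpu_validation_counter(runs_on_hyper_v)
    if counter not in averages:
        raise KeyError(f"no average collected for {counter}")
    return averages[counter] < target_pct


# Hypothetical averages for one Mailbox VM.
averages = {
    r"Processor(_Total)\% Processor Time": 30.0,
    r"Hyper-V Hypervisor Virtual Processor\% Guest Run Time": 55.0,
}
print(validate_mailbox_cpu(True, averages))  # True (55 < 80)
```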
Memory
During the design process, you calculated the amount of database cache required to support the maximum number of active databases on each Mailbox server. You then determined the optimal physical memory configuration to support the database cache and system memory requirements.
Validating whether an Exchange Mailbox server has sufficient memory to support the target workload isn't a simple task. Using available memory counters to view how much physical memory is remaining isn't helpful because the memory manager in Exchange is designed to use almost all of the available physical memory. The information store (store.exe) reserves a large portion of physical memory for database cache. The database cache is used to store database pages in memory. When a page is accessed in memory, the information doesn't have to be retrieved from disk, reducing read I/O. The database cache is also used to optimize write I/O.
When a database page is modified (known as a dirty page), the page stays in cache for a period of time. The longer it stays in cache, the better the chance that the page will be modified multiple times before those changes are written to the disk. Keeping dirty pages in cache also allows multiple pages to be written to the disk in the same operation (known as write coalescing). Exchange uses as much of the available memory in the system as possible, which is why there aren't large amounts of available memory on an Exchange Mailbox server.
It may not be easy to know whether the memory configuration on your Exchange Mailbox server is undersized. For the most part, the Mailbox server will still function, but your I/O profile may be much higher than expected. Higher I/O can lead to higher disk read and write latencies, which may impact application health and client user experience. In the results section, there isn't any reference to memory counters. Potential memory issues will be identified in the storage validation and application health result sections, where memory-related issues are more easily detected.
Storage
If you have performance issues with your Exchange Mailbox server, they may be storage-related. Storage issues may be caused by an insufficient number of disks to support the target I/O requirements, by an overloaded or poorly designed storage connectivity infrastructure, or by factors that change the target I/O profile, such as insufficient memory, as discussed previously.
The first step in storage validation is to verify that the database latencies are below the target thresholds. In previous releases, logical disk counters were used to determine disk read and write latency. In Exchange 2010, the Exchange Mailbox server that you are monitoring is likely to host a mix of active and passive mailbox database copies, and the I/O characteristics of active and passive copies differ. Because the I/O size is much larger on passive copies, latencies on passive copies are typically much higher. The latency target for passive databases is 200 msec, 10 times the 20 msec target for active database copies. This isn't much of a concern, because high latencies on passive databases have no impact on client experience. However, if you use the traditional logical disk counters to measure latency, you must review each volume individually and keep the volumes containing active databases separate from those containing passive databases. Instead, we recommend that you use the new MSExchange Database counters in Exchange 2010.
When validating latencies on Exchange 2010 Mailbox servers, we recommend you use the counters in the following table for active databases.
Counter | Target |
---|---|
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec |
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec |
MSExchange Database\IO Log Writes Average Latency | <1 msec |
We recommend that you use the counters in the following table for passive databases.
Counter | Target |
---|---|
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec |
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec |
MSExchange Database\IO Log Read Average Latency | <200 msec |
Note
To view these counters in Performance Monitor, you must enable the advanced database counters. For more information, see How to Enable Extended ESE Performance Counters.
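Because active and passive copies have different targets, a latency check must know which role each copy holds before comparing its value. The following sketch assumes you have already collected the average read latency for each database copy, for example from the Attached counters for active copies and the Recovery counters for passive copies. The copy names and latency values are hypothetical. The targets come from the two tables above.

```python
# Read latency targets in msec, from the active and passive tables above.
READ_LATENCY_TARGET_MS = {"active": 20.0, "passive": 200.0}


def copies_over_target(copies):
    """Return the names of copies whose read latency is not below the target for their role.

    `copies` is a list of (name, role, avg_read_latency_ms) tuples, where
    role is "active" or "passive".
    """
    over = []
    for name, role, latency_ms in copies:
        if latency_ms >= READ_LATENCY_TARGET_MS[role]:
            over.append(name)
    return over


# Hypothetical samples: 35 msec fails on an active copy but passes on a passive one.
samples = [
    ("DB01", "active", 19.3),
    ("DB02", "active", 35.0),
    ("DB03", "passive", 35.0),
]
print(copies_over_target(samples))  # ['DB02']
```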
When you're validating disk latencies for Exchange deployments running on Microsoft Hyper-V, be aware that the I/O Database Average Latency counters (as with many time-based counters) may not be accurate, because the concept of time within the VM is different from that on the physical server. The following example shows that, for the same simulated workload, the I/O Database Reads (Attached) Average Latency is 22.8 msec in the VM and 17.3 msec on a physical server. Therefore, even if the values of time-based counters are over the target thresholds, your server may still be running correctly. When your Mailbox server role is deployed within a VM, review all health criteria before making a decision about server health.
Values of disk latency counters for virtual and physical Mailbox servers
Counter | Virtual Mailbox server | Physical Mailbox server |
---|---|---|
MSExchange Database | | |
I/O Database Reads (Attached) / Average Latency | 22.792 | 17.250 |
I/O Database Reads (Attached) / sec | 17.693 | 18.131 |
I/O Database Reads (Recovery) / Average Latency | 34.215 | 27.758 |
I/O Database Writes (Recovery) / sec | 10.829 | 8.483 |
I/O Database Writes (Attached) / Average Latency | 0.944 | 0.411 |
I/O Database Writes (Attached) / sec | 10.184 | 10.963 |
MSExchangeIS | | |
RPC Averaged Latency | 1.966 | 1.695 |
RPC Operations / sec | 334.371 | 341.139 |
RPC Packets / sec | 180.656 | 183.360 |
MSExchangeIS Mailbox | | |
Messages Delivered / sec | 2.062 | 2.065 |
Messages Sent / sec | 0.511 | 0.514 |
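To quantify how far the virtual and physical readings diverge, you can divide each latency value measured in the VM by its physical counterpart. The following sketch uses the latency values from the table. A ratio above 1 means that the VM reported a higher latency than the physical server for the same simulated workload.

```python
# Latency values from the table: (virtual Mailbox server, physical Mailbox server).
latencies = {
    "I/O Database Reads (Attached) / Average Latency": (22.792, 17.250),
    "I/O Database Reads (Recovery) / Average Latency": (34.215, 27.758),
    "I/O Database Writes (Attached) / Average Latency": (0.944, 0.411),
    "RPC Averaged Latency": (1.966, 1.695),
}


def vm_to_physical_ratio(virtual, physical):
    """How many times larger the value reported inside the VM is."""
    return virtual / physical


for counter, (virtual, physical) in latencies.items():
    print(f"{counter}: {vm_to_physical_ratio(virtual, physical):.2f}")
```

Every latency row is higher in the VM. This is why a time-based counter that is over target in a VM should be confirmed against the other health criteria before the server is judged unhealthy.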
In addition to disk latencies, review the Database\Database Page Fault Stalls/sec counter. This counter indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. This counter should be 0 on a healthy server.
Counter | Target |
---|---|
Database\Database Page Fault Stalls/sec | <1 |
Also, review the Database\Log Record Stalls/sec counter, which indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. This counter should average less than 10.
Counter | Target |
---|---|
Database\Log Record Stalls/sec | <10 |
Exchange Application Health
Even if there are no obvious issues with processor, memory, and disk, we recommend that you monitor the standard application health counters to ensure that the Exchange Mailbox server is in a healthy state.
The MSExchangeIS\RPC Averaged Latency counter provides the best indication of whether other counters with high database latencies are actually impacting Exchange health and client experience. Often, high RPC averaged latencies are associated with a high number of RPC requests, which should be less than 70 at all times.
Counter | Target |
---|---|
MSExchangeIS\RPC Averaged Latency | <10 msec on average |
MSExchangeIS\RPC Requests | <70 at all times |
Next, make sure that the transport layer is healthy. Any issues in transport or issues downstream of transport affecting the transport layer can be detected with the MSExchangeIS Mailbox(_Total)\Messages Queued for Submission counter. This counter should be less than 50 at all times. There may be temporary increases in this counter, but the counter value shouldn't grow over time and shouldn't be sustained for more than 15 minutes.
Counter | Target |
---|---|
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | <50 at all times |
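The 15-minute rule can be applied to a series of counter samples by measuring the longest continuous elevated period. The following sketch assumes two things: that "elevated" means at or above the 50-message target, and that each sample represents one sampling interval. The function names and sample values are hypothetical.

```python
def longest_sustained_minutes(samples, threshold=50, interval_minutes=1.0):
    """Length, in minutes, of the longest run of consecutive samples at or above threshold.

    Each sample is assumed to represent one sampling interval.
    """
    longest = current = 0
    for value in samples:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest * interval_minutes


def queued_for_submission_ok(samples, interval_minutes=1.0, max_minutes=15):
    """True if no elevated period lasts longer than max_minutes."""
    return longest_sustained_minutes(samples, 50, interval_minutes) <= max_minutes


# Hypothetical Messages Queued for Submission samples taken every 5 minutes.
short_spike = [3, 60, 70, 4, 2]
long_spike = [3, 60, 70, 65, 80, 4]
print(queued_for_submission_ok(short_spike, interval_minutes=5))  # True (10 minutes)
print(queued_for_submission_ok(long_spike, interval_minutes=5))   # False (20 minutes)
```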
Next, ensure that maintenance of the database copies is in a healthy state. Any issues with log shipping or log replay can be identified using the MSExchange Replication(*)\CopyQueueLength and MSExchange Replication(*)\ReplayQueueLength counters. The copy queue length shows the number of transaction log files waiting to be copied to the passive copy log file folder and should be less than 1 at all times. The replay queue length shows the number of transaction log files waiting to be replayed into the passive copy and should be less than 5. Higher values don't impact client experience, but result in longer store mount times when a handoff, failover, or activation is performed.
Counter | Target |
---|---|
MSExchange Replication(*)\CopyQueueLength | <1 |
MSExchange Replication(*)\ReplayQueueLength | <5 |
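The Mailbox server application health targets above can be collected into one table and checked together. In the following sketch, the targets are copied from the tables in this section and treated as exclusive upper limits. The sample values are illustrative. The sketch assumes you pass averaged values and, for the "at all times" counters, the maximum value observed during the test.

```python
# Exclusive upper limits from the Mailbox server application health tables above.
MAILBOX_HEALTH_TARGETS = {
    r"MSExchangeIS\RPC Averaged Latency": 10,
    r"MSExchangeIS\RPC Requests": 70,
    r"MSExchangeIS Mailbox(_Total)\Messages Queued for Submission": 50,
    r"MSExchange Replication(*)\CopyQueueLength": 1,
    r"MSExchange Replication(*)\ReplayQueueLength": 5,
}


def failed_counters(measured, targets=MAILBOX_HEALTH_TARGETS):
    """Return the measured counters whose value is not below their target, sorted by name."""
    return sorted(
        name for name, limit in targets.items()
        if name in measured and measured[name] >= limit
    )


# Illustrative measurements for a healthy server.
normal_case = {
    r"MSExchangeIS\RPC Averaged Latency": 2.4,
    r"MSExchangeIS\RPC Requests": 2.7,
    r"MSExchangeIS Mailbox(_Total)\Messages Queued for Submission": 1.5,
    r"MSExchange Replication(*)\CopyQueueLength": 0.1,
    r"MSExchange Replication(*)\ReplayQueueLength": 2.1,
}
print(failed_counters(normal_case))  # []
```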
Client Access Servers
To determine whether a Client Access server is healthy, review processor, memory, and application health. For an extended list of important counters, see Client Access Server Counters.
Processor
For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter. This counter should be less than 80 percent on average.
Counter | Target |
---|---|
Processor(_Total)\% Processor Time | <80% |
For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor Virtual Processor\% Guest Run Time counter. This provides an accurate value for the amount of physical CPU being utilized by the guest operating system. This counter should be less than 80 percent on average.
Counter | Target |
---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% |
Application Health
To determine whether the MAPI client experience is acceptable, use the MSExchange RpcClientAccess\RPC Averaged Latency counter. This counter should be below 250 msec. High latencies can be associated with a large number of RPC requests. The MSExchange RpcClientAccess\RPC Requests counter should be below 40 on average.
Counter | Target |
---|---|
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec |
MSExchange RpcClientAccess\RPC Requests | <40 |
Transport Servers
To determine whether a transport server is healthy, review processor, disk, and application health. For an extended list of important counters, see Transport Server Counters.
Processor
For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter. This counter should be less than 80 percent on average.
Counter | Target |
---|---|
Processor(_Total)\% Processor Time | <80% |
For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor Virtual Processor\% Guest Run Time counter. This provides an accurate value for the amount of physical CPU being utilized by the guest operating system. This counter should be less than 80 percent on average.
Counter | Target |
---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% |
Disk
To determine whether disk performance is acceptable, use the Logical Disk(*)\Avg. Disk sec/Read and Logical Disk(*)\Avg. Disk sec/Write counters for the volumes containing the transport logs and database. Both counters should be less than 20 msec.
Counter | Target |
---|---|
Logical Disk(*)\Avg. Disk sec/Read | <20 msec |
Logical Disk(*)\Avg. Disk sec/Write | <20 msec |
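The Avg. Disk sec/Read and Avg. Disk sec/Write counters report seconds, but the target is stated in milliseconds. A raw value of 0.0041, for example, means 4.1 msec. The following sketch converts the values before comparing them with the target. The function names and sample values are illustrative.

```python
def seconds_to_msec(value_seconds):
    """Logical Disk Avg. Disk sec/Read and sec/Write values are in seconds."""
    return value_seconds * 1000.0


def transport_disk_ok(avg_sec_read, avg_sec_write, target_ms=20.0):
    """True if both the read and write averages are below the 20 msec target."""
    return (seconds_to_msec(avg_sec_read) < target_ms
            and seconds_to_msec(avg_sec_write) < target_ms)


print(seconds_to_msec(0.0041))            # about 4.1 msec
print(transport_disk_ok(0.0041, 0.0005))  # True
print(transport_disk_ok(0.025, 0.0005))   # False: reads average 25 msec
```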
Application Health
To determine whether a Hub Transport server is sized properly and running in a healthy state, examine the MSExchangeTransport Queues counters outlined in the following table. All of these queues will have messages at various times. You want to ensure that the queue length isn't sustained and growing over a period of time. If larger queue lengths occur, this could indicate an overloaded Hub Transport server. Or, there may be network issues or an overloaded Mailbox server that's unable to receive new messages. You will need to check other components of the Exchange environment to verify.
Counter | Target |
---|---|
MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 |
MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 |
MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 |
MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 |
MSExchangeTransport Queues(_total)\Submission Queue Length | <100 |
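Because brief spikes in these queues are normal, the concern is a queue that keeps growing across a sampling window. One way to detect growth is to compute the least-squares slope of equally spaced samples. The following sketch reports two things separately: whether the queue reached its limit, and whether it trended upward. The function names, the use of a linear trend, and the sample values are all assumptions made for illustration.

```python
def slope(samples):
    """Least-squares slope of equally spaced samples (queue length change per sample)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def assess_queue(samples, limit):
    """Return (reached_limit, growing) for a window of queue-length samples."""
    return max(samples) >= limit, slope(samples) > 0


# Hypothetical Active Mailbox Delivery Queue Length samples (limit 250).
print(assess_queue([100, 150, 220, 300], 250))  # (True, True)
print(assess_queue([5, 0, 7, 1], 250))          # (False, False)
```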
Functional Validation Tests
You can use the information in the following sections for functional validation tests.
Database Switchover Validation
A database switchover is the process by which an individual active database is switched over to another database copy (a passive copy), and that database copy is made the new active database copy. Database switchovers can happen both within and across datacenters. A database switchover can be performed by using the Exchange Management Console (EMC) or the Exchange Management Shell.
To validate that a passive copy of a database can be successfully activated on another server, run the following command.
Move-ActiveMailboxDatabase <DatabaseName> -ActivateOnServer <TargetServer>
Success criteria: The active mailbox database is mounted on the specified target server. This result can be confirmed by running the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
Server Switchover Validation
A server switchover is the process by which all active databases on a DAG member are activated on one or more other DAG members. Like database switchovers, a server switchover can occur both within a datacenter and across datacenters, and it can be initiated by using either the EMC or the Shell.
To validate that all active databases on a server can be successfully activated on another server hosting passive copies of those databases, run the following command.
Get-MailboxDatabase -Server <ActiveMailboxServer> | Move-ActiveMailboxDatabase -ActivateOnServer <TargetServer>
Success criteria: The active mailbox databases are mounted on the specified target server. This can be confirmed by running the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
To validate that one copy of each of the active databases will be successfully activated on another Mailbox server hosting passive copies of the databases, shut down the server by performing the following action.
Turn off the current active server.
Success criteria: The active mailbox databases are mounted on another Mailbox server in the DAG. This can be confirmed by running the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
Server Failover Validation
A server failover occurs when the DAG member can no longer service the MAPI network, or when the Cluster service on a DAG member can no longer contact the remaining DAG members.
To validate that one copy of each of the active databases will be successfully activated on another Mailbox server hosting passive copies of the databases, turn off the server by performing one of the following actions:
Press and hold the power button on the server until the server turns off.
Pull the power cables from the server, which results in the server turning off.
Success criteria: The active mailbox databases are mounted on another Mailbox server in the DAG. This can be confirmed by running the following command.
Get-MailboxDatabase -Server <MailboxServer> | Get-MailboxDatabaseCopyStatus
Datacenter Switchover Validation
A datacenter or site failure is managed differently from the types of failures that can cause a server or database failover. In a high availability configuration, automatic recovery is initiated by the system, and the failure typically leaves the messaging system in a fully functional state. By contrast, a datacenter failure is considered to be a disaster recovery event, and as such, recovery must be manually performed and completed for the client service to be restored and for the outage to end. The process you perform is called a datacenter switchover. As with many disaster recovery scenarios, prior planning and preparation for a datacenter switchover can simplify your recovery process and reduce the duration of your outage.
For more information, including detailed steps for performing a datacenter switchover, see Datacenter Switchovers.
There are four basic steps that you complete to perform a datacenter switchover, after making the initial decision to activate the second datacenter:
Terminate a partially running datacenter.
Validate and confirm the prerequisites for the second datacenter.
Activate the Mailbox servers.
Activate the Client Access servers.
The following describes the steps used to validate a datacenter switchover.
Terminate Partially Failed Datacenter (Assuming DAG Is in DAC Mode)
When the DAG is in DAC mode, the specific actions to terminate any surviving DAG members in the primary datacenter depend on the state of the failed datacenter. Perform one of the following:
If the Mailbox servers in the failed datacenter are still accessible (usually not the case), run the following command on each Mailbox server.
Stop-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite <insertsitename>
If the Mailbox servers in the failed datacenter are unavailable but Active Directory is operating in the primary datacenter, run the following command on a domain controller.
Stop-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite <insertsitename> -ConfigurationOnly
Note
Failure to either turn off the Mailbox servers in the failed datacenter or to successfully perform the Stop-DatabaseAvailabilityGroup command against the servers will create the potential for split brain syndrome to occur across the two datacenters. You may need to individually turn off computers through power management devices to satisfy this requirement.
Success criteria: All Mailbox servers in the failed site are in a stopped state. You can verify this by running the following command from a server in the failed datacenter.
Get-DatabaseAvailabilityGroup | Format-List
Validate and Confirm Prerequisites for Secondary Datacenter
The second datacenter must be updated to represent which primary datacenter servers are stopped. From a server in the secondary datacenter, run the following command.
Stop-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite <insertsitename> -ConfigurationOnly
The purpose of this step is to inform the servers in the secondary datacenter about which Mailbox servers are available to use when restoring service.
Success criteria: All Mailbox servers in the failed datacenter are in a stopped state. To verify this, run the following command from a server in the secondary datacenter.
Get-DatabaseAvailabilityGroup | Format-List
Activate Mailbox Servers (Assuming DAG Is in DAC Mode)
Before activating the DAG members in the secondary datacenter, we recommend that you verify that the infrastructure services in the secondary datacenter are ready for messaging service activation.
When the DAG is in DAC mode, the steps to complete activation of the Mailbox servers in the second datacenter are as follows:
Stop the cluster service on each DAG member in the secondary datacenter. You can use the Stop-Service cmdlet to stop the service (for example, Stop-Service ClusSvc), or use net stop clussvc from an elevated command prompt.
To activate the Mailbox servers in the secondary datacenter, run the following command.
Restore-DatabaseAvailabilityGroup -Identity <DAGname> -ActiveDirectorySite <insertsitename>
If this command succeeds, the quorum criteria are shrunk to the servers in the secondary datacenter. If the number of servers in that datacenter is an even number, the DAG will switch to using the alternate witness server as identified by the setting on the DAG object.
To activate the databases, run one of the following commands.
Get-MailboxDatabase <insertcriteriatoselectDBs> | Move-ActiveMailboxDatabase -ActivateOnServer <DAGMemberInSecondarySite>
or
Move-ActiveMailboxDatabase -Server <DAGMemberInPrimarySite> -ActivateOnServer <DAGMemberInSecondarySite>
Check the event logs and review all error and warning messages to ensure that the secondary site is healthy. Any indicated issues should be followed up and corrected prior to mounting the databases.
To mount the databases, run the following command.
Get-MailboxDatabase -Server <DAGMemberInSecondarySite> | Mount-Database
Success criteria: The active mailbox databases are mounted on Mailbox servers in the secondary site. To confirm, run the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
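The quorum adjustment made by Restore-DatabaseAvailabilityGroup can be illustrated with a simplified majority-vote model. This is an illustration under stated assumptions, not the Windows Failover Clustering implementation. It assumes that each DAG member holds one vote and that, as described above, an even number of members adds the (alternate) witness server as a tie-breaking vote. The function name and member counts are hypothetical.

```python
def dag_quorum(member_count):
    """Return (uses_witness, total_votes, votes_needed) for member_count DAG members.

    Simplified model: an even member count adds the witness server as one extra vote,
    and quorum requires a strict majority of all votes.
    """
    if member_count < 1:
        raise ValueError("a DAG needs at least one member")
    uses_witness = member_count % 2 == 0
    total_votes = member_count + (1 if uses_witness else 0)
    votes_needed = total_votes // 2 + 1  # strict majority
    return uses_witness, total_votes, votes_needed


# Hypothetical numbers of DAG members remaining in the secondary datacenter.
print(dag_quorum(4))  # (True, 5, 3)
print(dag_quorum(3))  # (False, 3, 2)
```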
Activate Client Access Servers
Clients connect to service endpoints to access Exchange services and data. Activating Internet-facing Client Access servers therefore involves changing DNS records to point to the new IP addresses that will be configured for the new service endpoints. Clients will then automatically connect to the new service endpoints in one of two ways:
Clients will continue to try to connect, and should automatically connect after the Time to Live (TTL) for the original DNS entry has expired and the entry has expired from the client's DNS cache. Users can also run the ipconfig /flushdns command from a command prompt to manually clear their DNS cache. If using Outlook Web App, the Web browser may need to be closed and restarted to clear the DNS cache used by the browser. In Exchange 2010 SP1, this browser caching issue can be mitigated by configuring the FailbackURL parameter on the Outlook Web App virtual directory (owa).
Clients starting or restarting will perform a DNS lookup on startup and will get the new IP address for the service endpoint, which will be a Client Access server or array in the second datacenter.
To validate the scenario with Loadgen, perform the following actions:
Change the DNS entry for the Client Access server array to point to the virtual IP address of the hardware load balancer in the secondary site.
Run the ipconfig /flushdns command on all Loadgen servers.
Restart the Loadgen test.
Verify that the Client Access servers in the secondary site are now servicing the load.
To validate the scenario with an Outlook 2007 client, perform the following:
Change the DNS entry for the Client Access server array to point to the virtual IP address (VIP) of the hardware load balancer in the secondary site.
Run the ipconfig /flushdns command on the client or wait until TTL expires.
Wait for the Outlook client to reconnect.
Primary Datacenter Service Restoration Validation
The process of restoring service to a previously failed datacenter is referred to as a failback. The steps used to perform a datacenter failback are similar to the steps used to perform a datacenter switchover. A significant distinction is that datacenter failbacks are scheduled, and the duration of the outage is often much shorter.
It's important that failback not be performed until the infrastructure dependencies for Exchange have been reactivated, are functioning and stable, and have been validated. If these dependencies aren't available or healthy, it's likely that the failback process will cause a longer than necessary outage, and it's possible the process could fail altogether.
Mailbox Server Role Failback (Assuming DAG Is in DAC Mode)
The Mailbox server role should be the first role that's failed back to the primary datacenter. The following steps detail the Mailbox server role failback process.
To reincorporate the DAG members in the primary site, run the following command.
Start-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite <insertsitename>
To verify the state of the database copies in the primary datacenter, run the following command.
Get-MailboxDatabaseCopyStatus
After the Mailbox servers in the primary datacenter have been incorporated into the DAG, they will need some time to synchronize their database copies. Depending on the nature of the failure, the length of the outage, and actions taken by an administrator during the outage, this may require reseeding the database copies. For example, if during the outage, you remove the database copies from the failed primary datacenter to allow log file truncation to occur for the surviving active copies in the secondary datacenter, reseeding will be required. At this time, each database can be synchronized individually. After a replicated database copy in the primary datacenter is healthy, you can proceed to the next step.
During the datacenter switchover process, the DAG was configured to use an alternate witness server. To reconfigure the DAG to use a witness server in the primary datacenter, run the following command.
Set-DatabaseAvailabilityGroup -Identity <DAGName> -WitnessServer <PrimaryDatacenterWitnessServer>
The databases being reactivated in the primary datacenter should now be dismounted in the secondary datacenter. Run the following command.
Get-MailboxDatabase <insertcriteriatoselectDBs> | Dismount-Database
After the databases have been dismounted, the Client Access server URLs should be moved from the secondary datacenter to the primary datacenter. To do this, change the DNS record for the URLs to point to the Client Access server or array in the primary datacenter.
Important
Don't proceed to the next step until the Client Access server URLs have been moved and the DNS TTL and cache entries have expired. Activating the databases in the primary datacenter prior to moving the Client Access server URLs to the primary datacenter will result in an invalid configuration (for example, a mounted database that has no Client Access servers in its Active Directory site).
To activate the databases, run one of the following commands.
Get-MailboxDatabase <insertcriteriatoselectDBs> | Move-ActiveMailboxDatabase -ActivateOnServer <DAGMemberInPrimarySite>
or
Move-ActiveMailboxDatabase -Server <DAGMemberInSecondarySite> -ActivateOnServer <DAGMemberInPrimarySite>
To mount the databases, run the following command.
Get-MailboxDatabase <insertcriteriatoselectDBs> | Mount-Database
Success criteria: The active mailbox databases are successfully mounted on Mailbox servers in the primary site. To confirm, run the following command.
Get-MailboxDatabaseCopyStatus <DatabaseName>
Storage Design Validation Results
The following tables summarize the Jetstress storage validation results. This solution achieved higher-than-target transactional I/O while keeping database read latencies under the 20 msec target.
Overall test result | Pass |
Overall throughput
Overall throughput | Result |
---|---|
Target transactional I/O per second | 383 |
Achieved transactional I/O per second | 540 |
Transactional I/O performance: database reads
Database | I/O database reads per second | I/O database reads average latency (msec) |
---|---|---|
Instance1 | 42.2 | 18.9 |
Instance2 | 42.7 | 17.9 |
Instance3 | 42.9 | 17.4 |
Instance4 | 42.0 | 17.9 |
Instance5 | 42.0 | 18.0 |
Instance6 | 41.8 | 17.0 |
Instance7 | 42.8 | 17.7 |
Instance8 | 42.6 | 17.4 |
Transactional I/O performance: database writes
Database | I/O database writes per second | I/O database writes average latency (msec) |
---|---|---|
Instance1 | 25.9 | 25.9 |
Instance2 | 26.4 | 25.1 |
Instance3 | 26.4 | 21.7 |
Instance4 | 26.1 | 22.6 |
Instance5 | 25.9 | 23.8 |
Instance6 | 25.5 | 19.8 |
Instance7 | 26.3 | 21.2 |
Instance8 | 26.5 | 18.5 |
Transactional I/O performance: log writes
Database | I/O log writes per second | I/O log writes average latency (msec) |
---|---|---|
Instance1 | 23.8 | 3.8 |
Instance2 | 23.7 | 3.7 |
Instance3 | 24.0 | 3.3 |
Instance4 | 23.5 | 3.8 |
Instance5 | 23.7 | 3.8 |
Instance6 | 23.7 | 3.5 |
Instance7 | 23.7 | 3.7 |
Instance8 | 24.3 | 3.3 |
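The Jetstress summary can be checked with simple arithmetic. The following sketch uses the values from the tables above to compute the I/O headroom over the target and to confirm that every instance's database read latency is under 20 msec. The variable and function names are illustrative.

```python
TARGET_IOPS = 383
ACHIEVED_IOPS = 540
# I/O database reads average latency (msec) for Instance1 through Instance8.
READ_LATENCY_MS = [18.9, 17.9, 17.4, 17.9, 18.0, 17.0, 17.7, 17.4]


def headroom_percent(achieved, target):
    """How far achieved transactional I/O exceeds the target, as a percentage."""
    return (achieved - target) / target * 100


print(f"{headroom_percent(ACHIEVED_IOPS, TARGET_IOPS):.1f}%")  # 41.0%
print(max(READ_LATENCY_MS) < 20)  # True: the worst instance is 18.9 msec
```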
Server Design Validation Results
The following sections summarize the server design validation results for the test cases.
Test Case: Normal Operating Conditions
The first test case represents peak workload during normal operating conditions. Normal operating conditions refer to a state where all of the active and passive databases reside on the servers they were planned to run on. Because this test case doesn't represent the worst-case workload, it isn't the key performance validation test, but it provides a good indication of how this environment should run outside of a server failure or maintenance event. In this case, each Mailbox server is running four active and four passive databases.
Validation of Expected Load
The message delivery rate verifies that the tested workload matched the target workload. The actual message delivery rate is slightly higher than target.
Counter | Target | Tested result |
---|---|---|
Message Delivery Rate / Server | 8.54 | 8.63 |
Validation of Mailbox Servers
Processor
Processor utilization is low as expected.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 54 |
Storage
The storage results look good. The average read latency for the active databases is 19.3 msec when measured in the VM and 15.9 msec when measured on the EqualLogic storage array. As discussed in "Server Validation: Performance and Health Criteria" earlier in this white paper, time-based counters measured in a VM may not be accurate because the VM has a different concept of time than the physical server. The difference between these counters is likely the result of a combination of iSCSI network latency (generally <1 msec) and inaccurate counter values in the VM.
Counter | Target | Tested result |
---|---|---|
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 19.3 |
EqualLogic Average Disk Read Latency | <20 msec | 15.9 |
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec and less than the reads average | 6.8 |
EqualLogic Average Disk Write Latency | <20 msec | 2.5 |
Database\Database Page Fault Stalls/sec | 0 | 0 |
MSExchange Database\IO Log Writes Average Latency | <20 msec | 5.2 |
Database\Log Record Stalls/sec | 0 | 0 |
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec | 23.7 |
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec | 7.6 |
MSExchange Database\IO Log Read Average Latency | <200 msec | 7.5 |
Application Health
Exchange is healthy, and all of the counters used to determine application health are well under target values.
Counter | Target | Tested result |
---|---|---|
MSExchangeIS\RPC Requests | <70 | 2.7 |
MSExchangeIS\RPC Averaged Latency | <10 msec | 2.4 |
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | <50 | 1.5 |
MSExchange Replication(*)\CopyQueueLength | <1 | 0.1 |
MSExchange Replication(*)\ReplayQueueLength | <5 | 2.1 |
Validation of Client Access and Hub Transport Servers
Processor
Processor utilization is low, as expected.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 19 |
Storage
The storage results look good. The very low latencies should have no impact on message transport.
Counter | Target | Tested result |
---|---|---|
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.012 |
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.012 |
Application Health
The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on client experience.
Counter | Target | Tested result |
---|---|---|
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 9 |
MSExchange RpcClientAccess\RPC Requests | <40 | 2 |
Hub Transport Server Health
The Transport Queue counters are all well under target, confirming that the Hub Transport server is healthy and able to process and deliver the required messages.
Counter | Target | Tested result |
---|---|---|
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 1.5 |
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 | 0 |
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 1.1 |
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 | 0 |
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 0.4 |
Validation of Root Server Health
Processor
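As expected, processor utilization on the root server is low. All counters are under their target thresholds except Root Virtual Processor % Guest Run Time, which, at 6 percent, is slightly above its 5 percent target.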
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Logical Processor(_total)\% Guest Run Time | <75% | 42 |
Hyper-V Hypervisor Logical Processor(_total)\% Hypervisor Run Time | <5% | 2 |
Hyper-V Hypervisor Logical Processor(_total)\% Total Run Time | <80% | 44 |
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time | <5% | 6 |
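The root server logical processor counters can be checked against their targets and against each other. The following sketch assumes that % Total Run Time is approximately % Guest Run Time plus % Hypervisor Run Time, which the tested values bear out (42 + 2 = 44). The tolerance and the function name are illustrative choices.

```python
# Targets (percent) from the root server table above.
ROOT_TARGETS = {"guest": 75, "hypervisor": 5, "total": 80}


def root_processor_report(guest, hypervisor, total, tolerance=1.0):
    """Check root-server logical processor counters against their targets.

    Assumes % Total Run Time is approximately % Guest Run Time plus
    % Hypervisor Run Time; `tolerance` absorbs sampling and rounding noise.
    """
    return {
        "guest_ok": guest < ROOT_TARGETS["guest"],
        "hypervisor_ok": hypervisor < ROOT_TARGETS["hypervisor"],
        "total_ok": total < ROOT_TARGETS["total"],
        "consistent": abs(guest + hypervisor - total) <= tolerance,
    }


report = root_processor_report(guest=42, hypervisor=2, total=44)
print(report)
```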
Application Health
The VM health summary counters indicate that all VMs are in a healthy state.
Counter | Target | Tested result |
---|---|---|
Hyper-V Virtual Machine Health Summary\Health Critical | 0 | 0 |
Test Case: Single Server Failure or Single Server Maintenance (In Site)
Validation of Expected Load
The message delivery rate verifies that the tested workload matched the target workload. The actual message delivery rate is slightly higher than target.
Counter | Target | Tested result |
---|---|---|
Message Delivery Rate / Server | 17.08 | 17.3 |
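The single-server-failure target (17.08) is exactly twice the normal-operation target (8.54). This is consistent with a surviving server taking over its partner's active databases, so that four active databases become eight. The following sketch checks the ratio and how far each tested result exceeded its target. The values come from the two Message Delivery Rate tables. The function name is illustrative.

```python
NORMAL_TARGET = 8.54    # messages/sec per server, normal operating conditions
FAILURE_TARGET = 17.08  # messages/sec per server, single server failure


def percent_over(result, target):
    """How far a tested result exceeds its target, as a percentage."""
    return (result / target - 1) * 100


print(FAILURE_TARGET / NORMAL_TARGET)
print(f"{percent_over(8.63, NORMAL_TARGET):.1f}%")
print(f"{percent_over(17.3, FAILURE_TARGET):.1f}%")
```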
Validation of Mailbox Servers
Processor
As expected, processor utilization is higher than under normal operating conditions, but it remains just under the 70 percent target.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 69 |
Storage
In this test case, the average read latency for the active databases is 26.2 msec when measured in the VM and 16.2 msec when measured on the EqualLogic storage array. As discussed in "Server Validation: Performance and Health Criteria" earlier in this white paper, time-based counters measured in a VM may not be accurate because the VM has a different concept of time than the physical server. The difference between these counters is likely the result of a combination of iSCSI network latency (generally <1 msec) and inaccurate counter values in the VM. Because the read latency measured on the EqualLogic array is less than 20 msec, there's no concern about the counter measured in the VM being over target.
Counter | Target | Tested result |
---|---|---|
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 26.2 |
EqualLogic Average Disk Read Latency | <20 msec | 16.2 |
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec and less than reads average | 7.4 |
EqualLogic Average Disk Write Latency | <20 msec | 2.1 |
Database\Database Page Fault Stalls/sec | 0 | 0 |
MSExchange Database\IO Log Writes Average Latency | <20 msec | 5.2 |
Database\Log Record Stalls/sec | 0 | 0 |
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec | Not applicable |
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec | Not applicable |
MSExchange Database\IO Log Read Average Latency | <200 msec | Not applicable |
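The reasoning in the storage paragraph for this test case can be expressed as a simple rule: when a time-based counter measured inside a VM is over target, fall back to the corresponding measurement taken on the EqualLogic array, which is not subject to VM timekeeping effects. The following Python sketch encodes that rule, assuming the array-side value is the authoritative one; the function name and its return strings are illustrative only.

```python
TARGET_MSEC = 20.0  # read latency target for active databases (from the table above)


def assess_read_latency(vm_msec: float, array_msec: float,
                        target_msec: float = TARGET_MSEC) -> str:
    """Classify read latency, preferring the array-side value when the
    in-VM counter (which may be skewed by VM timekeeping) is over target."""
    if vm_msec < target_msec:
        return "pass"                      # in-VM counter already under target
    if array_msec < target_msec:
        return "pass (array-side value)"   # VM counter over target, array under target
    return "fail"                          # both measurements over target


# Single server failure test case: 26.2 msec in the VM, 16.2 msec on the array
print(assess_read_latency(26.2, 16.2))  # pass (array-side value)
```

The same rule applied to the site failure test case (24.0 msec in the VM, 15.9 msec on the array) also passes on the array-side value.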
Application Health
Exchange is very healthy, and all of the counters used to determine application health are well under target values.
Counter | Target | Tested result |
---|---|---|
MSExchangeIS\RPC Requests | <70 | 8.0 |
MSExchangeIS\RPC Averaged Latency | <10 msec | 3.7 |
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | 3.3 |
MSExchange Replication(*)\CopyQueueLength | <1 | Not applicable |
MSExchange Replication(*)\ReplayQueueLength | <5 | Not applicable |
Validation of Client Access and Hub Transport Servers
Processor
Processor utilization is low, as expected.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 26.3 |
Storage
The storage results look good. The very low latencies should have no impact on message transport.
Counter | Target | Tested result |
---|---|---|
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.0041 sec (4.1 msec) |
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.0005 sec (0.5 msec) |
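The Avg. Disk sec/Read and Avg. Disk sec/Write counters report values in seconds, while the targets are stated in milliseconds. The following Python sketch converts the tested results from the table above to milliseconds before comparing them with the 20 msec target. The helper name is hypothetical.

```python
TARGET_MSEC = 20.0  # disk latency target from the table above


def sec_to_msec(seconds: float) -> float:
    """Convert an Avg. Disk sec/* counter value (seconds) to milliseconds."""
    return seconds * 1000.0


# Tested results for the Client Access and Hub Transport server, in seconds
tested_sec = {
    "Avg. Disk sec/Read": 0.0041,
    "Avg. Disk sec/Write": 0.0005,
}

for counter, seconds in tested_sec.items():
    msec = sec_to_msec(seconds)
    print(f"{counter}: {msec:.1f} msec, under target: {msec < TARGET_MSEC}")
```

Both values (4.1 msec and 0.5 msec) are far below the 20 msec target. Comparing the raw counter values directly against "20" without converting units would hide the fact that the figures are in different units.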
Application Health
The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on client experience.
Counter | Target | Tested result |
---|---|---|
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 13.2 |
MSExchange RpcClientAccess\RPC Requests | <40 | 6.1 |
The Transport Queue counters are all well under target, confirming that the Hub Transport server is healthy and able to process and deliver the required messages.
Counter | Target | Tested result |
---|---|---|
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 4.7 |
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 | 0 |
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 3.6 |
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 | 0 |
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 1.1 |
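Checking a set of queue counters against their individual thresholds is mechanical, which makes it easy to script when repeating these tests. The following Python sketch holds the targets and tested results from the table above and reports any queue whose tested value is not below its target. The data structure and function name are illustrative assumptions; the values themselves are taken from the table.

```python
# (target, tested result) for the transport queue counters in the table above
# (single server failure test case).
QUEUE_RESULTS = {
    "Aggregate Delivery Queue Length (All Queues)": (3000, 4.7),
    "Active Remote Delivery Queue Length": (250, 0.0),
    "Active Mailbox Delivery Queue Length": (250, 3.6),
    "Submission Queue Length": (100, 0.0),
    "Retry Mailbox Delivery Queue Length": (100, 1.1),
}


def queues_over_target(results: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of counters whose tested value is not below the target."""
    return [name for name, (target, tested) in results.items() if tested >= target]


over = queues_over_target(QUEUE_RESULTS)
print("all queues under target" if not over else f"over target: {over}")
```

Because every target in this table is an upper bound ("<"), a single "tested below target" comparison is sufficient; counters with exact targets such as 0 would need an equality check instead.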
Validation of Root Server Health
Processor
As expected, the processor utilization is very low and well under target thresholds.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Logical Processor(_total)\% Guest Run Time | <75% | 49.9 |
Hyper-V Hypervisor Logical Processor(_total)\% Hypervisor Run Time | <5% | 1.3 |
Hyper-V Hypervisor Logical Processor(_total)\% Total Run Time | <80% | 51.2 |
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time | <5% | 3.6 |
Application Health
The VM health summary counters indicate that all VMs are in a healthy state.
Counter | Target | Tested result |
---|---|---|
Hyper-V Virtual Machine Health Summary\Health Critical | 0 | 0 |
Test Case: Site Failure
Validation of Expected Load
The message delivery rate verifies that the tested workload matched the target workload. The actual message delivery rate is slightly higher than the target.
Counter | Target | Tested result |
---|---|---|
Message Delivery Rate / Mailbox | 17.08 | 17.4 |
Validation of Mailbox Servers
Processor
Processor utilization is below the 70 percent target, as expected.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 64 |
Storage
In this test case, the average read latency for the active databases is 24.0 msec when measured in the VM and 15.9 msec when measured on the EqualLogic storage array. As discussed in "Server Validation: Performance and Health Criteria" earlier in this white paper, time-based counters measured in a VM may not be accurate because the VM has a different concept of time than the physical server. The difference between these counters is likely the result of a combination of iSCSI network latency (generally <1 msec) and inaccurate counter values in the VM. Because the read latency measured on the EqualLogic array is under the 20 msec target, the VM counter exceeding the target is not a concern.
Counter | Target | Tested result |
---|---|---|
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 24.0 |
EqualLogic Average Disk Read Latency | <20 msec | 15.9 |
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec and less than reads average | 7.2 |
EqualLogic Average Disk Write Latency | <20 msec | 2.0 |
Database\Database Page Fault Stalls/sec | 0 | 0 |
MSExchange Database\IO Log Writes Average Latency | <20 msec | 5.0 |
Database\Log Record Stalls/sec | 0 | 0 |
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec | Not applicable |
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec | Not applicable |
MSExchange Database\IO Log Read Average Latency | <200 msec | Not applicable |
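The storage discussion for this test case attributes the gap between the in-VM and array-side read latencies partly to iSCSI network latency, which it describes as generally under 1 msec. The following Python sketch subtracts that 1 msec figure, treated here as a fixed upper bound (a simplifying assumption), from the observed gap to estimate how much of the difference remains for VM counter inaccuracy to explain. It uses the values from both failure test cases; the function name is hypothetical.

```python
ISCSI_LATENCY_MAX_MSEC = 1.0  # "generally <1 msec" per the text; assumed upper bound


def unexplained_gap(vm_msec: float, array_msec: float,
                    network_max_msec: float = ISCSI_LATENCY_MAX_MSEC) -> float:
    """Portion of the VM-vs-array read latency gap not covered by iSCSI latency."""
    return max(0.0, (vm_msec - array_msec) - network_max_msec)


# (in-VM read latency, EqualLogic read latency) in msec
cases = {
    "Single server failure": (26.2, 16.2),
    "Site failure": (24.0, 15.9),
}

for case, (vm, array) in cases.items():
    print(f"{case}: gap {vm - array:.1f} msec, "
          f"at least {unexplained_gap(vm, array):.1f} msec beyond iSCSI latency")
```

In both cases, most of the gap (roughly 9.0 msec and 7.1 msec) cannot be explained by network latency alone. This is consistent with the white paper's view that the in-VM time-based counters are the less reliable measurement.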
Application Health
Exchange is healthy, and all of the counters used to determine application health are well under target values.
Counter | Target | Tested result |
---|---|---|
MSExchangeIS\RPC Requests | <70 | 7.8 |
MSExchangeIS\RPC Averaged Latency | <10 msec | 3.5 |
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | 3.0 |
MSExchange Replication(*)\CopyQueueLength | <1 | Not applicable |
MSExchange Replication(*)\ReplayQueueLength | <5 | Not applicable |
Validation of Client Access and Hub Transport Servers
Processor
Processor utilization is low, as expected.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 25 |
Storage
The storage results look good. The very low latencies should have no impact on message transport.
Counter | Target | Tested result |
---|---|---|
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.003 sec (3 msec) |
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.001 sec (1 msec) |
Application Health
The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on client experience.
Counter | Target | Tested result |
---|---|---|
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 13.0 |
MSExchange RpcClientAccess\RPC Requests | <40 | 5.9 |
The Transport Queue counters are all well under target, confirming that the Hub Transport server is healthy and able to process and deliver the required messages.
Counter | Target | Tested result |
---|---|---|
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 4.2 |
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 | 0 |
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 3.4 |
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 | 0 |
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 0.6 |
Validation of Root Server Health
Processor
As expected, the processor utilization is very low and well under target thresholds.
Counter | Target | Tested result |
---|---|---|
Hyper-V Hypervisor Logical Processor(_total)\% Guest Run Time | <75% | 47.5 |
Hyper-V Hypervisor Logical Processor(_total)\% Hypervisor Run Time | <5% | 1.2 |
Hyper-V Hypervisor Logical Processor(_total)\% Total Run Time | <80% | 48.7 |
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time | <5% | 3.5 |
Application Health
The VM health summary counters indicate that all VMs are in a healthy state.
Counter | Target | Tested result |
---|---|---|
Hyper-V Virtual Machine Health Summary\Health Critical | 0 | 0 |
Conclusion
This white paper provides an example of how to design, test, and validate an Exchange 2010 solution for customer environments with 9,000 mailboxes deployed on Dell server and storage solutions. The step-by-step methodology in this document walks through the important design decision points that help address key challenges while ensuring that core business requirements are met.
Additional Information
For the complete Exchange 2010 documentation, see Exchange Server 2010.
For more information from Dell, see the following resources:
Dell Power Solutions Article: Optimizing Microsoft Exchange Server 2010 Deployments on Dell Servers and Storage
Dell Exchange 2010 ROI Study: http://www.dell.com/downloads/global/products/pedge/en/ROI_exchange.pdf
For more information from F5, see the following resources:
F5 Exchange 2010 Deployment Guide: http://www.f5.com/pdf/deployment-guides/f5-exchange-2010-dg.pdf
F5 Exchange Solution Brief: http://www.f5.com/pdf/application-ready-network-guides/microsoft-exchange-2010-arng.pdf
If you need more information, contact the F5 Microsoft Partnership Team at microsoftpartnership@f5.com.
This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.
This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.