Exchange Server 2010 Design and Architecture at Microsoft
How Microsoft IT Deployed Exchange Server 2010
Technical White Paper
The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own company or organization.
With Exchange Server 2007, Microsoft IT streamlined the messaging environment through server clusters based on Windows Clustering, and a highly available, direct-attached storage technology helped ensure 99.99 percent availability. However, the costs and general limitations associated with the platforms and technologies used in the Exchange Server 2007 environment prevented Microsoft IT from efficiently meeting emerging messaging and business needs.
By replacing all servers running Exchange Server 2007 with servers running Exchange Server 2010, Microsoft IT created new opportunities to reduce costs and system complexities while increasing security and deploying new features not available in previous versions of Exchange Server.
Reasons for Microsoft IT to Upgrade
Environment Before Exchange Server 2010
Planning and Design Process
Assessment and Scoping
Deployment Planning Exercises
Pre-Release Production Deployments
Technology Adoption Program
Architecture and Design Decisions
Administration and Permissions Model
Server Architectures and Designs
Mailbox Server Configuration
Backup and Recovery
Client Access Server Topology
Internet Mail Connectivity
Introducing Exchange Server 2010 into the Corporate Production Environment
Verifying the Successful Integration of Exchange Server 2010
Fully Deploying Client Access Servers in North America
Fully Deploying Hub Transport Servers in North America
Deploying Mailbox Servers in North America
Transitioning Internet e-mail
Deploying Exchange Server 2010 in Regional Data Centers
Planning and Design Best Practices
Server Design Best Practices
Deployment Best Practices
For More Information
Microsoft Information Technology (Microsoft IT) maintains a complex Microsoft® Exchange Server environment that consists of several geographic locations and multiple Active Directory® forests. There are 16 data centers, four of which host Exchange servers, to support more than 515 office locations in 102 countries with more than 180,000 users. These users include managers, employees, contractors, business partners, and vendors. Microsoft IT transitioned this environment to Exchange Server 2010 in less than seven months by taking advantage of its growing automation infrastructure and the enhanced deployment features available in Microsoft Exchange Server 2010, in combination with proven planning, design, and deployment methodologies.
Before an Exchange Server release can ship, it has to be thoroughly tested in the production environment. The deployment of Exchange Server into the corporate environment is quicker with each release. For Exchange Server 2010, pre-release testing began in November 2008, one year before Exchange Server 2010 was available, and the production migration began in February 2009. The entire company migrated to a release candidate (RC) several months before release to manufacturing (RTM) occurred in September 2009. Microsoft IT accomplished this despite the challenge of testing the Windows® 7 operating system and Microsoft Office 2010 at the same time.
At Microsoft, Microsoft IT and the Exchange Server product group work together closely. Microsoft IT must sign off on a release before the product group can ship it to customers. This relationship is critical to identifying show-stopping factors during the release process.
This technical white paper discusses the Exchange Server 2010 architecture, design, and technologies that Microsoft IT chose for the corporate environment. This paper also discusses the strategies, procedures, successes, and practical experiences that Microsoft IT gained during the planning and design phase. Common planning and design tasks for many Exchange Server deployment projects include server design, high-availability implementation, and capacity planning. In addition to these tasks, transitioning a complex messaging environment to run on Exchange Server 2010 entails specific planning considerations regarding directory integration, routing topology, Internet connectivity, client access technologies, and unified messaging (UM).
The most important benefits that Microsoft IT achieved with the production rollout of Exchange Server 2010 included:
A 70 percent reduction in input/output operations per second (IOPS) compared with Microsoft Exchange Server 2007. The database optimizations of Exchange Server 2010 provide better performance and reduce storage costs, resulting in a savings of more than 50 percent in the total cost of ownership (TCO) of storage.
An increased mailbox quota of 5 GB for all mailboxes in the organization.
Increased mailbox migration velocity over Exchange Server 2007, which enabled Microsoft IT to migrate the entire company much more quickly.
Elimination of backups, which saves millions of dollars per year.
This paper contains information for business and technical decision makers who are planning to deploy Exchange Server 2010. This paper assumes that the audience is already familiar with the concepts of the Windows Server® 2008 operating system, Active Directory Domain Services (AD DS), and previous versions of Exchange Server. A high-level understanding of the new features and technologies included in Exchange Server 2010 is also helpful. Detailed product information is available in the Exchange Server 2010 Technical Library at https://technet.microsoft.com/en-us/library/bb124558.aspx.
Note: For security reasons, the sample names of forests, domains, internal resources, organizations, and internally developed security file names used in this paper do not represent real resource names used within Microsoft and are for illustration purposes only.
From the earliest days, e-mail messaging has been an important communication tool for Microsoft. Microsoft established the first company-wide messaging environment in July 1982 based on Microsoft XENIX (a UNIX version for the Intel 8088 platform). This environment evolved over more than a decade into a large and distributed infrastructure that was increasingly difficult to manage. By migrating to Microsoft Exchange Server version 4.0 in 1996, and subsequently upgrading to Microsoft Exchange Server version 5.0 and Microsoft Exchange Server version 5.5 in 1997, Microsoft IT achieved significant improvements in terms of functionality, maintainability, and reliability.
At the beginning of the new millennium, Microsoft IT operated a messaging environment that included approximately 200 Exchange Server 5.5–based servers in more than 100 server locations with approximately 59,000 users. The changes continued with the upgrade to Microsoft Exchange 2000 Server, released in October 2000. Exchange 2000 Server was so tightly integrated with the TCP/IP infrastructure, Windows, and Active Directory that Microsoft IT could no longer manage the messaging environment as an isolated infrastructure.
The shift of Microsoft IT toward a service-focused IT organization was also noticeable in the designs and service level agreements (SLAs) that Microsoft IT established with the rollout of Exchange Server 2003, released in October 2003. Microsoft IT designed the Exchange Server 2003 environment for the scalability and availability requirements of a fast-growing company. The consolidation included upgrades to the network infrastructure and the deployment of large Mailbox servers to support 85,000 mailboxes in total. Business-driven SLAs demanded 99.99 percent availability, including both unplanned outages and planned downtime for maintenance, patching, and so forth. To comply with the SLAs, Microsoft IT deployed almost 100 percent of the Mailbox servers in server clusters by using Windows Clustering and a highly available, shared storage subsystem based on storage area network (SAN) technology.
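To put the 99.99 percent target in perspective, the following arithmetic converts an availability percentage into a downtime budget. It is a minimal sketch only: the function name and the choice of a 365-day year and a 30-day month are assumptions made for illustration, not part of the SLA definition that Microsoft IT used.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year (assumption)


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, allowed within the period."""
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    yearly = downtime_budget_minutes(99.99)
    monthly = downtime_budget_minutes(99.99, 30 * 24 * 60)
    print(f"99.99 percent over one year: {yearly:.1f} minutes")
    print(f"99.99 percent over 30 days:  {monthly:.2f} minutes")
```

Under these assumptions, the budget is roughly 52.6 minutes per year, or a little over 4 minutes per month. Because the SLA counts planned maintenance as well as unplanned outages, that budget leaves very little room for taking a stand-alone server offline.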
In 2006, before Exchange Server 2007, the environment had grown to include 130,000 mailboxes that handled 6 million internal messages and received 1 million legitimate message submissions from the Internet daily. On average, every user sent and received approximately 100 messages daily, amounting to an average e-mail volume per user per day of 25 megabytes (MB). As the demand for greater mailbox limits increased, new technologies and cost-efficient storage solutions such as direct-attached storage (DAS) were necessary to increase the level of messaging services in the corporate environment.
Shortly after Exchange Server 2007 was implemented in production, Microsoft IT began testing early versions of Exchange Server 2010 in the Dogfood forest. This testing started in November 2008 and included evaluations of the feature sets built into the product as well as an evaluation of the best methods to perform a migration. As the product neared release, Microsoft IT moved from testing in the Dogfood forest to the beginning of its production migration. The production migration started in February 2009 and lasted through September 2009, at which point all mailboxes in production were running on Exchange Server 2010. Exchange Server 2010 officially launched on November 9, 2009.
Reasons for Microsoft IT to Upgrade
Microsoft IT has moved to every version of Exchange Server with increasing speed and has taken advantage of its access to the product group. Microsoft IT has also participated in providing valuable feedback to the product group. Microsoft IT continually upgrades to the latest version of Exchange Server not only to fulfill its mission as the IT department at Microsoft, but also to achieve Microsoft business goals that are similar to the goals of many other organizations that use e-mail systems.
Microsoft deals with the same business challenges as most other large multinational companies. It has many remote workers, growing demands to cut costs in the current economic environment, and a need to provide increased efficiencies through technology.
Microsoft IT used the deployment of Exchange Server 2010 as an opportunity to focus on implementing solutions that help improve productivity and competitiveness, streamline system administration, and increase messaging protection beyond the levels already possible with Exchange Server 2007.
For the deployment of Exchange Server 2010, Microsoft IT defined the following key objectives:
Increase employee productivity: This included increasing mailbox quotas to 5 gigabytes (GB). Increasing employee productivity also included testing the deployment of Microsoft Outlook® 2010 as the primary messaging client so that users can benefit from new and advanced information management features such as mail tips.
Increase operational efficiency: This included reducing operational overhead associated with maintaining the messaging environment through features that are directly available in Exchange Server 2010, such as Windows PowerShell™ command-line interface remoting in conjunction with the Exchange Management Shell and the Exchange Control Panel. Windows PowerShell remoting enables administrators to use the Exchange cmdlets without requiring administrators to have the Exchange management tools installed. The Exchange Control Panel enables administrators to manage recipient-level tasks such as adding users to distribution lists without requiring any tool but a Web browser.
Decrease costs: This included redesigning server architectures and backup solutions for high availability to meet challenging SLAs. In redesigning server architectures, Microsoft IT heavily focused on incorporating the features directly available in Exchange Server 2010, replacing continuous cluster replication (CCR)-based Mailbox server clusters with the new database availability groups (DAGs), and eliminating backups altogether. All of these considerations resulted in significant cost savings.
Environment Before Exchange Server 2010
The Microsoft IT deployment options and design decisions for the transition to Exchange Server 2010 depended heavily on the characteristics of the existing network, AD DS, and the messaging environment. Among other considerations, it was important to perform the transition from Exchange Server 2007 to Exchange Server 2010 without service interruptions or data loss. With the additional migration features, such as Online Mailbox Moves, introduced in Exchange Server 2010, Microsoft IT was able to do this with minimal effort. To understand the Microsoft IT design decisions in detail, it is necessary to review the environment in which Microsoft IT performed the transition.
Figure 1 illustrates the locations of the data centers that contain Mailbox servers in the corporate production environment. Concerning the WAN backbone, it is important to note that Microsoft IT deliberately designed the network links to exceed capacity requirements. Only 10 percent of the theoretically available network bandwidth is dedicated to messaging traffic. The vast majority of the bandwidth is for non-messaging purposes to support the Microsoft product development groups.
Figure 1. Microsoft data centers that contain Mailbox servers
Note: In addition to the data centers shown in Figure 1, there are additional sites where Exchange servers exist to support security audits and testing of specific features. These locations do not contain mailboxes or participate in the day-to-day use of the messaging environment, and are therefore outside the scope of this paper.
Each Microsoft data center is responsible for a region defined along geographical boundaries. Within each region, network connectivity between offices and the data center varies widely. For example, a high-speed metropolitan area network (MAN) based on Gigabit Ethernet and Synchronous Optical Network (SONET) connects more than 70 buildings to the Redmond data center. These are the office buildings on and off the main campus in the Puget Sound area. In other regions, such as Asia and the South Pacific, Internet-connected offices (ICOs) are more dominant. Broadband connectivity solutions, such as digital subscriber line (DSL) or cable modems, provide significant cost savings over leased lines as long as the decrease in performance and maintainability is acceptable. Microsoft IT uses this type of connectivity primarily for regional sales and marketing offices.
Figure 2 summarizes the typical regional connectivity scenarios at Microsoft. It is important to note that no Mailbox servers exist outside the data center, whereas Active Directory domain controllers may exist in high-availability buildings and medium-sized offices for local handling of user authentication, authorization, and application requests.
Figure 2. Regional connectivity scenarios at Microsoft
For regional connectivity, Microsoft IT relies on a mix of Internet-based and privately owned/leased connections, as follows:
Regional data centers and main campus: The main campus and regional data centers are connected together in a privately owned WAN based on frame relay, asynchronous transfer mode (ATM), clear channel ATM links, and SONET links.
Office buildings with standard or high availability requirements: Office buildings connect to regional data centers over fiber-optic network links with up to eight wavelength division multiplexing (WDM) channels per fiber pair.
Regional offices with up to 150 employees: Regional offices use a persistent broadband connection or leased line to a local Internet service provider (ISP) and then access their regional data centers through a transparent virtual private network over Internet Protocol security (VPN/IPsec) tunnels.
Mobile users: These use a dial-up or non-persistent broadband connection to a local ISP, and then access their mailboxes through VPN/IPsec tunnels, or by using Microsoft Exchange ActiveSync®, remote procedure call (RPC) over Hypertext Transfer Protocol (RPC over HTTP, also known as Outlook Anywhere), or Microsoft Outlook Web Access over HTTP Secure (HTTPS) connections.
Like many IT organizations that must keep business units strictly separated for legal or other reasons, Microsoft IT has implemented an AD DS environment with multiple forests. Each forest provides a clear separation of resources and establishes strict boundaries for effective security isolation. At Microsoft, some forests exist for legal reasons, others correspond to business divisions within the company, and yet others are dedicated to product development groups for early pre-release deployments and testing without interfering with the rest of the corporate environment. For example, by maintaining separate product development forests, Microsoft IT can prevent uncontrolled AD DS schema changes in the Corporate forest.
The most important forests at Microsoft have the following purposes:
Corporate: Seventy percent of the resources used at Microsoft reside in this forest. The Corporate forest includes approximately 155 domain controllers. The AD DS database is 35 GB in size, with more than 180,000 user objects in the directory.
Corporate Staging: Microsoft IT uses this forest to stage software images, gather performance metrics, and create deployment documentation.
Exchange Development: Microsoft uses this forest for running pre-release Exchange Server versions in a limited production environment. Users within this forest use beta or pre-beta versions in their daily work to help identify issues before the release of the product. Microsoft IT manages and monitors this forest, while the Exchange Server development group hosts the mailboxes in this forest to validate productivity scenarios.
Extranet: Microsoft IT has implemented this forest to provide business partners with access to corporate resources. There are approximately 30,000 user accounts in this forest.
MSN®: MSN is an online content provider through Internet portals such as MSN.com. Microsoft IT manages this forest jointly with the MSN technology team.
MSNBC: MSNBC is a news service and a joint venture between Microsoft and NBC Universal News. Legal reasons require Microsoft to maintain a separate forest for MSNBC. Microsoft IT manages this forest jointly with the MSNBC technology team.
Test Extranet: This forest enables the Extranet Technology team to test new solutions for partner collaboration without interfering with the Extranet forest. Microsoft IT manages this forest jointly with the Extranet Technology team.
Windows Deployment: Microsoft IT created this forest to launch pilot projects during the Windows Server 2003 deployment phase as a pre-staging environment before deployment and feature configuration in the Corporate forest. It is a limited production forest. Users within this forest use beta or pre-beta software in their daily work to help product development groups identify and eliminate design flaws and other issues.
Windows Legacy: Microsoft IT uses this forest as a test environment for compatibility testing of previous Windows Server versions with Exchange Server.
Note: Microsoft IT maintains a common global address list (GAL) across all relevant forests that contain Exchange Server organizations by using AD DS GAL management agents, available in Microsoft Identity Lifecycle Manager 2007.
Domains in the Corporate Forest
Microsoft IT implemented nine domains in the Corporate forest, separated into geographic regions. At the time of the production rollout, all domains in the Corporate forest operated at the Windows Server 2008 functional level and contained between 7 and 30 domain controllers. The domain controllers are 64-bit multi-processor systems with 16 GB of random access memory (RAM).
Note: Microsoft IT does not use domains to decentralize user account administration. The human resources (HR) department centrally manages the user accounts, including e-mail address information, in a separate line-of-business (LOB) application. The HR system provides advanced business logic not readily available in AD DS to enforce consistency and compliance. It is the authoritative source of user account information, synchronized with AD DS through Identity Lifecycle Manager 2007.
Active Directory Sites
Overall, the corporate production environment (that is, the Corporate forest) includes 266 Active Directory sites in a hub-and-spoke topology that closely mirrors the network infrastructure. The authoritative source of IP address and subnet information necessary for the Active Directory site definitions is an infrastructure database that the Microsoft IT network team maintains. By using Identity Lifecycle Manager 2007, Microsoft IT provisions site and subnet objects in AD DS based on the data from the IP address and subnet infrastructure database and helps ensure an accurate Active Directory site topology that mirrors the network layout. The Identity Lifecycle Manager solution automatically calculates all site links during the import into AD DS. Based on this information, Knowledge Consistency Checker updates the replication topology for the forest. Microsoft IT does not maintain the AD DS replication topology manually.
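The role of the subnet objects can be illustrated with a short sketch. AD DS associates a computer with a site by matching its IP address against the subnet objects in the directory, which is why the accuracy of the provisioned subnet data matters. The example below assumes that the most specific (longest-prefix) subnet wins; the subnet ranges, the site names other than ADSITE_REDMOND, and the function names are hypothetical and do not reflect the Microsoft IT infrastructure database.

```python
import ipaddress
from typing import Optional

# Hypothetical subnet-to-site records, standing in for rows exported from an
# IP address and subnet infrastructure database. Names and ranges are invented.
SUBNET_TO_SITE = {
    "10.0.0.0/8": "ADSITE_REDMOND",
    "10.20.0.0/16": "ADSITE_DUBLIN",
    "10.20.30.0/24": "ADSITE_DUBLIN-OFFICE",
}


def build_site_table(records):
    """Parse subnet strings once and sort most-specific (longest prefix) first."""
    table = [(ipaddress.ip_network(cidr), site) for cidr, site in records.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def site_for_address(table, address: str) -> Optional[str]:
    """Return the site whose most specific subnet contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for network, site in table:
        if ip in network:
            return site
    return None  # no subnet object covers this address


if __name__ == "__main__":
    table = build_site_table(SUBNET_TO_SITE)
    for addr in ("10.20.30.5", "10.20.1.1", "10.1.1.1", "192.168.1.1"):
        print(addr, "->", site_for_address(table, addr))
```

An address that falls outside every defined subnet maps to no site in this sketch, which mirrors the operational risk of stale subnet data: clients in an undocumented subnet cannot be directed to a nearby domain controller.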
Site Topology and Exchange Server 2007
Exchange Server 2007 uses the Active Directory site design to perform e-mail routing. With this in mind, Microsoft IT built the messaging environment to take advantage of the network infrastructure in addition to building an Exchange Server 2007-only site in the Redmond area. This separation in one location has enabled Microsoft to configure communication between the primary Exchange site and the other sites with custom site link values. Other advantages to this site topology are related to how Microsoft IT manages AD DS.
Four sites contain Mailbox servers. The remaining Active Directory sites in the environment contain infrastructure servers, domain controllers to handle authentication requests from client workstations, and LOB applications, but no Exchange servers that are related to the production implementation of Exchange Server.
Note: The Active Directory site topology at Microsoft mirrors the network layout of the corporate production environment, with ADSITE_REDMOND as the hub site in a hub-and-spoke arrangement of sites and site links.
Dedicated Exchange Site Design
The Active Directory site named ADSITE_REDMOND-EXCHANGE contains only Exchange servers and domain controllers configured as global catalog servers. Microsoft created this dedicated site design during the Exchange 2000 Server time frame to provide its Exchange 2000-based and Exchange Server 2003-based servers with exclusive access to highly available AD DS servers, shielded from client authentication and other application traffic.
Microsoft IT continues to use the dedicated Exchange site in its Exchange Server 2007 and Exchange Server 2010 environment for the following reasons:
Performance assessments: Exclusive AD DS servers provide an opportunity to gather targeted performance data. Based on this data, Microsoft IT and developers can assess the impact of Exchange Server versions and service packs on domain controllers in a genuine large-scale production environment.
Early release of Windows Server domain controllers: This design continues through the Exchange Server 2010 deployment to help ensure that future changes to domain controllers in production will not negatively affect the messaging system.
Warning: Implementing dedicated Active Directory sites for Exchange Server increases the complexity of the directory replication topology and the required number of domain controllers in the environment. To maximize the return on investment (ROI), customers should weigh the business and technical needs of their own environment when considering dedicated Active Directory sites.
The design of an Exchange Server 2007 organization relies on the Active Directory site structure, with the availability of configuration settings that enable the administrator to adjust site connector costs to optimize traffic flow for messaging.
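The effect of adjusting costs can be sketched as a least-cost path calculation over the site links. The example below uses Dijkstra's algorithm as a stand-in for cost-based path selection; it is not the Exchange Server routing implementation, which applies additional rules beyond total cost. The site names and cost values are hypothetical and are not the values Microsoft IT configured.

```python
import heapq


def least_cost_path(links, source, target):
    """Dijkstra's algorithm over undirected site links.

    links: iterable of (site_a, site_b, cost) tuples with positive costs.
    Returns (total_cost, [site, ...]) or (None, []) if no path exists.
    """
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == target:
            return cost, path
        if site in settled:
            continue
        settled.add(site)
        for neighbor, link_cost in graph.get(site, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None, []


# Hypothetical hub-and-spoke links and costs, plus one expensive direct link.
SITE_LINKS = [
    ("REDMOND", "DUBLIN", 100),
    ("REDMOND", "SINGAPORE", 120),
    ("REDMOND", "SAO_PAULO", 150),
    ("DUBLIN", "SINGAPORE", 400),
]

if __name__ == "__main__":
    print(least_cost_path(SITE_LINKS, "DUBLIN", "SINGAPORE"))
```

With these sample costs, traffic from Dublin to Singapore prefers the two-hop path through the hub (total cost 220) over the direct link (cost 400). Lowering the cost of the direct link below 220 would reverse that choice, which is the kind of adjustment the site link settings make possible.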
As depicted in Figure 3, at the end of the Exchange Server 2007 migration, four primary sites held Exchange servers: Redmond, Dublin, Singapore, and Sao Paulo. These sites held the core messaging components that included Mailbox servers configured in CCR cluster configurations, Public Folder servers, Client Access servers, Hub Transport servers, and UM servers. In addition, Microsoft IT installed Edge servers in several perimeter network locations to provide incoming and outgoing mail transfers. Backups existed in each environment where Exchange Mailbox servers were installed, and Microsoft IT maintained an associated backup infrastructure in each of these locations.
Figure 3. Exchange Server 2007 Environment
One major change that occurred between the implementation of Exchange Server 2007 and the beginning of the Exchange Server 2010 environment was the introduction of Microsoft Forefront® Online Protection for Exchange. Microsoft IT introduced this offering into the mail flow for all e-mail at Microsoft both to validate the offering that customers were using and to reduce the number of servers and the support costs of the messaging environment. Due to this change, Microsoft IT removed all of the perimeter network installations of Exchange Server and effectively decommissioned those sites. Moving into Exchange Server 2010, this integration with Forefront Online Protection for Exchange continued and provided a more seamless transition of mail flow.
Planning and Design Process
The Microsoft IT planning and design process is unique in that messaging engineers start their work early in the product development cycle and collaborate closely with the Exchange Server product group to clarify exactly how the new Exchange Server version should address concrete business, system, operational, and user requirements. Through an assessment of the Exchange Server 2007 environment and in discussions with partner and customer IT organizations, Microsoft IT identified general issues, such as scalability to support large mailboxes, a 99.99 percent availability goal, high storage costs, and high backup costs.
Figure 4 illustrates how Microsoft IT aligned the Exchange Server 2010 design and deployment processes from assessment and scoping through full production rollout. The individual activities correspond to the phases and milestones outlined in the Microsoft Solutions Framework (MSF) Process Model.
Figure 4. Microsoft Solutions Framework Process Model
The next sections discuss key activities that helped Microsoft IT determine an optimal Exchange Server 2010 architecture and design for the corporate production environment.
Assessment and Scoping
In extensive planning sessions, program managers, service managers, the Exchange Systems Management team, Tier 2 Support team, Helpdesk, and Messaging Engineering team collaborated in virtual project teams to identify business and technical requirements and translated these requirements into proposals to the Exchange Server product group. The Exchange Server product group reviewed and incorporated these proposals into its product development plans. The results were commitments and shared goals between the developers and Microsoft IT to drive deployment actions and investments intended to improve IT services.
Deployment Planning Exercises
Within Microsoft IT, the Messaging Engineering team is responsible for creating the architectures and designs of all Exchange-related technologies. At a stage when the actual product was not available, messaging engineers began their work with planning exercises based on product development plans. The objective of these exercises was to decide how to deploy the new Exchange Server version in the future.
The messaging engineers based their design decisions on specific productivity scenarios, the scalability and availability needs of the company, and other requirements defined during the assessment and scoping phase. For example, Microsoft IT decided to use DAGs to eliminate single points of failure in the Mailbox server configuration and Just a Bunch of Disks (JBOD) to reduce storage costs while at the same time increasing mailbox quotas up to 5 GB through thin provisioning. The deployment planning exercises helped to identify required hardware and storage technologies that Microsoft IT needed to invest in to achieve the desired improvements.
One of the major components of those technologies was a shift to JBOD. JBOD is a disk configuration that does not use disk-based redundancy, such as redundant array of independent disks (RAID), to help protect the data stored on disk. In past versions of Exchange Server, as in most enterprise applications today, JBOD was not an option because the software offered limited high-availability options and was not architected to be resilient to storage errors. With the introduction of Exchange Server 2010, Microsoft has addressed these problems directly. The two investments that the product group made to support JBOD were improvements to the input/output (I/O) profile of the data and new resiliency options.
The IOPS requirement for the Exchange database decreased by 70 percent relative to Exchange Server 2007. This is a significant achievement, considering that Exchange Server 2007 itself delivered a 70 percent reduction from the requirements of Exchange Server 2003. In addition to the reduction in IOPS, the I/O load was smoothed out to be more uniform. This is due to changes in the internal database architecture.
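The two successive 70 percent reductions compound rather than add. The short calculation below is an illustrative sketch that takes the rounded percentages quoted above as its only inputs; the function name is an assumption made for the example.

```python
def remaining_fraction(*reductions_percent: float) -> float:
    """Fraction of the original I/O load left after successive percentage reductions."""
    fraction = 1.0
    for reduction in reductions_percent:
        fraction *= 1 - reduction / 100
    return fraction


# 70 percent reduction from Exchange Server 2003 to 2007,
# then another 70 percent from Exchange Server 2007 to 2010.
left = remaining_fraction(70, 70)

if __name__ == "__main__":
    print(f"Remaining load relative to Exchange Server 2003: {left:.0%}")
    print(f"Cumulative reduction: {1 - left:.0%}")
```

Taken at face value, the two figures imply that a mailbox on Exchange Server 2010 generates roughly 9 percent of the database I/O that the same mailbox generated on Exchange Server 2003, a cumulative reduction of about 91 percent. That margin is what makes low-cost, lower-performance disks a realistic option.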
The resiliency was first enhanced with the ability to have up to 16 copies of a database replicated to different Mailbox servers. With the potential for so many copies in any implementation, the Exchange Server product group implemented a feature that would make the mounted copy of the database more resilient to errors on the disk. Instead of failing the database over any time an error on the disk is discovered, the Exchange Server 2010 database can perform an automatic page restore. This process identifies what portion of the database is inaccessible because of the disk errors and copies that page or pages from one of the other replicated copies of the database to an unused portion of the disk. This process allows the disks to have a higher occurrence of errors than the software would otherwise tolerate before failing over to a secondary copy of the data.
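The control flow described above can be sketched with a toy model. This is not the Exchange Server implementation: the classes, the function name, and the max_repairs tolerance are hypothetical, and the sketch only illustrates the idea of repairing a damaged page from another copy before resorting to failover.

```python
class DiskReadError(Exception):
    """Raised when a page cannot be read from the local disk."""


class DatabaseCopy:
    """Toy model of one database copy: page number -> page contents."""

    def __init__(self, pages, bad_pages=()):
        self.pages = dict(pages)
        self.bad_pages = set(bad_pages)  # pages sitting on damaged disk regions

    def read_page(self, number):
        if number in self.bad_pages:
            raise DiskReadError(f"page {number} is unreadable")
        return self.pages[number]

    def rewrite_page(self, number, data):
        # Models writing the page to an unused, healthy portion of the disk.
        self.pages[number] = data
        self.bad_pages.discard(number)


def read_with_page_restore(active, passive_copies, number, stats, max_repairs=3):
    """Read a page; on a disk error, restore it from another copy.

    max_repairs is a hypothetical tolerance. Returns the page data, or None
    to signal that the caller should fail over to another database copy.
    """
    try:
        return active.read_page(number)
    except DiskReadError:
        if stats["repairs"] >= max_repairs:
            return None  # too many disk errors on this copy: fail over
        for copy in passive_copies:
            try:
                data = copy.read_page(number)
            except DiskReadError:
                continue  # this copy is damaged too; try the next one
            active.rewrite_page(number, data)
            stats["repairs"] += 1
            return data
        return None  # no healthy copy holds the page


if __name__ == "__main__":
    pages = {1: b"header", 2: b"mail"}
    active = DatabaseCopy(pages, bad_pages={2})
    passive = [DatabaseCopy(pages)]
    stats = {"repairs": 0}
    print(read_with_page_restore(active, passive, 2, stats), stats)
```

The design point is that a single bad page costs one small copy operation rather than a database failover, so the active copy can stay mounted on inexpensive disks until errors accumulate past the chosen tolerance.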
These changes, along with many other enhancements to the database, support a fail-safe design by using extremely low-cost storage for a highly critical enterprise application like Exchange Server 2010.
Note: The number of changes to the database in Exchange Server 2010 is far greater than the scope of this paper. For a full analysis of the changes to the database, see the topic “New Exchange Core Store Functionality” at https://technet.microsoft.com/en-us/library/bb125040.aspx.
The Messaging Engineering team maintains a lab environment that simulates the corporate production environment in terms of technologies, topology, product versions, and interoperability scenarios, but without production users. The engineering lab includes examples of the same hardware and storage platforms that Microsoft IT uses in the corporate production environment. It provides the analysis and testing ground for the messaging engineers to validate, benchmark, and optimize designs for specific scenarios; test components; verify deployment approaches; and estimate operational readiness.
Testing in the engineering lab helps the messaging engineers ensure that the conceptual and functional designs scale to the requirements and scope of the deployment project. For example, code instabilities or missing features of beta products might require Microsoft IT to alter designs and integration plans. Messaging engineers can verify the capabilities of chosen platforms and work with the product groups and hardware vendors to make sure that the deployed systems function as expected, even when running pre-release versions.
Pre-Release Production Deployments
Microsoft IT maintains a pre-release infrastructure, which is a limited production environment for running pre-release versions of server products. Pre-release production deployments begin before the alpha stage and continue through the beta and RC stages until Microsoft releases the product to manufacturing. During the pre-alpha stage, pre-release production deployments are a developer effort. Additional Microsoft employees join the campaign during the beta stages as early adopters.
Pre-release production deployments enable the developers to determine the enterprise readiness of the software, identify issues that might otherwise not be found before RTM, and collect valuable user feedback. For example, Exchange Server 2010 pre-release verification started in November 2008, one year before the Exchange Server product group shipped the product. Exchange Server 2010 set an unprecedented benchmark, with the entire company migrated to the release candidate several months before Exchange Server 2010 was released to manufacturing (September 2009).
Technology Adoption Program
The Exchange Server 2010 Technology Adoption Program (TAP) started in January 2008. TAP is a special Microsoft initiative, available by invitation only, to obtain real-world feedback on Microsoft pre-release products from partners and customers. More than 40 Microsoft partners and customers participated in the Exchange Server 2010 TAP, resulting in more than 200,000 mailboxes running on pre-release software in customer environments. The Messaging Engineering team was also actively involved by providing early adopters with presentations that outlined the Microsoft IT design process based on the then-current state of the product.
Microsoft runs several types of TAP programs. For more information, see the TAP early-adopter information on the Exchange Server group's blog entry at https://msexchangeteam.com/archive/2004/12/29/343848.aspx.
An important task of the Messaging Engineering team is to document all designs, which the messaging engineers pass as specifications to the technical leads in the Systems Management team for acceptance and implementation. The messaging engineers also assist the technical leads during pilot projects and server installations and help develop a set of build documents and checklists to give operators detailed deployment instructions for the full-scale production rollout.
With this in mind, the Messaging Engineering team learns about the various features of the products and compares them against the business and technical needs of the company. All features that the product group adds to the product can therefore be validated by a separate group to ensure that they are usable. During the Exchange Server 2010 development cycle, engineers evaluated every feature in preparation for making future recommendations on architecture and design. Some of the features met the design goals and needs of the company and were integrated into the architecture, whereas other features did not meet the needs of the company and were not implemented.
A good example of a feature that was not integrated into the architecture is distribution list moderation. Although this feature has many uses, it did not readily fit into the existing use of distribution lists and culture at Microsoft. It was important for the Messaging Engineering team to determine how the feature functioned and whether it would enhance any of the existing use cases or design goals. This evaluation also helps ensure that future implementation of the feature will be less time consuming.
A good example of a feature that was integrated into the architecture is single item recovery. As part of the move to a backup-free environment, single item recovery was a key component to the overall architecture and required extensive testing in the engineering lab and in pilot environments.
Note: Although many organizations are similar, business needs can vary. Every organization must evaluate product features against its own business and technical needs.
The server designs that the Messaging Engineering team creates include detailed hardware specifications for each server type, which the Infrastructure Management team and the Data Center Operations team at Microsoft IT use to coordinate the procurement and installation of server hardware in the data centers. The Data Center Operations team builds the servers, installs the operating systems, and joins the new servers to the appropriate forest before the Exchange Systems Management team takes over for the deployment of Exchange Server 2010 and related components. To achieve a rapid deployment, Microsoft IT automated most of the Exchange Server deployment steps by using Exchange Management Shell scripts.
Architecture and Design Decisions
Administration and Permissions Model
Role Based Access Control
Microsoft Exchange Server 2010 introduced a new concept for administrator-level permissions. This new permission model, named Role Based Access Control (RBAC), is based on the concept that all of the features and functionality in Exchange Server should have a way to require permissions separately. Usually, this detailed level of permissions is cumbersome and difficult to manage. To combat this problem, Exchange Server 2010 combines common permissions into standard roles. Implementers of Exchange Server 2010 can customize which permissions are included in which roles and create new roles that combine a series of permissions.
The most important consideration of RBAC is that it does not affect Exchange Server 2003 or Exchange Server 2007 permission models, and these need to coexist during the migration. Microsoft IT used universal security groups to enable cross-forest administration and placed administrators in these groups for use in both the Exchange Server 2007 permission model and the Exchange Server 2010 permission model. This allowed for an easier transition.
After considering the groups that administrators were members of, Microsoft IT had to consider the components of RBAC. These components are depicted in Figure 5 and are defined as follows:
Management role A management role defines the cmdlets and parameters that are available to a person or group assigned to the role. Granting rights to manage or view objects associated with a cmdlet or parameter requires adding a management role entry for that cmdlet or parameter to the role.
Management role entry A management role entry is a specific assignment of a single cmdlet or parameter to a management role. The accumulation of all the management role entries defines the set of cmdlets and parameters that the management role can run.
Management role assignment A management role assignment is the assignment of a management role to a user or a universal security group. After a management role is created, it is not usable until the role is assigned to a user or group.
Management role scope A management role scope is the scope of influence or impact of the person or group assigned a management role. In assigning a management role, management scopes can be used to target the servers, organizational units, filters, and other objects that the role applies to.
Management role group A management role group is a special security group that contains mailboxes that are members of the role group. Users can be added to and removed from the group membership to gain or lose the assignments that have been given to the group. The combination of all the roles on a role group defines everything that users added to a role group can manage in the Exchange organization.
Figure 5. RBAC architecture
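The relationship between the RBAC components defined above can be sketched as a small Python data model. This is an illustration only, not how Exchange Server evaluates permissions: the class names, the is_allowed function, the "Executive Mailbox Support" role, and the user and mailbox names are all hypothetical, and a scope is reduced here to a plain set of mailbox names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManagementRole:
    name: str
    entries: frozenset  # management role entries as (cmdlet, parameter) pairs


@dataclass(frozen=True)
class RoleAssignment:
    role: ManagementRole
    scope: frozenset  # simplified scope: the mailboxes the role applies to


@dataclass
class RoleGroup:
    name: str
    members: set = field(default_factory=set)
    assignments: list = field(default_factory=list)


def is_allowed(user, cmdlet, parameter, target, role_groups):
    """True if any role group the user belongs to grants the cmdlet on the target."""
    for group in role_groups:
        if user not in group.members:
            continue
        for assignment in group.assignments:
            in_role = (cmdlet, parameter) in assignment.role.entries
            if in_role and target in assignment.scope:
                return True
    return False


# Hypothetical custom role scoped to two executive mailboxes.
exec_role = ManagementRole(
    "Executive Mailbox Support",
    frozenset({("Get-Mailbox", "Identity"), ("Set-Mailbox", "ProhibitSendQuota")}),
)
exec_group = RoleGroup(
    "Executive Support",
    members={"alice"},
    assignments=[RoleAssignment(exec_role, frozenset({"ceo", "cfo"}))],
)
```

With this model, is_allowed("alice", "Set-Mailbox", "ProhibitSendQuota", "ceo", [exec_group]) is true, while the same call against a mailbox outside the scope is false. This mirrors how a custom role can grant broad tasks within a narrow scope.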
To determine which roles to use, Microsoft IT conducted a planning exercise to map the existing team functionality and groups with the roles that are available out of the box in Exchange Server 2010. After creating a table and assigning the various groups to the roles, Microsoft IT determined that it would be able to use the out-of-the-box roles, groups, and assignments for its implementation, with a few exceptions and modifications to the built-in role group memberships. The following exceptions were not transferred back into the product because the product group deemed them specific to the operations of the IT environment at Microsoft:
Helpdesk The original idea was to use the built-in View-Only role. However, Microsoft IT determined that this role would provide too much insight into the environment, because almost all aspects of the environment could be viewed. This might have confused staff who were not familiar with Exchange and might have disclosed confidential or sensitive data (for example, journal and transport rules). Microsoft IT decided instead to create a custom role that allows only a limited set of cmdlets.
Executive support staff This team requires nearly unlimited rights on the mailboxes for executives. In the existing Exchange Server 2007 model, this team is granted both Exchange Organization Administrator rights and elevated AD DS rights, and can therefore access any mailbox in the organization. A custom Exchange Server 2010 role provides a way to grant the necessary rights within a limited scope (the executives).
Exchange Control Panel modifications The out-of-the-box rights were too broad for Microsoft IT's deployment. To conform Exchange Control Panel access to internal security requirements, Microsoft IT restricted access to the Exchange Control Panel by using RBAC.
In addition to a new permission model, Exchange Server 2010 introduced a feature called Windows PowerShell remoting. This feature enables Windows PowerShell sessions to access both local and remote servers through a common interactive environment. With Exchange Server 2010, Windows PowerShell remoting sessions are the only method available for administering Exchange Server; therefore, adoption is mandatory. Because a remote Windows PowerShell session looks and behaves almost exactly like a local session, the transition was easy for administrative staff.
There are many advantages to using a uniform management tool. The biggest advantage to Microsoft IT is that it does not require special installation and configuration on all workstations or servers that will administratively interact with Exchange Server. For Microsoft IT, this is important because of its goal to automate as much of the operations and maintenance of Exchange Server as possible. Many scripts that are used on a regular basis do not require a lot of input to ensure that the system is running. With Windows PowerShell remoting, these scripts can run on any system where Windows PowerShell 2.0 has been installed. This is especially important during testing of beta code because of the rapidly changing nature of the administrative tools, and it has contributed to the cost savings that Microsoft IT has achieved through its implementation of Exchange Server 2010.
Agile companies that heavily rely on e-mail in practically all areas of the business, such as Microsoft, cannot tolerate messages that take hours to reach their final destinations. Information must travel fast, reliably, predictably, and in accordance with security guidelines. For Microsoft, this means “90 seconds, 99 percent of the time”—that is, 99 percent of all messages in the corporate production environment must reach their final destination in 90 seconds worldwide. This SLA does not apply to messages that leave the Microsoft environment, because Microsoft IT cannot guarantee message delivery across external systems.
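As a minimal sketch of how such an SLA could be checked, the following Python function tests whether a sample of delivery latencies satisfies "90 seconds, 99 percent of the time". The function name and the idea of evaluating a plain list of measured latencies are assumptions for illustration. The source does not describe how Microsoft IT measures delivery times.

```python
def meets_delivery_sla(latencies_seconds, threshold=90.0, required_fraction=0.99):
    """True if at least required_fraction of messages arrived within threshold seconds."""
    if not latencies_seconds:
        raise ValueError("at least one latency measurement is required")
    on_time = sum(1 for latency in latencies_seconds if latency <= threshold)
    return on_time / len(latencies_seconds) >= required_fraction
```

For example, a sample of 100 internal messages in which 99 arrive in 10 seconds and one takes 10 minutes still meets the SLA, whereas two late messages in the same sample would not.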
Microsoft established this mail-delivery SLA during the Exchange Server 2003 time frame, and it was equally important during the design of Exchange Server 2010. Another important design goal was to increase security in the messaging backbone by means of access restrictions to messaging connectors, data encryption through Transport Layer Security (TLS), and messaging antivirus protection based on Microsoft Forefront Security for Exchange Server.
During the Exchange Server 2007 implementation, the topology and mail flow were built on the updated messaging architecture. This architecture is still sufficient for the needs of the organization, with the exception of some changes to the Internet-facing mail flow described in the "Internet Mail Connectivity" section later in this paper.
Message routing at Microsoft consists of internal mail routing and mail routing to and from the Internet. All mail routing within the Microsoft environment happens across the Hub Transport servers. All mail routing to and from the Internet happens across the Hub Transport servers in Redmond only and the Forefront Online Protection for Exchange environment. All of these connections are encrypted through TLS, including the message relays to business partners. That leaves a number of new Information Rights Management (IRM) features to consider in the design, such as IRM Search, and integration with Active Directory Rights Management Services (AD RMS).
Information Rights Management
IRM at Microsoft is an important topic. Features such as requiring all messages to have AD RMS applied will affect the company in such a major way that they were postponed until after the initial rollout of Exchange Server 2010.
The features that Microsoft IT turned on by default or configured to work during the initial implementation included the following:
Pre-Licensing Microsoft IT has had this feature fully deployed since Exchange Server 2007 and continues to enable offline access to AD RMS-secured mail by implementing this feature.
IRM in Outlook Web App IRM in Outlook Web App enables any Outlook Web App user to view, compose, and reply to IRM-secured messages from within Outlook Web App without any add-ins. With this feature, remote workers can access IRM-secured messages on all supported Web browsers and platforms.
Search Enabled Outlook Web App Search enables content indexing of IRM-secured messages at Exchange Server so that mailbox searching tools in Outlook Web App can scan through both the content and the context of IRM-secured messages. This feature specifically enables employees to search through the IRM-secured messages in their inbox just as they do with unsecured messages.
Dedicated Exchange Sites in the AD DS Topology
During the implementation of Exchange Server 2007, Microsoft IT defined an internal routing architecture that consists of one Exchange-specific Active Directory site in Redmond and uses the existing Active Directory sites in the other three locations.
In the AD DS replication topology, this dedicated Exchange site is a tail site. ADSITE_REDMOND is the Active Directory hub site that is used as the AD DS replication focal point, yet this site does not contain any Hub Transport servers. Accordingly, at this point, Exchange Server 2010 cannot use ADSITE_REDMOND as a hub site for message routing purposes. By default, Exchange Server 2010 interprets the Microsoft IT Exchange organization as a full-mesh topology where Exchange Server 2010 Hub Transport servers in each region connect to each other via a single Simple Mail Transfer Protocol (SMTP) hop. This does not match the AD DS replication and network topology. Exchange Server 2010 uses the full-mesh topology in this scenario because, along the IP site links, all message transfer paths appear to be direct.
Figure 6 illustrates this situation by overlaying the Active Directory site topology and the default message routing topology.
Figure 6. Full-mesh message routing in a hub-and-spoke network topology
The routing topology depicted in Figure 6 works because Exchange Server 2010 can transfer messages directly between the sites that have Hub Transport servers (such as ADSITE_SINGAPORE to ADSITE_DUBLIN), yet all messages travel through the Redmond location according to the physical network layout. In this topology, all bifurcation of messages sent to recipients in multiple sites occurs at the source site and not at the latest possible point along the physical path, which would be Redmond.
For example, if a user in Sao Paulo sends a single message to recipients in the sites ADSITE_REDMOND-EXCHANGE, ADSITE_DUBLIN, and ADSITE_SINGAPORE, the source Hub Transport server in Sao Paulo establishes three separate SMTP connections, one SMTP connection to Hub Transport servers in each remote site, to transfer three message copies. Hence, the same message travels three times over the network from Sao Paulo to Redmond.
Microsoft IT could avoid this by eliminating the dedicated Exchange site ADSITE_REDMOND-EXCHANGE and moving all Exchange servers to ADSITE_REDMOND. The Hub Transport servers in ADSITE_REDMOND would then be in the transfer path between ADSITE_SAO PAULO, ADSITE_DUBLIN, and ADSITE_SINGAPORE. Exchange Server 2010 could then delay bifurcation until messages to recipients in multiple sites reach ADSITE_REDMOND. In this situation, the source server would only need to transfer one message copy to Redmond, the message routing topology would follow the physical network layout, and Microsoft IT would not have to take any extra configuration or optimization steps.
Based on such considerations, it is a logical conclusion that the design of an ideal Exchange Server 2010 environment takes the implications of dedicated Active Directory sites into account. On one hand, it is beneficial to keep the message routing topology straightforward and keep the complexities associated with maintaining and troubleshooting message transfer minimal. On the other hand, Microsoft IT had to weigh the benefits of eliminating the dedicated Redmond Exchange site ADSITE_REDMOND-EXCHANGE against the impact of such an undertaking on the overall deployment project in terms of costs, resources, and timelines.
Among other uses, Microsoft IT takes advantage of the dedicated Active Directory site to measure the footprint of Exchange Server 2010 on domain controller/global catalog servers and provide this information as feedback to the product groups. As a result, Microsoft IT decided to leave ADSITE_REDMOND-EXCHANGE in place. Instead, the Exchange Messaging team collaborated with the AD DS team to adjust the Active Directory site topology by using alternative methods to optimize message transfer without affecting the established AD DS replication architecture and topology.
Optimized Message Transfer Between Hub Transport Servers
Although Exchange Server 2010 generated a functioning message routing topology without any extra design work, Microsoft IT decided to review the routing topology based on business and technical requirements to drive further optimizations. Key factors that influenced the optimization decision included the “90 seconds, 99 percent of the time” mail-delivery SLA and the desire to save network bandwidth on WAN links by increasing the efficiency of message transfer.
Important reasons that compelled Microsoft IT to optimize the Exchange Server 2010 message transfer topology include the following:
Efficient message flow Although optimized message flow is not a strict requirement and it is possible to meet mail-delivery SLAs in a full-mesh topology, optimized message flow can help to accelerate message delivery.
Preserved WAN bandwidth The corporate production environment handles more than 15 million internal messages daily. Although most message traffic stays in the local site or has Redmond headquarters as the destination, optimized message flow can help to preserve WAN bandwidth for all messages that have recipients in multiple remote Active Directory sites.
After Microsoft IT made the decision to optimize message routing, it augmented the Active Directory site link topology to take advantage of the Exchange Hub Transport servers in ADSITE_REDMOND-EXCHANGE. To achieve efficient message flow and preserve WAN bandwidth, Microsoft IT needed to place ADSITE_REDMOND-EXCHANGE in the routing path between ADSITE_DUBLIN, ADSITE_SAO PAULO, and ADSITE_SINGAPORE by creating additional Active Directory site IP links.
This approach ensured that Exchange Server 2010 could bifurcate messages traveling between regions closer to their destination. By configuring the ExchangeCost attribute on Active Directory site links, which Exchange Server 2010 adds to the Active Directory site link definition, Microsoft IT was able to perform the message flow optimization without affecting the AD DS replication topology. The ExchangeCost attribute is relevant only for Exchange Server 2007 and Exchange Server 2010 message routing decisions between sites, not AD DS replication.
Microsoft IT performed the following steps to optimize message routing in the corporate production environment:
To establish a hub/spoke topology between all sites that have Exchange servers, Microsoft IT created three additional Active Directory IP site links.
Microsoft IT specified a Cost value of 999 (highest across the AD DS topology) for these new IP site links so that AD DS does not use these site links for directory replication.
By using the Set-AdSiteLink cmdlet, Microsoft IT assigned an ExchangeCost value of 10 to the new Exchange-specific site links. This value is significantly lower than the Cost value of all other Active Directory site links, so that Exchange Server 2010 uses the Exchange-specific site links for message routing path discovery.
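The effect of these steps can be sketched in Python as a least-cost path search over the site links. This is a simplified model under stated assumptions, not the Exchange Server routing algorithm: the Cost value of 100 for the pre-existing site links is an assumed figure (the source gives only the values 999 and 10 for the new links), and the link layout is inferred from the description above. The sketch shows that an ExchangeCost value, where present, replaces Cost for message routing only.

```python
import heapq

# (site A, site B, Cost, ExchangeCost or None). Cost 100 is an assumed value.
SITE_LINKS = [
    ("ADSITE_REDMOND", "ADSITE_REDMOND-EXCHANGE", 100, None),
    ("ADSITE_REDMOND", "ADSITE_DUBLIN", 100, None),
    ("ADSITE_REDMOND", "ADSITE_SAO PAULO", 100, None),
    ("ADSITE_REDMOND", "ADSITE_SINGAPORE", 100, None),
    # The three additional Exchange-specific links from the steps above.
    ("ADSITE_REDMOND-EXCHANGE", "ADSITE_DUBLIN", 999, 10),
    ("ADSITE_REDMOND-EXCHANGE", "ADSITE_SAO PAULO", 999, 10),
    ("ADSITE_REDMOND-EXCHANGE", "ADSITE_SINGAPORE", 999, 10),
]


def build_graph(links, for_exchange):
    """Build an undirected graph; ExchangeCost overrides Cost only for Exchange."""
    graph = {}
    for site_a, site_b, cost, exchange_cost in links:
        use_override = for_exchange and exchange_cost is not None
        weight = exchange_cost if use_override else cost
        graph.setdefault(site_a, {})[site_b] = weight
        graph.setdefault(site_b, {})[site_a] = weight
    return graph


def least_cost_path(graph, source, target):
    """Dijkstra's algorithm; returns (total cost, list of sites on the path)."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == target:
            return cost, path
        if site in visited:
            continue
        visited.add(site)
        for neighbor, weight in graph[site].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    raise ValueError(f"no path from {source} to {target}")
```

Under these assumed costs, the Exchange view routes a Dublin to Singapore message path through ADSITE_REDMOND-EXCHANGE at a total cost of 20, while the view based on Cost alone still prefers ADSITE_REDMOND at a total cost of 200.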
Based on the Exchange-specific Active Directory/IP site link topology, Exchange Server 2010 routes messages in the Corporate forest as follows:
Messages to a single destination The source Hub Transport server selects the final destination as the next hop and sends the messages directly to a Hub Transport server in that site. For example, in the Dublin to Singapore mail routing scenario, the network connection passes through Redmond, but the Hub Transport servers in ADSITE_REDMOND-EXCHANGE do not participate in the message transfer.
Messages to an unavailable destination If the source Hub Transport server cannot establish a direct connection to a destination site, the Hub Transport server backs off along the least-cost routing path until it establishes a connection to a Hub Transport server in an Active Directory site along that path. In the optimized topology, this is a Hub Transport server in ADSITE_REDMOND-EXCHANGE, which queues the messages for transmission to the final destination upon restoration of network connectivity.
Messages to recipients in multiple sites Exchange Server 2010 delays message bifurcation if possible. In the optimized topology, this means that Hub Transport servers first transfer all messages that have recipients in multiple sites to a Hub Transport server in ADSITE_REDMOND-EXCHANGE. The Hub Transport server in ADSITE_REDMOND-EXCHANGE then performs the bifurcation and transfers a separate copy of the message to each destination. For example, for a message from Sao Paulo to recipients in several regions, Exchange Server 2010 transfers a single message copy from ADSITE_SAO PAULO to ADSITE_REDMOND-EXCHANGE, where bifurcation occurs, before transferring individual message copies to each destination site. Again, Exchange Server 2010 transfers only a single copy per destination site. Within each site, the receiving Hub Transport servers may bifurcate the message further as necessary for delivery to individual recipients.
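Delayed bifurcation can be illustrated with a short Python sketch that compares the least-cost paths to every destination site and finds the last hop they share. The function names are hypothetical, and the paths are written out by hand from the topology described in this paper rather than computed. The sketch is a simplified model of the idea, not the Exchange Server implementation.

```python
def bifurcation_point(paths):
    """Return the last site that every least-cost path still has in common."""
    shared = paths[0][0]
    for hops in zip(*paths):
        if len(set(hops)) != 1:
            break
        shared = hops[0]
    return shared


def copies_leaving_source(paths):
    """Count distinct next hops from the source: the message copies it must send."""
    return len({path[1] for path in paths})


# Paths for a Sao Paulo message to recipients in three other sites.
OPTIMIZED = [
    ["ADSITE_SAO PAULO", "ADSITE_REDMOND-EXCHANGE"],
    ["ADSITE_SAO PAULO", "ADSITE_REDMOND-EXCHANGE", "ADSITE_DUBLIN"],
    ["ADSITE_SAO PAULO", "ADSITE_REDMOND-EXCHANGE", "ADSITE_SINGAPORE"],
]
FULL_MESH = [
    ["ADSITE_SAO PAULO", "ADSITE_REDMOND-EXCHANGE"],
    ["ADSITE_SAO PAULO", "ADSITE_DUBLIN"],
    ["ADSITE_SAO PAULO", "ADSITE_SINGAPORE"],
]
```

In the optimized topology, the shared hop is ADSITE_REDMOND-EXCHANGE and a single copy leaves Sao Paulo. In the default full-mesh topology, the paths diverge at the source, so three copies leave Sao Paulo.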
Note: Microsoft IT did not use the Set-AdSite cmdlet to configure the REDMOND-EXCHANGE site as a hub site in the routing topology. Doing so would force all messages between regions to travel through the REDMOND-EXCHANGE site and would require an extra SMTP hop on Hub Transport servers in that site. Microsoft IT found no compelling reason to force all message traffic through the North American Hub Transport servers. Establishing a hub site is useful if tail sites cannot communicate directly with each other. In the Microsoft IT corporate production environment, this is not an issue. For more information about hub site configurations, see the topic “Understanding Message Routing” at https://technet.microsoft.com/en-us/library/aa998825.aspx.
Connectivity to Remote SMTP Domains
For destinations outside the Corporate forest, Microsoft IT distinguishes between external and internal remote locations. For external remote locations, Microsoft IT relays all messages over Internet mail connectors on the Hub Transport servers deployed in Redmond, as explained in the section "Internet Mail Connectivity" later in this white paper. For internal remote locations, Microsoft IT placed specific messaging connectors directly on the Hub Transport servers in ADSITE_REDMOND-EXCHANGE. This design mirrors the Exchange Server 2007 topology.
Increased Message Routing Security
To comply with legal and regulatory requirements, Microsoft IT encrypts most messaging traffic in the corporate production environment. The only exceptions are internal destinations that do not have user mailboxes, such as lab and test environments.
Microsoft IT uses Transport Layer Security (TLS) to prevent unauthorized access to information during message transmission. Exchange Server 2010 supports TLS out of the box. Hub Transport servers use TLS to encrypt all message traffic within the Exchange Server 2010 environment and rely on opportunistic TLS encryption for communication with remote destinations, such as Hub Transport servers in other Microsoft IT-managed forests. Forefront Online Protection for Exchange also supports TLS to help ensure that outgoing messages are secured until they leave the network. Native support for SMTP TLS on Hub Transport servers enabled Microsoft IT to eliminate the dependency on complex IPsec policies for encryption of internal and external messages in transit.
In addition to encrypting messaging traffic internally, Microsoft IT helps protect its internal messaging environment by restricting access to incoming SMTP submission points. This helps Microsoft IT minimize mail spoofing and ensure that unauthorized SMTP mail submissions from rogue internal clients and applications do not affect corporate e-mail communications.
To accomplish this goal, Microsoft IT removes all default receive connectors on the Hub Transport servers and configures custom receive connectors by using the New-ReceiveConnector cmdlet to accept only authenticated SMTP connections from other Hub Transport servers and the Forefront Online Protection for Exchange environment. To meet the needs of internal SMTP applications and clients, Microsoft IT established a separate SMTP gateway infrastructure based on Exchange Server 2010 Hub Transport servers that enforces mail submission access controls, filtering, and other security checks.
Furthermore, Microsoft IT deployed Forefront Security for Exchange Server on all Hub Transport servers to implement messaging protection against viruses. Despite the fact that internal messages and messages from the Internet might pass through multiple Hub Transport servers, performance-intensive antivirus scanning occurs only once. Forefront Security for Exchange Server adds a security-enhanced antivirus header to each scanned message, so further Hub Transport servers do not need to scan the same message a second time. This avoids processing overhead while maintaining an effective level of antivirus protection for all incoming, outgoing, and internal e-mail messages.
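The scan-once behavior can be sketched in Python as follows. This is an assumption-laden illustration, not the Forefront Security for Exchange Server mechanism: the header name, the shared key, and the use of an HMAC over the message body are hypothetical stand-ins for the security-enhanced antivirus header that the product adds.

```python
import hashlib
import hmac

# Hypothetical values for illustration only.
SHARED_KEY = b"hypothetical-organization-key"
STAMP_HEADER = "X-Hypothetical-AV-Stamp"


def _stamp_for(body):
    """Compute a keyed digest so that a stamp cannot be forged without the key."""
    return hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()


def process_on_hub(message, scan):
    """Scan the message unless it carries a valid stamp. Returns True if scanned."""
    stamp = message["headers"].get(STAMP_HEADER)
    if stamp is not None and hmac.compare_digest(stamp, _stamp_for(message["body"])):
        return False  # Already scanned by an earlier Hub Transport server.
    scan(message["body"])
    message["headers"][STAMP_HEADER] = _stamp_for(message["body"])
    return True
```

In this model, a message that passes through several Hub Transport servers is scanned only by the first one, while a message with a missing or invalid stamp is always scanned again.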
Server Architectures and Designs
Smooth operation of the messaging service requires a highly stable Exchange Server 2010 server architecture. The environment needs to handle spikes in usage of the messaging system, needs to provide reliable delivery times if a system outage occurs, and needs to provide flexibility for growth. To accommodate these goals, Microsoft IT developed the following server architectures for the environment.
Exchange Server 2010 supports the five separate server roles below to perform the tasks of an enterprise messaging system:
Client Access servers Support the traditional components such as Post Office Protocol 3 (POP3) and Internet Message Access Protocol 4 (IMAP4), Exchange ActiveSync, Microsoft Outlook Web App, Outlook Anywhere, and several new features, including the RPC Client Access Service and the Exchange Control Panel.
Edge Transport servers Handle message traffic to and from the Internet and run spam filters. Microsoft IT does not use this role.
Hub Transport servers Perform the internal message transfer, distribution list expansions, and message conversions between Internet mail and Exchange Server message formats. At Microsoft, all Hub Transport servers also run Forefront Security for Exchange Server for virus scanning.
Mailbox servers Maintain mailbox store databases, provide Client Access servers with access to the data, and support access to public folders for Outlook clients.
Unified Messaging servers Integrate voice and fax with e-mail messaging and run Outlook Voice Access.
With the exception of the Edge Transport server role, Exchange Server 2010 supports multiple-role server deployments. The Client Access server role, Hub Transport server role, Mailbox server role, and Unified Messaging server role can coexist on the same computer in any combination. Placing several roles on a single computer is advantageous for Exchange Server deployments where service distribution is a goal.
The multiple-role approach provides the benefits of a reduced server footprint and can help minimize the hardware costs. For example, Microsoft IT deploys multiple-role servers in Sao Paulo for the Hub Transport server role, Client Access server role, and Unified Messaging server role to use hardware resources efficiently. Similar to the Exchange Server 2007 design, Microsoft IT consolidated server roles in this location due to moderate workload. However, the Mailbox servers in Sao Paulo are single-role servers to help ensure that the Mailbox role can take advantage of all the resources available.
Microsoft IT based its decisions to combine Exchange server roles on the same hardware or separate them between dedicated servers on capacity, performance, and availability demands. Mailbox servers are prime examples of systems that have high capacity, performance, and availability requirements at Microsoft. Accordingly, Microsoft IT deployed the Mailbox servers in all regions in a single-role design. This single-role design enabled Microsoft IT to eliminate single points of failure in the Mailbox server configuration by using DAGs. In Redmond, Dublin, and Singapore, Microsoft IT also used the single-role design for the remaining server roles, because these regions include a large number of users and multiple Mailbox servers.
Single-role server deployments give Microsoft IT the following benefits.
Optimized Server Hardware and Software Components
Different server roles require different hardware configurations and optimization approaches. For example, a Hub Transport server design for high performance must take sufficient storage capacity and I/O performance into consideration to support message queues in addition to message routing functions. However, Client Access servers typically do not have the same requirements for storage capacity.
During the initial production deployment, it was important to consolidate the number of server models to increase cost efficiency. Microsoft IT used the following hardware per server role:
Client Access Two quad-core Intel Xeon x5470, 3.3 gigahertz (GHz), with 16 GB of memory.
Hub Transport Two quad-core Intel Xeon x5470, 3.3 GHz, with 16 GB of memory.
Mailbox Two quad-core Intel Xeon x5470, 3.3 GHz, with 32 GB of memory.
Unified Messaging Two quad-core Intel Xeon x5470, 3.3 GHz, with 16 GB of memory.
Flexible Systems Scaling Approach
The single-role server deployment enables Microsoft IT to design server hardware more accurately according to specific tasks and increase the capacity of the messaging environment selectively according to specific demands and changing trends. For example, as demand for mobile messaging services continues to grow, Microsoft IT can increase the capacity of Client Access servers without affecting other areas in the messaging environment.
Structured System Administration and Maintenance
At Microsoft IT, several groups collaborate to deploy, manage, and maintain the messaging environment. Individual system engineers, program managers, and service managers specialize in specific areas of expertise, closely related to server roles. For example, different system engineers designed the message routing topology and the Mailbox server configuration.
Role-Specific Load Balancing and Fault Tolerance
Different server roles support different techniques and architectures for load balancing and fault tolerance. For example, if multiple Hub Transport servers exist in the same Active Directory site, Exchange Server 2010 balances the message traffic automatically between these servers, whereas Mailbox servers are not load-balanced in the same way. A mailbox can only be available on a single Mailbox server, whereas DAGs can maintain a redundant copy of the mailbox store across a larger set of servers to achieve high availability for the database.
Table 1 shows the number of servers and the technologies per server role that Microsoft IT uses in the corporate production environment to implement load balancing and fault tolerance.
Table 1. Servers in the Microsoft IT Exchange Server 2010 Environment
Automatic load balancing through Mail Submission Service
Hardware load balancers for incoming mail connectivity
Hardware load balancers internally and externally
Automatic round-robin load balancing between Unified Messaging servers
Multiple voice over IP (VoIP) gateways per dial plan
Mailbox Server Configuration
Planning the Mailbox server configuration for a large enterprise like Microsoft requires decisions about many components. These components include the mailbox size, the server architecture, the storage architecture, the resiliency levels, and the backup and restore capabilities. Microsoft IT redesigned all of these items during the implementation of Exchange Server 2010.
Determining these requirements is a difficult process. To make this process easier, Microsoft IT and the Exchange Server product group agreed on several initial goals. These goals included the following requirements:
Increase the mailbox quota to 5 GB: With reliance on e-mail increasing every year, a much larger mailbox size is necessary to sustain long-term growth.
Move to low-cost storage: The Exchange Server product group built Exchange Server 2010 to support low-cost storage architectures such as JBOD. Successfully implementing this internally is key.
Remove third-party backups: The Exchange Server product group built Exchange Server 2010 to use native protection methodologies to provide backup and restore functionality. This also needed to be proven in a production environment.
To meet these simple goals, Microsoft IT devised a unique solution. The goal of 5-GB mailboxes was much larger than the average mailbox in the environment, which was around 890 MB at the end of 2008. This meant that a large gap existed between the current size of the mailboxes and the goal. To take advantage of this gap, Microsoft IT took a thin-provisioning approach to mailbox sizing.
The goal of moving to lower-cost disk architecture like JBOD took careful planning. In the past, the teams avoided JBOD because of the risk of data loss due to the nature of the architecture. JBOD is a storage architecture that uses a set of low-cost disks in an array with no built-in redundancy. Each disk is addressed individually and presented to the host for use. If any single disk fails, all data on that single disk is at risk.
This risk of data loss has prevented most modern enterprise applications from taking advantage of this lower-cost disk option in production implementations. To specifically focus on lowering the cost of storage for customers, the Exchange Server product group modified the built-in database resiliency to accommodate JBOD-style architectures. (These changes are described in more depth in the section titled "Optimization of the Storage Design for Reliability and Recoverability" later in this paper.) The new features in the product underwent joint testing in the pre-production environment, the product group's labs, and the Messaging Engineering team's engineering lab. Ultimately, JBOD was implemented across the enterprise.
Removing third-party backups is just as large a change as moving to JBOD. Removing a working system that enables full recovery of data in the event of a disaster is difficult to plan for. After careful testing of the features and the chosen architecture, Microsoft IT takes no third-party backups of Exchange Server 2010 mailboxes in production.
Microsoft IT defines thin provisioning as provisioning storage based on expected utilization as opposed to maximum possible utilization. Thin provisioning enables Microsoft IT to balance projected costs (possible infrastructure footprint in the future to deliver on the promise) and committed cost (current infrastructure footprint procured today). Determining a method for building a storage system to accommodate this careful balance requires a mature IT organization, and it also requires a long-term understanding of mailbox profiles and growth rates.
Internally, Microsoft has implemented a procedure for regularly collecting statistics and trending them in an internal database. This process involves several custom scripts that use built-in cmdlets such as Get-MailboxStatistics. During a six-month planning stage for thin provisioning, this trending procedure revealed that the average mailbox received 100–150 messages per day and that the average growth rate was roughly 60 MB per user, per month. If employees' mail profiles remained the same, that is, if employees continued to use e-mail as they had during the previous six months, it would be reasonable to assume that mailboxes would continue to grow at the same rate.
Note: It is important to understand that Microsoft IT made a decision to invest in the engineering and support teams that manage the messaging infrastructure over a decision to invest in enough storage to cover the full 5 GB Mailbox size. Microsoft IT made this decision after carefully analyzing the teams' abilities to build custom monitoring tools and regularly report on the growth of mailbox sizes and changes in the user profiles. Without this operational maturity, thin provisioning is difficult to prevent from over-running the allocated disk space, resulting in mailbox server outages. Every organization must carefully perform a similar analysis to understand the trade-offs between investing in storage and investing in engineering and support teams.
After Microsoft IT had a growth rate and an understanding of how to know whether the growth rate would change, the team needed to apply the rate to the committed cost of the infrastructure. At Microsoft, hardware investments are expected to be in use for three years, so a three-year projection is required to determine the storage investment required to last over that period.
Projecting the number of 60 MB per user, per month for three years results in an average mailbox size of 1.8 GB, in addition to the existing average size of 890 MB, for a total of 2.69 GB. This meant that if the mail profile and growth rates remained the same, the average mailbox size would be around 2.69 GB per user over the next three years, regardless of what the mailbox limit was. Microsoft IT added another 500 MB per user based on the following factors:
The average growth of mail over time
The changes introduced with the new version of the Mailbox dumpster
The move to keep deleted items for 30 days to support the backup goal
The content indexing space requirements
The free space requirements per user
The average number of transaction logs per user
Considering the law of averages and the expectation that all users would be mixed together among all of the databases in a particular region, some users would grow above this average and some users would stay well below this average. With that in mind, Microsoft IT set 3 GB as the planning number, and it examined storage that would support that size.
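The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration built from the figures quoted in the text; the variable names are hypothetical. The sketch takes the stated 1.8 GB three-year growth figure as an input rather than deriving it, because a straight 60 MB multiplied by 36 months would come to about 2.16 GB.

```python
# Thin-provisioning projection using the figures quoted in the text.
CURRENT_AVG_GB = 0.89      # average mailbox size at the end of 2008
THREE_YEAR_GROWTH_GB = 1.8 # growth figure as stated in the paper (taken as given)
OVERHEAD_GB = 0.5          # dumpster, 30-day retention, indexing, free space, logs
PLANNING_GB = 3.0          # planning number chosen by Microsoft IT
QUOTA_GB = 5.0             # mailbox quota promised to users

projected_avg_gb = CURRENT_AVG_GB + THREE_YEAR_GROWTH_GB
with_overhead_gb = projected_avg_gb + OVERHEAD_GB
provisioned_share = PLANNING_GB / QUOTA_GB

print(f"Projected average mailbox: {projected_avg_gb:.2f} GB")
print(f"Projected average plus overhead: {with_overhead_gb:.2f} GB")
print(f"Storage provisioned per mailbox as a share of quota: {provisioned_share:.0%}")
```

The gap between the planning number and the quota is the capacity that thin provisioning defers, which is why the ongoing growth monitoring described in this section matters.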
It was important for Microsoft IT to continue to monitor the growth of mail to ensure that the growth after moving to Exchange Server 2010 matched the anticipated growth defined previously. This means constant monitoring and comparing against the baseline to provide enough time to purchase additional servers and storage to accommodate growth beyond the anticipated numbers.
The mailbox is one of the very few components in an Exchange Server 2010 organization that cannot be load-balanced across multiple servers. Each individual mailbox is unique and can reside in only one mailbox database on one active Mailbox server. It follows that the mailbox store is one of the most critical Exchange Server components that directly affect the availability of messaging services.
With previous versions of Exchange Server, Microsoft IT relied on SAN or DAS solutions to provide the necessary configuration for its mailbox clusters. SAN provided a higher level of availability due to the robust architecture built around the physical disks, and it enabled Microsoft IT to achieve the number of disks required for I/O throughput and scalability. DAS solutions with redundant disk technologies, when combined with high-availability features such as CCR, also provided a higher level of availability and scalability. In fact, by using SAN and DAS with previous versions of Exchange Server, Microsoft IT was able to achieve 99.99 percent availability. These were both good solutions. However, the desire to continue to lower the per-gigabyte cost while increasing the size of the mailbox and maintaining a high level of availability required rethinking the solution.
To break through the old limitations, Microsoft IT defined the following storage design requirements for Exchange Server 2010:
Maintain 99.99 percent availability at the service level, while putting better measurements in place for that availability and increasing the services offered.
Increase Mailbox server resiliency by removing the two-server dependency and distributing the redundancy across more than two servers.
Reduce storage infrastructure costs and increase mailbox quotas.
Remove backups from the environment entirely by using many copies of the data, lower-cost JBOD arrays, and single item recovery.
To standardize the storage layout for Exchange Server 2010, help ensure reliability, and provide scalability, Microsoft IT implemented a DAS storage device that was low cost, easy to configure, and simple to replace. The architecture is based on 1-terabyte, 3.5-inch, 7,200-RPM (revolutions per minute) serial-attached SCSI (SAS) drives. These drives are combined into two shelves of 35 drives each for a total of 70 drives per JBOD array. The low cost of the disks and the cost saved by removing the overhead of RAID are key to the overall reduction in cost of the Exchange infrastructure. The cost savings are directly attributable to the reduction in IOPS and the increased data resiliency in Exchange Server 2010.
Figure 7 shows an example 10-node DAG connected to five JBOD arrays. This is the standard building block used for all DAGs. The universality of the design allows for easy modification as the environment changes. Each JBOD array contains a total of 70 spindles; these are split in half, with 35 on each shelf. The servers themselves are connected to one half of the shelf.
Figure 7. 10-node DAG storage design
To determine the required number of disks per database logical unit number (LUN), Microsoft IT considered the following factors:
Mailbox database capacity requirements: As explained earlier, Microsoft IT uses thin provisioning to provide a growing mailbox size as the needs of the organization grow. Microsoft IT anticipates that the average mailbox will not reach the planning size of 3 GB for three years. With anticipated mailbox sizes of 3 GB and raw disk capacities of 1 terabyte, Microsoft IT can easily determine the capacity constraint on the number of mailboxes that can be placed on a single disk, and therefore on a single Mailbox server. The formatted disk capacity of slightly under 1 terabyte (917 GB), divided by 3 GB, leaves the capacity limit at 305 mailboxes.
Input/output performance: To achieve optimal Mailbox server response times, the storage subsystem must not only hold the capacity of the mailboxes but also sustain the load that users generate in terms of IOPS without creating a bottleneck. A single 1-terabyte, 7,200-RPM SAS disk can perform approximately 100 IOPS with response times of less than 20 milliseconds. The question is how many mailboxes fit within the available I/O capacity, and whether that number differs drastically from the number that fits based on storage capacity. During testing within the pre-production environment and within the Exchange Server product group's labs, the average observed IOPS per mailbox was 0.3. With the available I/O capacity per disk at 100 IOPS, the remaining calculation is straightforward. Microsoft IT multiplies the observed average IOPS (0.3) by the anticipated number of mailboxes (305) to get the required number of IOPS (91.5). Next, Microsoft IT subtracts the required number of IOPS (91.5) from the IOPS capacity of the disk (100) to get the remaining IOPS capacity (8.5). Although 8.5 IOPS is a small margin, it is enough headroom to run the system efficiently. This calculation is much simpler than calculations of the past, because the complex RAID algorithms that must be accounted for when using RAID 5 or RAID 1+0 have been removed.
By combining these two factors of storage design, Microsoft IT concluded that it would support 305 mailboxes per disk. Understanding the number of mailboxes per database—and in this case, per disk—is one of the first steps in Mailbox server design. Beyond database and disk considerations, memory configuration is a major decision that an organization must make when designing a Mailbox server.
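The two constraints can be checked side by side. The following Python sketch uses only the numbers quoted above (917 GB formatted capacity, 3 GB planning size, 100 IOPS per disk, 0.3 IOPS per mailbox); the variable names are illustrative assumptions, not part of any Microsoft IT tool.

```python
# Per-disk sizing: the tighter of the capacity limit and the IOPS limit wins.
FORMATTED_DISK_GB = 917
PLANNING_MAILBOX_GB = 3
DISK_IOPS = 100
IOPS_PER_MAILBOX = 0.3

# Capacity limit: whole mailboxes that fit on one formatted disk.
mailboxes_by_capacity = FORMATTED_DISK_GB // PLANNING_MAILBOX_GB

# IOPS limit: whole mailboxes the disk can serve at the observed load.
mailboxes_by_iops = int(DISK_IOPS / IOPS_PER_MAILBOX)

mailboxes_per_disk = min(mailboxes_by_capacity, mailboxes_by_iops)
required_iops = mailboxes_per_disk * IOPS_PER_MAILBOX
headroom_iops = DISK_IOPS - required_iops

print(f"Capacity limit: {mailboxes_by_capacity} mailboxes")
print(f"IOPS limit: {mailboxes_by_iops} mailboxes")
print(f"Design point: {mailboxes_per_disk} mailboxes, "
      f"{required_iops:.1f} IOPS used, {headroom_iops:.1f} IOPS spare")
```

With these inputs, capacity rather than I/O is the binding constraint, which is consistent with the conclusion of 305 mailboxes per disk.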
The memory configuration that Microsoft IT devised resulted in 32 GB of memory on all Mailbox servers. The ideal server chassis would support an Intel Xeon x5470 3.3-GHz processor, would provide at least eight memory slots, and would support the storage configurations outlined previously. Microsoft IT evaluated its options for filling the eight memory slots and found that its options were either 4-GB DIMMs or 8-GB DIMMs. This meant either 32 GB of memory or the higher-cost 64 GB of memory.
To understand the amount of memory required, the Exchange Server product group and Microsoft IT performed tests to determine how much memory the early releases of Exchange Server 2010 were using. Initial tests demonstrated that the operating system required roughly 2 GB of memory, whereas Exchange Server 2010 required roughly 4 GB of memory while running the maximum number of active databases in the design (16 databases). Taking the two options for memory, 32 GB or 64 GB, and subtracting the memory required for the base operation of the server left either 24 GB for the Extensible Storage Engine (ESE) cache or 58 GB. Dividing these options by the desired number of mailboxes per server provided either 8 MB or 20 MB of ESE cache per user.
During the time that Microsoft IT evaluated its available options, the Exchange Server product group also obtained results from its initial performance testing in its labs. The Exchange Server product group found that mailbox profiles that transfer (send/receive) 100 messages per day would require 6 MB of ESE cache, and mailbox profiles that transfer 150 messages per day would require 9 MB of ESE cache. The observed average message transfers was 120 messages per day, which meant that the 8 MB of ESE cache available in the 32-GB memory configuration would be sufficient to handle the desired workload.
Considering both the cost difference and the performance requirements, Microsoft IT designed 32 GB into its Mailbox server architecture. As with everything else in the environment, this configuration continues to be measured. ESE cache currently averages 5–8 MB per user.
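The memory decision can be illustrated with a small Python sketch. It rests on several assumptions that are not stated explicitly in this section: roughly 3,000 mailboxes per server (33,000 supported mailboxes spread over an 11-node DAG), a linear relationship between messages per day and required ESE cache, and hypothetical function names. The sketch uses the ESE cache sizes as quoted (24 GB and 58 GB) rather than recomputing them, because subtracting only the 2 GB and 4 GB figures from 32 GB would leave 26 GB.

```python
# ESE cache available per user versus ESE cache required per user.
MAILBOXES_PER_SERVER = 3000  # assumption: 33,000 mailboxes / 11 DAG nodes


def cache_per_user_mb(ese_cache_gb, mailboxes=MAILBOXES_PER_SERVER):
    """Return the ESE cache available to each mailbox, in MB."""
    return ese_cache_gb * 1024 / mailboxes


def required_cache_mb(messages_per_day):
    """Interpolate linearly between the lab results (100 -> 6 MB, 150 -> 9 MB)."""
    return 6 + (messages_per_day - 100) * (9 - 6) / (150 - 100)


available_32gb = cache_per_user_mb(24)  # ESE cache quoted for the 32-GB option
available_64gb = cache_per_user_mb(58)  # ESE cache quoted for the 64-GB option
needed = required_cache_mb(120)         # observed average profile

print(f"32-GB option: {available_32gb:.1f} MB of ESE cache per user")
print(f"64-GB option: {available_64gb:.1f} MB of ESE cache per user")
print(f"Required at 120 messages per day: {needed:.1f} MB")
print(f"32-GB option sufficient: {available_32gb >= needed}")
```

Under these assumptions the 32-GB configuration clears the requirement with a small margin, which matches the choice described above.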
Optimization of the Storage Design for Reliability and Recoverability
Although CCR is an effective high-availability feature, it does not eliminate the need for reliability and recoverability provisions at the storage and server levels. For example, failing over an entire cluster that has thousands of mailboxes between the active and the passive nodes might not be an effective measure if only a single disk in a disk array has experienced a failure or if the transaction log volume is running short of disk space.
DAGs allow for single database failovers and up to 16 copies of any database. The opportunity to take advantage of this type of failover capability, combined with a low-cost storage architecture like JBOD, requires some critical design decisions to provide an optimal storage design.
Microsoft IT uses the following features to help ensure reliability and recoverability at the storage level:
JBOD: The Exchange Server product group specifically built Exchange Server 2010 to use lower-cost JBOD storage. With the removal of redundancy at the disk level, the design includes redundancy at the software level. The JBOD architecture also uses fewer disks than a storage array that contains redundant disks, and the Serial Advanced Technology Attachment (SATA) or SAS disks available for JBOD arrays are significantly less expensive than the same-capacity disks for a Fibre Channel storage array. For these reasons, the design anticipated higher storage failure rates and storage that was more prone to errors. In practice, Microsoft IT observes an annual failure rate of only 2.75 percent, which is near the rate observed in the past with redundant storage. This is likely attributable to the improvements in disk designs over time and the emergence of mid-tier disks that are designed to be higher quality than the lowest-end disk and lower cost than the highest-end disk.
Consolidated transaction logs from database files: With the reduced IOPS requirement from the ESE database and the changes in performance created by moving to JBOD, moving the transaction logs to the same disk as the database files will not negatively affect performance. It will provide a benefit in the overall storage design by associating single disk failures with a single database and log combination. With designs where the transaction logs and database are separated, each database is dependent on two separate disks to run.
Circular logging on all Mailbox servers: With the removal of backups from the environment, log truncation is no longer tied to a backup schedule. For this reason, Microsoft IT has enabled circular logging to help ensure that log truncation occurs. This drastically reduces the amount of space required for log files.
Configuration of multiple databases per Mailbox server: To ensure efficient use of servers and storage, each Mailbox server contains 35 passive database copies, although the server can support only 16 running at one time. To keep the system usable, Microsoft IT added a registry key to allow no more than 15 databases to be mounted at any one time.
ESE changes: Changes made to the ESE database support better resiliency when mailbox databases are used in conjunction with DAGs. These changes support both a move to JBOD and a move to more than two copies of the database:
Default log checkpoint depth: When running with a single copy of a database, the default log checkpoint depth per database is 20 MB. When more copies are added (two or more), this moves to 100 MB for the active, mounted copy and 5 MB for the passive, unmounted copy.
Automatic page restore: When running with a single copy of a database, page restore is not enabled. When more copies are added, automatic page restore takes advantage of the existence of passive copies of the data when determining how to handle corrupted pages in the active copy of the database. If a single corrupt page is discovered, automatic page restore first attempts to copy that page from one of the passive copies of the database to a new sector of the disk.
Database Availability Groups
With the storage and server components resolved, the final main goal of high availability remained. Before the transition to Exchange Server 2010, approximately 78 Mailbox servers were configured as 34 CCR clusters. After the transition to Exchange Server 2010, 104 Mailbox servers provided room for growth, all configured as DAGs—the base component of the high-availability and site resiliency framework built into Microsoft Exchange Server 2010.
To enable a smooth rollout of Exchange Server 2010 Mailbox servers with DAG configurations, Microsoft IT rolled out the service in batches in North America, resulting in a total of 30 servers. Microsoft IT configured these servers in 11-node DAGs for the initial rollout in Redmond and deployed 16-node DAGs in Singapore and Dublin later on. Although beginning with the deployment of two 16-node DAGs in Redmond would have been easier, the smaller number of DAG members enabled Microsoft IT to experiment with expanding the number of DAG members only when needed and to validate its deployment-on-demand approach. Table 2 summarizes the Mailbox server DAG environment across the enterprise.
Table 2. Configurations of Database Availability Groups
Number of actual mailboxes
Number of supported mailboxes
3.5” 1-terabyte SAS
3.5” 1-terabyte SAS
3.5” 1-terabyte SAS
3.5” 1-terabyte SAS
3.5” 1-terabyte SAS
3.5” 1-terabyte SAS
Management of Database Availability Groups
The move to large multi-node DAGs where four copies of every database are available within the site requires a change in how databases, mailboxes, and servers are managed. The number of items to manage is large: there are 384 database copies per 11-node DAG. These copies are the result of 96 databases that each have four copies (one active and three passive) to support a possible 33,000 mailboxes. Operating this many databases per cluster requires a few basic rules.
Microsoft IT outlined the following guidelines for managing mailbox clusters:
The average number of active databases on any node should be 8–9.
The maximum number of active databases on any node should not exceed 16.
The active databases should be evenly distributed across all nodes in the DAG.
Each database disk should not have more than 750 GB used.
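The guideline numbers follow from the DAG dimensions quoted above. As a rough check, the Python sketch below derives the copy count and the average number of active databases per node for an 11-node DAG; the constant names are illustrative assumptions.

```python
# Back-of-the-envelope check of the 11-node DAG figures quoted in the text.
NODES = 11
DATABASES = 96
COPIES_PER_DATABASE = 4       # four copies of every database within the site
SUPPORTED_MAILBOXES = 33_000
MAX_ACTIVE_PER_NODE = 16      # guideline ceiling for active databases per node

total_copies = DATABASES * COPIES_PER_DATABASE
copies_per_node = total_copies / NODES
avg_active_per_node = DATABASES / NODES
mailboxes_per_node = SUPPORTED_MAILBOXES / NODES

print(f"Database copies in the DAG: {total_copies}")
print(f"Copies hosted per node: {copies_per_node:.1f}")
print(f"Average active databases per node: {avg_active_per_node:.1f}")
print(f"Mailboxes served per node: {mailboxes_per_node:.0f}")
print(f"Within ceiling: {avg_active_per_node <= MAX_ACTIVE_PER_NODE}")
```

An even spread of 96 active databases over 11 nodes lands between 8 and 9 per node, which is where the first guideline comes from.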
To maintain the databases within these guidelines, Microsoft IT has built several monitoring tools and management scripts. By using the cmdlets available in Exchange Server 2010 and some custom scripting in Windows PowerShell, the team performs several functions.
Currently, the system is averaging database disk utilization of around 600 GB. If System Center Operations Manager finds that a disk is over the 750 GB threshold, Microsoft IT manually runs a set of scripts to move the mailboxes to other databases within the DAG, stub the database, and create a new database on the same server. This process of stubbing is not simple, but it is required to ensure a clean migration of mailboxes and re-creation of the database. The process includes the following steps:
Verify that all existing copies of the databases are able to be mounted.
Move all users off the database and distribute evenly into the existing databases. This process includes a simple algorithm that spreads the mailboxes to the databases with the lowest number of existing mailboxes.
Verify that the database no longer contains mailboxes by using Exchange Management Shell cmdlets:
Set-ADServerSettings -ViewEntireForest $True
Get-Mailbox -Database "Database ID"
Disable circular logging for the database by using an Exchange Management Shell cmdlet:
- Set-MailboxDatabase "Database Identity" -CircularLoggingEnabled $false
Suspend all copies of the database and remove all but one copy of the database.
Dismount the active database.
Delete the log files (and *.chk files) manually from all copies of the database (active and passive).
Mount the active database. At this point, the log generation number has reset.
Dismount the database.
Remove the last passive copy of the database.
Delete the .edb files from all copies of the database (active and passive).
Mount the database by using an Exchange Management Shell cmdlet and force the creation of a new .edb file:
- Mount-Database "Database Identity" -Force
Add a database copy for all designated passive servers by using an Exchange Management Shell cmdlet:
- Add-MailboxDatabaseCopy "Database Identity" -MailboxServer <serverName> -SeedingPostponed
Reseed all the passive copies of the databases by using an Exchange Management Shell cmdlet:
- Update-MailboxDatabaseCopy "Database Copy Identity" -DeleteExistingFiles
Enable circular logging.
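The mailbox distribution step in this procedure, which places each mailbox on the database with the fewest existing mailboxes, can be sketched as a simple greedy assignment. The Python below is a minimal illustration and not Microsoft IT's script; the function name, database names, and mailbox names are hypothetical.

```python
import heapq


def distribute(mailboxes, mailbox_counts):
    """Assign each mailbox to the database that currently holds the fewest."""
    # Min-heap of (mailbox count, database name); ties break on the name.
    heap = [(count, name) for name, count in mailbox_counts.items()]
    heapq.heapify(heap)
    plan = {}
    for mailbox in mailboxes:
        count, name = heapq.heappop(heap)
        plan[mailbox] = name
        heapq.heappush(heap, (count + 1, name))
    return plan


# Hypothetical target databases and the mailboxes leaving the stubbed database.
targets = {"DB01": 300, "DB02": 298, "DB03": 299}
to_move = ["user1", "user2", "user3", "user4", "user5"]

plan = distribute(to_move, targets)
final_counts = dict(targets)
for database in plan.values():
    final_counts[database] += 1

for mailbox, database in plan.items():
    print(f"{mailbox} -> {database}")
print(final_counts)
```

A production version would also need to respect the per-disk utilization guideline, which this sketch ignores because it balances on mailbox count only.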
To monitor the number of active databases per server, Microsoft IT runs a custom script weekly that queries the servers to determine the number of active databases. After the script finds the number of active databases per server in the DAG, it divides the total number of active databases in the DAG by the number of active nodes, and then moves databases as necessary to even the load.
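The redistribution logic can be sketched as follows. This Python example is a simplified illustration under stated assumptions: node and database names are hypothetical, and the sketch treats any node as a valid target, whereas in a real DAG a database can be activated only on a node that hosts one of its copies.

```python
def rebalance(active_by_node):
    """Even out active databases; return the per-node target, moves, and result."""
    nodes = {name: list(dbs) for name, dbs in active_by_node.items()}
    # Target is the total number of active databases divided by the node count.
    target = sum(len(dbs) for dbs in nodes.values()) / len(nodes)
    moves = []
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        # Stop once no node differs from another by more than one database.
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            break
        database = nodes[busiest].pop()
        nodes[idlest].append(database)
        moves.append((database, busiest, idlest))
    return target, moves, nodes


active = {
    "MBX01": ["DB1", "DB2", "DB3", "DB4", "DB5"],
    "MBX02": ["DB6"],
    "MBX03": ["DB7", "DB8", "DB9"],
}
target, moves, balanced = rebalance(active)

print(f"Target per node: {target:.1f}")
for database, source, destination in moves:
    print(f"Move {database}: {source} -> {destination}")
```

Each move shifts one database from the most loaded node to the least loaded node, so the loop ends as soon as the load is as even as the database count allows.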
Elimination of Storage as the Single Point of Failure
Exchange Server 2010 moves to a single method of database resiliency across servers by supporting the database replication technique of DAGs. Although DAGs rely on Windows Clustering, Exchange Server 2010 fully manages the clustering components, so the administrative staff does not manually configure any cluster. DAGs provide redundancy at the storage level by replicating mailbox data from one server to another at the database level through a mechanism commonly known as asynchronous log shipping. Because of the decision to remove backups from the environment and to implement the feature set of the product in production, Microsoft IT decided to use this technology as the core of its Mailbox server designs to increase Mailbox server resiliency against storage-level failures.
To use server hardware efficiently while providing the required redundancy level through DAG, Microsoft IT implemented 11-node DAGs in Redmond and 16-node DAGs in Singapore and Dublin. Underneath the Exchange Server configuration, Exchange Server sets up a Majority Node Set (MNS) server cluster with file-share witnesses. Each cluster node stores a copy of the MNS quorum on the local system drive and keeps it synchronized with the other nodes.
The file-share witness feature enables the use of a file share that is external to the DAG as an additional vote to determine the status of the database on each of the nodes that are participating in the DAG. This helps to avoid an occurrence of network partition within the cluster, also known as split-brain syndrome. Split-brain syndrome occurs when all networks designated to carry internal cluster communications fail, and nodes cannot receive heartbeat signals from each other.
To enable the file-share witness feature, Microsoft IT uses the WitnessDirectory property on the DAG configuration to specify a directory on a Hub Transport server. For DAGs that extend beyond a single site, Microsoft IT designates an alternate file-share witness on a Hub Transport server in the secondary site by using the AlternateWitnessServer property.
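The role of the witness vote in avoiding split-brain syndrome can be illustrated with a short Python sketch. It assumes the usual majority rule for this kind of cluster, in which the file-share witness contributes a vote only when the DAG has an even number of members; the function names are hypothetical and the sketch does not model the cluster service itself.

```python
def total_votes(dag_nodes):
    """Votes in the cluster: one per node, plus the witness for even node counts."""
    return dag_nodes + (1 if dag_nodes % 2 == 0 else 0)


def has_quorum(votes_reachable, dag_nodes):
    """A partition keeps quorum only with a strict majority of all votes."""
    return votes_reachable > total_votes(dag_nodes) // 2


# 16-node DAG split evenly by a network partition.
side_with_witness = has_quorum(8 + 1, 16)  # eight nodes plus the witness vote
side_without = has_quorum(8, 16)           # eight nodes only

# 11-node DAG: the odd node count already prevents a tie.
majority_side = has_quorum(6, 11)
minority_side = has_quorum(5, 11)

print(side_with_witness, side_without, majority_side, minority_side)
```

In the even split, only the partition that can reach the witness keeps a majority, so at most one side continues to mount databases.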
Microsoft IT uses three network adapters in each DAG node. The first adapter connects the node to the public network where Outlook clients connect for public folders and where the Hub Transport and Client Access servers connect to interact with mailboxes. The Microsoft Exchange Replication Service on the cluster nodes uses the second adapter on a private network connection to replicate the mailbox store databases from nodes that are hosting the mounted copy of a database to nodes that are hosting passive copies of the database. The third network adapter is for out-of-band management.
Another important component that is necessary to ensure high availability in the DAG-based Mailbox server configuration is the transport dumpster feature on Hub Transport servers. This feature enables messages to be redelivered to a mailbox database in the event of a lossy failure. It is important to ensure that Hub Transport servers have the appropriate capacity to handle a transport dumpster. Mailbox servers might request redelivery if a failover occurs to a passive database before the most recent transactions have been replicated. To configure the transport dumpster, Microsoft IT uses the Set-TransportConfig cmdlet with the following parameters:
MaxDumpsterSizePerDatabase: This parameter specifies the maximum size of the transport dumpster queue per mailbox database. Microsoft IT specifies 15 MB for this parameter.
MaxDumpsterTime: This parameter specifies how long the Hub Transport server can retain messages in the transport dumpster queue. Microsoft IT uses a value of 07.00:00:00, which corresponds to seven days.
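To gauge the capacity that these settings imply, the Python sketch below interprets the retention value and estimates a worst-case dumpster footprint. It assumes that a Hub Transport server keeps one dumpster queue per mailbox database and that the server covers one 11-node DAG with 96 databases; both are assumptions made for illustration, and the helper name is hypothetical.

```python
from datetime import timedelta


def parse_timespan(value):
    """Parse a 'd.hh:mm:ss' string such as '07.00:00:00' into a timedelta."""
    days, clock = value.split(".", 1)
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)


MAX_DUMPSTER_MB_PER_DATABASE = 15
DATABASES_COVERED = 96  # assumption: one 11-node DAG served by this server

retention = parse_timespan("07.00:00:00")
worst_case_mb = MAX_DUMPSTER_MB_PER_DATABASE * DATABASES_COVERED

print(f"Retention: {retention.days} days")
print(f"Worst-case dumpster footprint: {worst_case_mb} MB per Hub Transport server")
```

The footprint is an upper bound; actual usage depends on message volume during the retention window.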
Note: Microsoft IT uses the same hardware configuration on all DAG nodes to maintain the same performance level after a failover to an unmounted database.
Best Practices for Mailbox Servers
During the implementation of the new Mailbox server infrastructure, Microsoft IT found that moving to a large number of databases provided some new best practices for managing a Mailbox server layer and DAGs:
Migration destination: During the migration of mailboxes to Exchange Server 2010, it is important to migrate an equal number of users across DAG nodes to help ensure optimal migration performance. Otherwise, poor performance on nodes that are heavily favored will quickly slow the overall migration progress.
Maximum databases per server: With three database copies for every database, it is not feasible to have enough nodes to allow the databases to fail over to any node. This means that each node will have roughly double the number of databases that it can support (considering the memory and processing power of any particular node). To account for this, Microsoft IT uses a registry key to ensure that each node never mounts more than 16 databases at any one time.
Database redistribution: Despite the maximum of 15 databases mounted at any one time, Microsoft IT found that performance of the Mailbox servers was best when no more than nine databases were running at any one time. Because each node allows up to 15 databases to be mounted at any one time, Microsoft IT wrote a weekly database redistribution script to help ensure that on average, nine databases were mounted on a node at any time.
Backup and Recovery
One of the major decisions for Microsoft IT during the planning cycle for the Exchange Server 2010 implementation was to remove backups from the environment entirely. The decision was driven by anticipated cost savings and a desire to prove that an environment without backups was operationally efficient and feasible. The savings from this decision are apparent in both the capital expenditures for the physical hardware that would have been required to back up 5-GB mailboxes and the operational expenditures for the team that manages that hardware. At the same time, removing the backup infrastructure meant that the cost of the Mailbox server and storage infrastructure had to increase to compensate for the absence of a backup system.
Estimates of what a backup environment would have cost, compared with the increase in the cost of the Mailbox servers, show a net savings in server and storage hardware. The decision required careful planning, because removing backups from the environment does not eliminate the need to restore e-mail messages or the cost associated with restoration. The decision also required some changes to help ensure the restoration of a portion of mail messages and the recovery of messaging system functionality. The main architectural features that enabled this decision were DAGs, single item recovery, and large mailbox sizes.
In addition to the basic process of mail restoration, the need for system-wide protection was a major component to the decision process to remove backups completely. In Microsoft IT’s previous implementation of Microsoft Exchange, the risk of losing all copies of data and causing a system-wide failure with no recovery mechanism has traditionally been mitigated with a backup copy. The move away from this form of mitigation by removing backups requires a new mitigation technique against the risk of losing all on-line copies of the data. Microsoft IT currently mitigates this risk through the rigorous monitoring and disk replacement strategy outlined in the previous section, "Mailbox Server Configuration." Without a reliable procedure for closely monitoring disk failures and managing the disk replacement and data rebuild process, Microsoft IT would not be able to mitigate the risk and could not move to a backup-free environment.
Note: While the risk of data loss and the impact of that data loss are greater when no backup copy exists, it is important to understand that closely monitoring disk failures regardless of the backup strategy or underlying storage technology is critical to running a Microsoft Exchange environment. If failed disks are not discovered and replaced immediately, the likelihood of a required restore from backup and incurring the cost associated with those restores rises significantly.
For the rollout of Exchange Server 2010 in the corporate production environment, Microsoft IT defined the following backup and recovery requirements:
Support mailbox capacities of 5 GB.
Reduce backup costs by eliminating third-party backups.
Reduce administrative overhead by simplifying the mail restore process.
Provide recovery of mail items up to 30 days old.
Eliminating Backups Completely
Removing backups from the messaging architecture entirely was a large decision that required careful planning. During that planning, the Microsoft teams discussed several options for removal of a third-party backup product. The Exchange Server product group built features that can be combined in a variety of ways to meet the customer’s needs. For Microsoft IT, the decision first meant determining what the business needs were. The needs that Microsoft IT identified included the following:
A minimum of 30 days of data available to be recovered at any time
The ability to recover any single item that was deleted within those last 30 days
The ability to hold information for longer than 30 days if active litigation required it
The assurance that if one or two copies of the data went offline, the e-mail system could still operate or be recovered
The first option that Microsoft IT considered to meet these needs was DAGs for high availability and general resiliency. DAGs enabled the system to continue to operate during single-server or single-storage failure events. The second option that Microsoft IT considered was using the new version of Mailbox dumpster and its associated feature, single item recovery, to provide a solution in which any item could be recovered from a database dumpster during the period that the dumpster was available. The third option was to use a lagged database copy to provide a point-in-time copy of the database that the team could use in a restore if the non-lagged copies became corrupted.
Initially, Microsoft IT implemented all three options. Every mailbox database is built as a member of a DAG. Every mailbox database has single item recovery enabled. The count of copies initially was three active copies of the database and one lagged copy of the database.
After implementing the solution, Microsoft IT re-evaluated the use of the lagged database copy to meet its fourth goal. The team found that the DAG safely met this goal, and a better option was to change the fourth lagged copy to simply be a fourth copy of the database.
Microsoft IT never implemented any form of traditional or third-party backups in its Exchange Server 2010 environment. The first design that Microsoft IT tested in the pre-production environment included a lagged copy. The environment that Microsoft IT tested during this time frame consisted of the minimum recommended number of copies for an environment that had no backup (three copies), plus one lagged copy. The lagged copy provided a point in time that could be recovered from and offered several benefits to Microsoft, including protection from store or database logical corruption.
Despite the benefits of being able to have all recovery data on separate disks, this model did not meet the reduction in administrative overhead that Microsoft required. Important considerations included:
The ability to run all databases on available disks versus dedicating a few disks that are available to the system that is used for high availability. This approach helps ensure a cost-effective balance of high availability and resource utilization.
Uniformity across all DAG nodes helps ensure that no server or set of databases is treated separately and must be processed with a different set of rules.
As with all copies of databases, a potential for corruption always exists. When corruption occurs, reseeding the database is necessary. For lagged copies, this means that the logs that have not replayed into the database are lost, essentially removing the backup from existence.
The process of restoring mail from an inactive database is long regardless of how that inactive database came into existence. Restoring from a disk-based backup or from a lagged database copy requires a procedure of making the database available to mount, mounting the database, and exporting a mail message to an Outlook personal storage (PST) file.
Microsoft IT enabled single item recovery, a part of the re-architected dumpster, to support a number of advanced litigation features.
With the previously outlined goals in mind, Microsoft IT also tested the second option for running a backup-free environment in the pre-production environment. This method would rely solely on the new features in the mailbox dumpster and the single item recovery capability. To understand how these new Mailbox dumpster features and single item recovery can be used in place of a third-party backup system, it is important to first understand how the Mailbox dumpster works.
Unlike the first version of the mailbox dumpster, the new version is no longer simply a view of the database. The Mailbox dumpster in Exchange Server 2010 is implemented as a folder called the Recoverable Items and is located within the Non-IPM subtree of the mailbox. The folder has three subfolders: Deletions, Versions, and Purges.
Architecting the dumpster to be a folder immediately meets three of the requirements:
The mailbox dumpster data is indexed and discoverable.
The mailbox dumpster data can be moved with the mailbox.
The mailbox dumpster data is stored on a per-mailbox basis rather than a per-folder basis. From an end-user perspective, this means that deleted data is easier to recover because the Recover Deleted Items tool will expose deleted data across the entire mailbox.
To prevent denial-of-service attacks that place large quantities of data into the dumpster, the new version of the mailbox dumpster has the following quota settings. These settings can be configured per database and per mailbox:
RecoverableItemsWarningQuota: Sets the soft limit that defaults to 20 GB. When the Recoverable Items folder reaches that size, the Exchange administrator is notified via an event log alert. This alert should occur at the time that the mailbox reaches the limit and once daily afterward. By default, the mailbox is not configured with this property, meaning that database-level limits are used. At this limit, items begin to be deleted from the dumpster through the first-in/first-out (FIFO) principle; essentially, the oldest items in the dumpster are deleted first. For example, consider a mailbox that has a dumpster that is 20 GB in size, and the user deletes an additional 50 MB of data. In this case, the oldest 50 MB of data is deleted to make room for the new 50 MB of data.
RecoverableItemsQuota: Sets the hard limit that defaults to 30 GB. At that limit, soft deletes fail. The Exchange administrator is notified via an event log alert. This alert should occur at the time that the mailbox reaches the limit and once daily afterward. By default, the mailbox is not configured with this property, meaning that database-level limits are used.
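The interaction of the two quotas can be pictured with a toy model. The sketch below is an illustration only and does not describe how Exchange Server implements the dumpster. The class, the exception, and the `fifo_allowed` flag are hypothetical names. The flag is an assumption that stands in for cases where the oldest items may not yet be removed, which is the situation in which the dumpster could grow toward the hard limit.

```python
from collections import deque


class RecoverableItemsFull(Exception):
    """Raised when a soft delete would exceed the hard quota (hypothetical)."""


class Dumpster:
    """Toy model of the two quota settings; sizes are in megabytes."""

    def __init__(self, warning_quota_mb=20 * 1024, hard_quota_mb=30 * 1024):
        self.warning_quota_mb = warning_quota_mb  # soft limit (20 GB default)
        self.hard_quota_mb = hard_quota_mb        # hard limit (30 GB default)
        self.items = deque()                      # (name, size_mb), oldest first
        self.size_mb = 0

    def soft_delete(self, name, size_mb, fifo_allowed=True):
        # At the soft limit, the oldest items are removed first (FIFO)
        # to make room for the newly deleted item.
        if fifo_allowed:
            while self.items and self.size_mb + size_mb > self.warning_quota_mb:
                _, old_size = self.items.popleft()
                self.size_mb -= old_size
        # At the hard limit, the soft delete fails.
        if self.size_mb + size_mb > self.hard_quota_mb:
            raise RecoverableItemsFull(name)
        self.items.append((name, size_mb))
        self.size_mb += size_mb
```

With a dumpster that already holds 20 GB, soft-deleting another 50 MB item removes the oldest 50 MB and leaves the total at 20 GB, which matches the example in the text.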
Note: Exchange Server 2010 includes the capability for each mailbox to also maintain an archive mailbox. There is a dumpster for both the primary mailbox and the archive mailbox. Data deleted in the primary mailbox is placed in the primary mailbox dumpster, whereas data deleted in the archive mailbox is placed in the archive mailbox dumpster.
Exchange Server 2010 includes the ability to preserve data within the mailbox for a period of time. An administrator can enable this feature on a per-mailbox basis by setting the SingleItemRecoveryEnabled switch to True on the Set-Mailbox cmdlet. The deleted item retention window determines the period for which the deleted data is maintained. The default period is 14 days in Exchange Server 2010 and is configurable per database or per mailbox. To meet the 30-day recoverability requirement, Microsoft IT increased the default to 30 days.
Note: Regardless of whether single item recovery is enabled, Exchange Server 2010 maintains calendar items in the Recoverable Items folder structure for 120 days. Long-term data preservation via litigation hold will disable the expiration of the items.
This ability to preserve data is different from previous versions of Exchange Server because the data cannot be completely deleted (hard-deleted) until the deletion time stamp is past the deleted item retention window. Even if the end user attempts to hard-delete the data, the data must be retained. With the new version of the Mailbox dumpster, instead of the messages being hard-deleted, they are moved from the Recoverable Items\Deletions folder to the Recoverable Items\Purges folder. All items that are marked to be hard-deleted go to this folder when single item recovery is enabled. The Recoverable Items\Purges folder is not visible to the end user, meaning that they do not see data retained in this folder in the Recover Deleted Items tool.
When the message deletion time stamp has exceeded the deleted item retention window, Records Management will hard-delete the item. Figure 8 provides a visual representation of this behavior.
In addition to prevention of hard-deleting data before the deleted item retention window has expired, the short-term retention features also enable versioning functionality. As shown in Figure 8, when an item is changed, a copy-on-write occurs to preserve the original version of the item (depicted in step five of Figure 8). The original item is placed in the Recoverable Items\Versions folder. The employee does not see this folder.
Figure 8. Folders inside the Mailbox now include the new Mailbox dumpster
Copy-on-writes occur in the following scenarios:
For messages and posts (IPM.Note* and IPM.Post*), copy-on-write captures changes in the subject, body, attachments, senders/recipients, and sent/received dates.
For other types of items, copy-on-write occurs for any change to the item, except for moves between folders and read/unread status changes.
Drafts are exempt from copy-on-write to prevent excessive copies when drafts are auto-saved.
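The copy-on-write criteria above can be expressed as a small decision function. This is a sketch of the rules as stated in this paper, not Exchange Server code. The function name and the property labels (such as `read_status` and `folder`) are hypothetical stand-ins for the underlying item properties.

```python
# Changes that trigger copy-on-write for messages and posts, per the list above.
TRACKED_FOR_MESSAGES = frozenset({
    "subject", "body", "attachments",
    "sender", "recipients", "sent_date", "received_date",
})

# Changes that never trigger copy-on-write for other item types.
IGNORED_FOR_OTHER_ITEMS = frozenset({"folder", "read_status"})


def needs_copy_on_write(item_class, changed, is_draft=False):
    """Return True if the original item should be copied to Versions."""
    if is_draft:
        return False  # drafts are exempt to avoid a copy on every auto-save
    changed = set(changed)
    if item_class.startswith(("IPM.Note", "IPM.Post")):
        return bool(changed & TRACKED_FOR_MESSAGES)
    # Other item types: any change except folder moves and read/unread flips.
    return bool(changed - IGNORED_FOR_OTHER_ITEMS)
```

For example, changing the subject of an `IPM.Note` item returns True, while marking the same item as read returns False.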
The data stored in the Recoverable Items\Versions folder is indexed and discoverable for compliance officers.
Ultimately, single item recovery was the method that Microsoft chose to use in the corporate environment. Not only did it meet the basic needs outlined previously (such as reduced administrative overhead, removal of third-party backups, and improved SLAs), but it also enabled Microsoft IT to provide better compliance during day-to-day operations.
Moving from disk backups to a backup-free environment gave Microsoft IT the following advantages:
Reduced backup costs and complexities: Eliminating backups enabled Microsoft IT to reduce costs by removing the administrative overhead and capital expense associated with the backup server infrastructure. Additionally, without any form of backups occurring in the environment, there are no scheduling constraints related to maintenance.
Increased reliability of restore operations: Microsoft IT previously had to find missing mail through a separate process, perform a database restore, and then go through the steps to find the data in the restored copy to recover to a PST file or to the mailbox. These steps are all consolidated, and only a search of the mailbox is required before exporting to PST files.
Single item recovery is only one part of a broader goal for Exchange Server 2010. The broader goal was a compliance mechanism that enables a mailbox to be placed on hold—for long periods if necessary—and all items in the mailbox to be maintained in a consistent fashion. Both external customers and Microsoft IT repeatedly requested this goal. The need was based on the occasional requirement to produce information from mailboxes during the process of ongoing litigation that an employee who uses the mailbox is involved in. With Exchange Server 2010, an administrator can enable litigation hold via the Exchange Control Panel or by setting the property LitigationHoldEnabled via the Set-Mailbox cmdlet.
When litigation hold is enabled, records management ceases hard-deleting dumpster data, and the following occur:
When the end user deletes an item from the Deleted Items folder or shift-deletes an item from any folder in Outlook or Outlook Web App (soft delete), the item is moved to the Recoverable Items\Deletions folder. There, it can be recovered in the Recover Deleted Items view in Outlook or Outlook Web App.
When the end user hard-deletes data from the Recover Deleted Items view (hard delete from the Recoverable Items\Deletions folder), the item is moved to the Recoverable Items\Purges folder.
When records management hard-deletes data from the Recoverable Items\Deletions folder (because data has reached the age limit for deleted items), the item is moved to the Recoverable Items\Purges folder.
When the end user modifies an item, a copy of the original item is placed in the Recoverable Items\Versions folder based on the criteria discussed previously.
Also, when litigation hold is enabled, the properties RecoverableItemsWarningQuota and RecoverableItemsQuota (defined either on the mailbox database or on the mailbox) are ignored. Data is therefore not hard-deleted or prevented from being stored in the dumpster.
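The folder transitions described in this section can be summarized in a short sketch. It models only the behavior stated in this paper. The function names are hypothetical, and the assumption that a user purge removes the item outright when neither single item recovery nor litigation hold is enabled is inferred from the text rather than stated in it.

```python
from datetime import datetime, timedelta


def after_user_purge(single_item_recovery, litigation_hold):
    """Folder an item lands in when the user hard-deletes it from Deletions.

    Returns None when the item is removed outright (assumed behavior when
    neither feature is enabled).
    """
    if single_item_recovery or litigation_hold:
        return "Purges"
    return None


def after_retention_sweep(folder, deleted_at, now,
                          retention_days=30, litigation_hold=False):
    """Folder an item is in after records management processes it.

    Returns None when the item is hard-deleted.
    """
    if now - deleted_at <= timedelta(days=retention_days):
        return folder  # still inside the deleted item retention window
    if litigation_hold:
        # Hard deletes cease; expired items in Deletions move to Purges.
        return "Purges" if folder == "Deletions" else folder
    return None
```

The 30-day default mirrors the retention window that Microsoft IT configured. Quota handling is left out because the quotas are ignored while litigation hold is enabled.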
Recovering Mail Items
Recovering mail items takes two forms:
Employees can access the Recover Deleted Items tool and recover items on their own.
An administrator causes mail items that the Recover Deleted Items tool cannot access to be made available and either restored directly to the mailbox or to a PST file and provided to the employee.
In past implementations of Exchange Server, the former method of restoration was limited because of the limited number of items that were presented for recovery in the Recover Deleted Items tool. The process for the latter was long and arduous, taking one to many weeks to complete.
Moving to the new version of the mailbox dumpster as described earlier in this paper removes both of the problems with the earlier implementations of Exchange Server. The amount of information that employees can directly recover through the Recover Deleted Items tool has increased. The required procedure for recovering items that are not available in the Recover Deleted Items tool has been simplified and is now measured in days. For all information in the Recoverable Items\Deletions folder (which is the majority of the mail), the employees recover these items on their own.
Messages that are stored in the Recoverable Items\Purges folder and the Recoverable Items\Versions folder require a separate process because employees cannot discover or access those messages. However, the data is indexed and discoverable for those who have the proper access role in the Exchange organization. Role Based Access Control (RBAC) provides the Discovery Management role to allow security-enhanced search access to non-technical personnel, without providing elevated rights to make any operational changes to the Exchange Server configuration.
The Exchange Client Support (ECS) team provides this service at Microsoft. The members of ECS are in the Discovery Management role. They use the Exchange Control Panel to perform discovery searches through an easy-to-use search interface. When ECS receives a request for a single item recovery, it takes the following actions:
A member of ECS uses the Exchange Control Panel to target a search against the mailbox to locate the data in question. The framework to perform the search enables the administrator to use Advanced Query Syntax. The administrator then extracts the recovered data and places it in the discovery mailbox, in a folder that bears the employee's name and the date/time that the search occurred.
The administrator opens the discovery mailbox via Outlook Web App or Outlook and verifies that the content recovered is the content that the end user requested.
The administrator uses the Exchange Management Shell to perform an export-mailbox against the discovery mailbox, targeting the employee's mailbox. The administrator exports the data to a PST file and provides it to the employee to load in his or her mailbox; or, the administrator helps the employee load the data in his or her mailbox.
Client Access Server Topology
Reliability and performance of Mailbox servers are crucial for the availability and quality of messaging services in an Exchange Server 2010 organization. Another important component is the Client Access server, which provides access to messaging items in mailboxes, availability information, and address book data in various scenarios.
In Exchange Server 2010, the Client Access server role supports the traditional functionality that includes Outlook Web App, Exchange ActiveSync, POP3, and IMAP4 protocols. Beginning in Exchange Server 2007, the Client Access server role also provides access to free/busy data by using the Availability service and enables certain clients to download automatic configuration settings from the Autodiscover service. For example, users might work with Outlook Web App in a Web browser session or synchronize mobile devices by using the Exchange ActiveSync protocol. Users can also work with the full Microsoft Office Outlook 2007 or Microsoft Outlook 2010 client over Internet connections, accessing their mailboxes through Outlook Anywhere.
Outlook 2010 clients specifically communicate with Client Access servers in several additional scenarios, such as to retrieve profile configuration settings by using the Autodiscover service, checking free/busy data by using the Availability service (which is part of Exchange Web Services), and downloading Offline Address Book (OAB) from a virtual directory on a Client Access server. In all these cases, users communicate with a Client Access server, which in turn communicates with the Mailbox server by using the native Exchange Server Messaging Application Programming Interface (MAPI).
In addition to this functionality, Exchange Server 2010 has moved MAPI access to mailboxes off the Mailbox servers themselves and onto the Client Access server. This concept is known as RPC Client Access Service. Moving MAPI onto the Client Access server enables improvements that were not available in previous versions, including Online Mailbox Moves and the ability of a database to fail over to any one of the Mailbox servers in a DAG without the need for the Outlook client to reconnect to a specific Mailbox server. The Outlook client remains connected to the Client Access server the entire time, and the connection from the Client Access server to the Mailbox server changes in the background.
To provide Microsoft employees with reliable access to their mailboxes from practically any location that has network access, Microsoft IT defined the following requirements for the deployment of Client Access servers:
Establish flexible and scalable client access points for each geographic region that can accommodate the existing mobile user population at Microsoft and large spikes in mobile messaging activity.
Preserve common URL namespaces (such as https://mail.microsoft.com) that were established during Exchange Server 2003 rollout for all mobile messaging clients within each individual geographical region.
Deploy all mobile messaging services on a common standardized Client Access server platform.
Establish a unified load-balancing strategy for internal and external access.
Provide seamless backward compatibility with mailboxes still on the Exchange Server 2007 platform during transition and provide support for cross-forest access to enable availability of free/busy information.
Providing Load Balancing and Fault Tolerance for Client Connections
To distribute the client connection load, Microsoft IT uses hardware load balancers. This is a change from using the software-based Network Load Balancing and Microsoft Internet Security and Acceleration (ISA) Server 2006 in the Exchange Server 2007 environment and provides a unified load-balancing architecture for both internal and external client connections.
Internally, all domain-connected Outlook 2007 and Outlook 2010 clients perform a Lightweight Directory Access Protocol (LDAP) request to retrieve the Service Connection Point (SCP) records for Autodiscover. As part of the Autodiscover configuration in a forest, for each Client Access server, an administrator should specify the Active Directory sites where the Client Access server is responsible for Autodiscover requests via the AutoDiscoverSiteScope property of the Set-ClientAccessServer cmdlet. The default configuration for the InternalURL property must be changed to point to the hardware load-balanced fully qualified domain name (FQDN) for Autodiscover via the Set-AutodiscoverVirtualDirectory cmdlet.
Externally, Outlook 2007 and Outlook 2010 clients attempt to connect to Autodiscover based on the FQDN of the primary e-mail address on the profile. Outlook attempts a few different methods for finding the Autodiscover service. It uses everything to the right of the @ symbol (the SMTP domain), prepends “autodiscover”, and appends “/autodiscover/autodiscover.xml.” If Outlook does not find the Autodiscover service there, it attempts to use the SMTP domain with “/autodiscover/autodiscover.xml” appended to the end. To ensure that these URLs point to the load balancer, the external FQDN must reference an IP address that passes the connections to the IP address of the internal hardware load balancer. Additionally, an administrator must use the Set-AutodiscoverVirtualDirectory cmdlet to change the ExternalURL property to make it one of the external FQDNs that Outlook looks for.
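The URL derivation described above is simple enough to show as a sketch. The function name is hypothetical, and the code only reproduces the two candidates in the order this paper lists them. It is not a description of everything an Outlook client tries. The https scheme follows the URLs given elsewhere in this paper.

```python
def autodiscover_candidates(email_address):
    """Build the Autodiscover URLs derived from a primary e-mail address."""
    local_part, at_sign, smtp_domain = email_address.rpartition("@")
    if not at_sign or not local_part or not smtp_domain:
        raise ValueError("not an e-mail address: %r" % (email_address,))
    path = "/autodiscover/autodiscover.xml"
    return [
        # First attempt: "autodiscover." prepended to the SMTP domain.
        "https://autodiscover." + smtp_domain + path,
        # Fallback: the SMTP domain itself.
        "https://" + smtp_domain + path,
    ]
```

For an address in the microsoft.com SMTP domain, the first candidate is https://autodiscover.microsoft.com/autodiscover/autodiscover.xml, which is why every external user reaches the same entry point regardless of region.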
Administrators can use specific cmdlets to individually configure all other messaging services internally and externally (Outlook Web App, Exchange Web Services, Exchange Control Panel, OAB, and Exchange ActiveSync) to use separate URLs. Microsoft IT consolidated all of these namespaces so that mail.microsoft.com is used for /owa, /ews, /ecp, /oab, and /Microsoft-Server-ActiveSync.
The namespace for the RPC Client Access Service is new to Exchange Server 2010 and is controlled through the Client Access server array. Microsoft IT has consolidated all Client Access server arrays by region so that all mailbox databases in Redmond point to rpc://outlook.redmond.corp.microsoft.com, whereas the mailbox databases in the other regions point to similarly formatted FQDNs. These FQDNs are also configured on the hardware load balancer.
The move to a hardware load balancer enables all services (internal and external) to be managed in a similar way, reducing administrative overhead and complexity in the environment. This move also enables the load-balancing scheme to be optimized for each protocol. For example, HTTP affinity options range from cookie-based affinity to Secure Sockets Layer (SSL) ID. The load-balancing configuration for some of the protocols, such as RPC/HTTP, has been made stateless, which further improves the ability to scale.
Note: Microsoft IT uses a split Domain Name System (DNS) configuration to accommodate the registration of internal Client Access servers, provide load balancing, and provide fault tolerance for internal client connections.
Figure 9 illustrates the load-balancing configuration that Microsoft IT established for internal and external messaging clients.
Figure 9. Client access architecture for internal and external clients
Preserving Existing Namespaces for Mobile Access to Messaging Data
Each month, Client Access servers in the corporate production environment support approximately 92,000 Outlook Web App unique users, 82,000 Outlook Anywhere connections, 63,000 Exchange ActiveSync sessions, and more than 100,000 RPC Client Access connections.
Microsoft IT established this topology during the Exchange 2000 Server time frame, which means that Microsoft employees became accustomed to these URLs over many years. Preserving these URLs during the transition to Exchange Server 2010 and providing uninterrupted mobile messaging services through these URLs was correspondingly an important objective for the production rollout.
Preserving the existing URL namespaces during the transition from Exchange Server 2007 to Exchange Server 2010 was not as straightforward as the transition from Exchange Server 2003 to Exchange Server 2007. Microsoft IT devised the following strategy to ensure a smooth transition:
Create new namespaces for the Exchange Server 2007 Client Access servers in both internal and external zones. This step moved the Exchange Server 2007 namespace from mail.microsoft.com to owa.microsoft.com.
Deploy the Exchange Server 2010 Client Access servers in the corporate production environment in each data center and Active Directory site where Exchange Server 2010 Mailbox servers were planned.
Create DNS records for the Exchange Server 2010 Client Access arrays and the SCP records for the new Exchange Server 2010 Client Access servers.
Reconfigure the Exchange Server 2007 Client Access servers to use a new namespace. This step includes the configuration of new SSL certificates for the Exchange Server 2007 Client Access server that include the new namespace.
Create a rule on the hardware load balancer to publish the new URL, owa.microsoft.com.
Test access to Exchange Server 2007 and Exchange Server 2010 resources through Client Access servers in all locations by manually pointing clients to them.
Modify the DNS records for the existing namespace to point to the Virtual IP of the new Exchange Server 2010 Client Access server cluster. At this point, the Autodiscover SCP records point to Exchange Server 2010 Client Access servers.
Modify external DNS to point autodiscover.microsoft.com to the new Exchange Server 2010 Client Access server rule on the hardware load balancer for external network users.
Modify external DNS to point the external namespace to the new Exchange Server 2010 Client Access server rule on the hardware load balancer for external network users.
Move mailboxes from Exchange 2007 servers to new Exchange 2010 servers and enable new mobile messaging features for transitioned users.
As explained in the “Message Routing” section earlier in this white paper, Exchange Server 2010 extensively uses the concept of Active Directory sites, such as to define logical boundaries for message routing and server-to-server communications. For client access scenarios, this means that each Active Directory site with Mailbox servers must also include Client Access servers to ensure a fully functional messaging system. Accordingly, Microsoft IT deployed Client Access servers locally in the data centers of Dublin, Sao Paulo, Singapore, and Redmond.
As illustrated in Figure 10, Microsoft IT heavily focused the deployment on dedicated servers with varying ratios of Mailbox servers to Client Access servers in order to establish flexible and scalable messaging services that can accommodate large spikes in user activity. In Sao Paulo only, Microsoft IT deployed a multiple-role server that hosts Hub Transport and Unified Messaging server roles in addition to the Client Access server role, because of the moderate number of users in that region.
Figure 10. Global deployment of Client Access servers
Client Access servers only communicate directly with Exchange Server 2010 Mailbox servers in their local Active Directory site. For requests to Mailbox servers in remote Active Directory sites, Client Access servers must proxy or redirect the request to a Client Access server that is local to the target Mailbox server. Microsoft IT prefers to redirect Outlook Web App users. Keeping client connections local within each geographic region minimizes the impact of network latencies between the Client Access server and Mailbox server on client performance and mitigates the risk of exhausting thread pools and available connections on central Client Access servers during large spikes in messaging activity. To redirect Outlook Web App users to Client Access servers that are local to the user’s Mailbox server, Microsoft IT registers the relevant external URL on each Internet-facing Client Access server in the ExternalURL property for all virtual directories.
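The direct, redirect, or proxy choice described above can be sketched as follows. This is an illustrative model under stated assumptions, not Exchange Server logic. The function name, the protocol labels, the site names used as keys, and the example URL are hypothetical. The sketch assumes that a redirect is possible only for Outlook Web App and only when an external URL is registered for the site that hosts the mailbox.

```python
def route_request(cas_site, mailbox_site, protocol, external_urls):
    """Decide how a Client Access server handles a request.

    cas_site:      Active Directory site of the server receiving the request
    mailbox_site:  Active Directory site of the user's Mailbox server
    protocol:      short label such as "owa" or "ews" (hypothetical labels)
    external_urls: dict mapping site -> external URL registered for that site
    Returns a (action, url) tuple; url is set only for redirects.
    """
    if cas_site == mailbox_site:
        return ("direct", None)  # local Mailbox server, no hop needed
    target_url = external_urls.get(mailbox_site)
    if protocol == "owa" and target_url:
        # Send the browser to a Client Access server near the mailbox.
        return ("redirect", target_url)
    return ("proxy", None)
```

With `{"Dublin": "https://dublin.mail.example.com/owa"}` as the registered URLs, an Outlook Web App request that arrives in Redmond for a Dublin mailbox is redirected, while the same request for a site with no registered URL is proxied.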
Note: An exception to the distributed approach to mobile messaging services at Microsoft is the Internet access point for the Autodiscover service used for automatic profile configuration of clients such as Outlook 2010. Microsoft IT provides centralized access to the Autodiscover service because of its reliance on the primary SMTP address of the users. All Microsoft users in the Corporate forest have the same SMTP domain name worldwide (that is, @microsoft.com). Outlook 2010 derives the Autodiscover URL from the user's primary e-mail address and attempts to access the Autodiscover service at https://autodiscover.<SMTP domain>/autodiscover/autodiscover.xml or, if this URL does not exist, at https://<SMTP domain>/autodiscover/autodiscover.xml. Microsoft employees who work with Outlook 2010 over the Internet connect to https://autodiscover.microsoft.com/autodiscover/autodiscover.xml regardless of their geographic location.
Optimizing Distribution of Offline Address Book
Exchange Server 2010 uses the same method for OAB distribution to Outlook 2007 and Outlook 2010 clients. The mechanism uses Windows Background Intelligent Transfer Service (BITS) over HTTPS connections instead of downloading the OAB files from a public folder by using MAPI. Especially with large OAB files, BITS provides significant advantages over the traditional OAB download method that previous versions of Outlook use, because BITS requires less network bandwidth, downloads the OAB files asynchronously in the background, and can resume file transfers after network disconnects and computer restarts. Exchange Server uses HTTP as the default communication method. Microsoft IT uses a trusted SSL certificate and enables SSL on the appropriate OAB directory for internal clients.
Within the corporate production environment, Microsoft IT generates four different OABs according to each geographical region. This approach of generating and distributing OABs locally helps Microsoft IT minimize OAB traffic over WAN connections while providing users with relevant address information in cached or offline mode. Accordingly, Microsoft IT configured an Exchange 2010 Mailbox server in each geographic region as the OAB Generation (OABGen) server and local Client Access servers as hosts to download the regional OABs. The Exchange File Distribution Service, running on each Client Access server, downloads the OAB in the form of XML files from the OABGen server into a virtual directory that serves as the distribution point for Outlook 2007 and Outlook 2010 clients. Outlook 2007 and Outlook 2010 clients can determine the OAB download URL by querying the Autodiscover service.
Outlook 2007 and Outlook 2010 clients on the Internet access the OAB virtual directory through the hardware load balancers. Within the internal network, Outlook 2007 and 2010 clients can access the OAB virtual directory on Client Access servers directly without the need to go through the hardware load balancers. Outlook 2003 clients can still download the OABs from the public folder by using MAPI and RPCs directly or, in the case of external Outlook 2003 clients, by using MAPI and RPCs through Outlook Anywhere.
Enabling Cross-Forest Availability Lookups
Another important business requirement that Microsoft IT needed to address in the Client Access architecture concerned the integration of free/busy and availability information across multiple forests to facilitate meeting management for all Microsoft employees. As mentioned earlier in this white paper, Microsoft IT maintains several forests with Exchange Server organizations. Some of these forests serve the purpose of compatibility testing with previous product versions, whereas others run pre-release software. Accordingly, Microsoft IT had to provide for seamless integration and backward compatibility in the cross-forest availability architecture.
Three components must work together for backward-compatible, seamless availability integration. First, it is necessary to synchronize address lists across all forests, which Microsoft IT accomplishes by using Identity Lifecycle Manager 2007. Second, if Outlook 2003 or earlier clients are used, it is also necessary to synchronize free/busy items between messaging environments. The third component that is important for cross-forest availability integration is the Client Access server, specifically the Availability API, which Outlook 2010 clients and Outlook Web App use to obtain availability information for all Microsoft users.
Note: The Exchange Server 2010 Availability service is a Web service on Client Access servers to provide Outlook 2010 clients with access to the Availability API. Outlook Web App uses the Availability API directly and does not use the Availability service.
Figure 11 shows the cross-forest availability architecture that Microsoft IT established in the corporate production environment. Although clients that use Outlook 2003 continue to work with free/busy items in public folders as usual, Client Access servers communicate differently to process availability requests from users who have mailboxes on Exchange Server 2010 and who work with Outlook 2010 or Outlook Web App. If the target mailbox is in the local Active Directory site, Client Access servers use MAPI to access the calendar information directly in the target mailbox. If the target mailbox is in a remote Active Directory site, the Client Access server proxies the request via HTTP to a Client Access server in the target mailbox’s local site, which in turn accesses the calendar in the user’s mailbox.
Figure 11. Cross-forest availability architecture
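The site-based decision that a Client Access server makes for an availability request can be sketched as follows. This is a simplified illustration of the rule described above, not the product's implementation; the Mailbox class and the availability_route function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mailbox:
    owner: str
    ad_site: str  # Active Directory site that hosts the mailbox


def availability_route(cas_site, target):
    """Describe how a Client Access server in cas_site reaches the target calendar."""
    if target.ad_site == cas_site:
        # Same site: read the calendar directly from the mailbox.
        return "MAPI to mailbox"
    # Different site: hand the request to a Client Access server in the
    # mailbox's own site, which then reads the calendar locally.
    return "HTTP proxy to Client Access server in " + target.ad_site


if __name__ == "__main__":
    print(availability_route("Redmond", Mailbox("user1", "Redmond")))
    print(availability_route("Redmond", Mailbox("user2", "Dublin")))
```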
To enable cross-forest availability lookups in Exchange Server 2010, Microsoft IT implemented the following configuration:
Trusted forests By using the Add-AvailabilityAddressSpace cmdlet, Microsoft IT specifies the per-user access method (-AccessMethod PerUserFB) to create the Availability Space with support for the most detailed availability information. Subsequently, Microsoft IT grants all Client Access server accounts the necessary ms-Exch-EPI-Token-Serialization extended right through the Add-ADPermission cmdlet.
Non-trusted forests and forests without Client Access servers By using the Add-AvailabilityAddressSpace cmdlet, Microsoft IT specifies the public-folder access method (-AccessMethod PublicFolder) to continue using free/busy items in this Availability Space.
Note: Client Access servers can use an organization-wide access method (-AccessMethod OrgWideFB) for communication with remote Client Access servers in non-trusted forests. However, Microsoft IT does not use this access method to avoid the need for maintaining special system account credentials in each forest. For more information about cross-forest availability lookups, see “Configure the Availability Service for Cross-Forest Topologies” at https://technet.microsoft.com/en-us/library/bb125182.aspx.
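The choice of access method per forest, as Microsoft IT applies it, reduces to a small decision rule. The Python sketch below only models that rule; the RemoteForest class, its fields, and the forest names are hypothetical, and the returned strings are the -AccessMethod values named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteForest:
    name: str
    trusted: bool
    has_client_access_servers: bool


def access_method(forest):
    """Return the -AccessMethod value that matches the configuration described above."""
    if forest.trusted and forest.has_client_access_servers:
        # Trusted forest with Client Access servers: per-user free/busy detail.
        return "PerUserFB"
    # Non-trusted forests and forests without Client Access servers keep
    # using free/busy items in public folders. OrgWideFB is deliberately
    # not returned because Microsoft IT does not use it.
    return "PublicFolder"


if __name__ == "__main__":
    forests = [
        RemoteForest("trusted.example", True, True),
        RemoteForest("legacy.example", True, False),
        RemoteForest("partner.example", False, True),
    ]
    for forest in forests:
        print(forest.name, access_method(forest))
```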
Best Practices for Client Access Servers
During the implementation of the new Client Access server infrastructure, Microsoft IT found that moving to a hardware load-balancing mechanism reduced the complexity and improved the load distribution across all of the Client Access servers. This change, in conjunction with the following best practices, provided for a successful implementation:
Secure channel planning Moving to RPC Client Access Service means that the number of secure channels required for authentication by the Client Access server increases. As the number of client connections increases, an organization must monitor and plan for the number of secure channels.
Measurement for server purchases Purchasing the correct number of Client Access servers can be a difficult task. The Exchange Server product group has provided specific guidance on planning for this number. However, an organization must measure internal usage of Exchange ActiveSync, Outlook Anywhere, and other technologies to determine the proper number of servers.
Deployment cutovers The procedure described previously has proven to be the most straightforward method for cutting over Client Access server functions without negatively affecting users and functionality.
Unified Messaging
Microsoft IT has provided users with unified messaging capabilities for many years, starting with Exchange 2000 Server. The Microsoft voice-mail infrastructure before Exchange Server 2007 relied on various private branch exchange (PBX) telephony devices and a third-party unified messaging product that communicated with Exchange servers to deliver voice mail to users’ inboxes. The third-party product required a direct physical connection to the PBX device for each location that provided unified messaging services. This requirement meant that Microsoft IT placed third-party Unified Messaging servers in the same physical location as the PBX.
Starting with Exchange Server 2007, Microsoft IT took the opportunity to redesign the environment and prepare the infrastructure for VoIP telephony across the entire company. With Exchange Server 2010, Microsoft IT needed to perform a migration that took into account an updated design, Microsoft Office Communications Server 2007 dial-plan re-creation, language pack distribution/grammar generation, and voice-mail preview.
For this migration of unified messaging services worldwide, Microsoft IT defined the following requirements:
Provide high-quality, next-generation VoIP services with clear voice-mail playback and voice conversations at all locations.
Provide additional language packs and the opportunity for end users to help protect their voice mail.
Increase security through encrypted VoIP communication.
Maintain a unified messaging environment that allowed for the ongoing rollout of Office Communications Server 2007 R2 to continue.
Ensure smooth migration between third-party unified messaging systems and Exchange Server 2010-based unified messaging.
Reduce administrative overhead by educating and enabling users for self-service.
Unified Messaging Topology
For the Exchange Server 2010 Unified Messaging implementation, Microsoft IT took advantage of the topology that it established for Exchange Server 2007 Unified Messaging. Microsoft IT needed to make several critical decisions beyond the basic conversion that affected the technologies used but did not severely affect the topology of the system. To maintain a centralized environment, Microsoft IT chose to deploy Unified Messaging servers in the four regional data centers that already housed Mailbox and Hub Transport servers. This deployment provided the following benefits:
Reduced server footprint and costs Unified Messaging servers must reside in the same Active Directory sites as Hub Transport and Mailbox servers. Deploying Unified Messaging servers in a decentralized way would require a decentralization of the entire messaging environment, with associated higher server footprint and operational costs.
Optimal preparation for future communication needs Centralizing Unified Messaging servers in four locations makes deployment of technological advancements in VoIP technology a straightforward process. It is also easier to integrate and maintain centralized Unified Messaging servers in a global unified communications infrastructure.
Figure 12 illustrates the Microsoft IT Unified Messaging server deployment in the corporate production environment.
Figure 12. Unified Messaging server topology
After it chose to deploy Unified Messaging servers in the four regional data centers, Microsoft IT needed to provide high voice quality for all unified messaging users. To do so, Microsoft IT distributed Unified Messaging servers according to the number of users that each region supports. For example, for Australia and Asia, Microsoft IT determined that the two Unified Messaging servers deployed with Exchange Server 2007 were not adequate to handle the increased workload of voice-mail preview, so it deployed a third Unified Messaging server.
The connectivity requirements to PBXs at Microsoft locations vary according to the call load. Microsoft IT deployed these connections years ago as part of a voice-mail solution. Existing PBXs have T1 Primary Rate Interface (PRI) or Basic Rate Interface (BRI) trunks grouped logically as a digital set emulation group. The T1 trunks can use channel-associated signaling where signaling data is on each channel (24 channels for T1), or Q.SIG where there are 23 channels and a dedicated channel for signaling.
The VoIP gateway decisions depend on the type of telephony connection. VoIP gateways support specific signaling types and trunk sizes. Microsoft IT considers the signaling types and the size of the trunk, and then ensures that the combination meets the user load. From monitoring performance, Microsoft IT concluded that the existing connectivity more than met call load and did not require expansion.
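The channel counts given above translate directly into the number of simultaneous calls that a trunk group can carry. The following Python sketch shows that arithmetic only; the function name and the signaling labels "CAS" and "QSIG" are hypothetical, and real capacity planning also depends on measured call load, which is not modeled here.

```python
# Voice channels per T1, taken from the description above:
# channel-associated signaling carries signaling on each of the 24 channels,
# Q.SIG uses 23 channels for voice plus one dedicated signaling channel.
VOICE_CHANNELS_PER_T1 = {
    "CAS": 24,
    "QSIG": 23,
}


def voice_channels(signaling, t1_count):
    """Return the number of simultaneous calls a group of T1 trunks can carry."""
    if t1_count < 0:
        raise ValueError("t1_count must not be negative")
    try:
        per_trunk = VOICE_CHANNELS_PER_T1[signaling]
    except KeyError:
        raise ValueError("unknown signaling type: %r" % signaling)
    return per_trunk * t1_count


if __name__ == "__main__":
    print(voice_channels("CAS", 1))
    print(voice_channels("QSIG", 2))
```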
Unified Messaging Redundancy and Load Balancing
With Exchange Server 2007, Microsoft IT sought to increase the redundancy and load-balancing capabilities of the unified messaging environment. As Microsoft IT moved the environment to Exchange Server 2010, it examined the redundancy and load-balancing methods and determined that these methods continued to be sufficient.
Figure 13 outlines the method for providing redundancy and load balancing in Exchange Server 2010 Unified Messaging.
Figure 13. Configuration for redundancy of unified messaging
Microsoft IT decided that the minimum level of scalability and flexibility in the unified messaging environment required at least two VoIP gateways communicating with at least two Unified Messaging server partners. Microsoft IT based this decision on a few considerations. First, using two VoIP gateways and two Unified Messaging servers helps ensure that if one telephony link or network link fails at any time, users can still receive unified messaging services. Second, if one VoIP gateway fails, requires configuration changes, or requires updated firmware, Microsoft IT can temporarily switch all traffic to the other gateway. Third, two or more Unified Messaging servers help ensure that if one server fails, the other server can take over. Microsoft IT considered redundancy for PBXs and, based on previous experience, decided that those PBXs provided stable service with built-in redundancy through multiple telephony interface cards and multiple incoming telephony links to the telephone company.
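The reasoning behind the two-by-two minimum can be checked with a short Python sketch: remove each component in turn and confirm that at least one VoIP gateway and one Unified Messaging server remain. The function names and component labels are hypothetical, and the model assumes that any surviving gateway can reach any surviving server.

```python
def has_path(gateways, um_servers, failed):
    """True if at least one gateway and one Unified Messaging server are still up."""
    live_gateways = [g for g in gateways if g not in failed]
    live_servers = [s for s in um_servers if s not in failed]
    return bool(live_gateways) and bool(live_servers)


def survives_any_single_failure(gateways, um_servers):
    """True if service continues no matter which single component fails."""
    components = list(gateways) + list(um_servers)
    return all(has_path(gateways, um_servers, {c}) for c in components)


if __name__ == "__main__":
    # Two gateways and two servers: any one component can fail.
    print(survives_any_single_failure(["gw1", "gw2"], ["um1", "um2"]))
    # A single gateway is a single point of failure.
    print(survives_any_single_failure(["gw1"], ["um1", "um2"]))
```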
After deciding to use a minimum of two VoIP gateway devices with two Unified Messaging servers as communication partners, Microsoft IT considered the type and capacity of VoIP gateway to use. Microsoft IT applied several technical and business requirements when making VoIP gateway selections for each location, as follows:
Connectivity type For Microsoft IT, the connectivity type came down to two choices of digital connections: PRI T1 or BRI emulated as a digital set. Microsoft IT eliminated analog connections immediately because of cost and scalability factors. For small sites, Microsoft IT uses BRI (PIMG80PBXDNI); for large sites, Microsoft IT uses either TIMG300DT or TIMG600DT. Whereas TIMG300DT supports a single T1 for each device, TIMG600DT supports dual T1s. Microsoft IT varied the number of T1s depending on usage, employing dual T1s in Redmond. Other sites used BRI trunks emulated as a digital set, with either 8 or 16 lines per gateway, depending on user load.
Simplified Message Desk Interface (SMDI)/signaling integration Intel gateways provide a standard, supported SMDI integration, which is a decision factor for Microsoft IT. To accomplish SMDI integration with Intel gateways, Microsoft IT connected multiple gateways to the same SMDI link by using two primary gateways and multiple secondary gateways. By doing this, Microsoft IT can switch over from one primary gateway to another, enabling gateway firmware updates with no service interruption.
Increased Unified Messaging Security
Many security concerns are associated with a unified messaging environment. For example, Session Initiation Protocol (SIP) proxy impersonation, network sniffing, session hijacking, and even unauthorized phone calls can compromise network security. Microsoft IT can choose from several methods to help secure the unified messaging environment, especially Unified Messaging servers and traffic between VoIP gateways and Unified Messaging servers. These methods include the following:
Secure protocols In the unified messaging environment, all traffic that uses SIP can use Mutual Transport Layer Security (MTLS). This includes the traffic between VoIP gateway devices and the Unified Messaging servers.
Trusted local area networks (LANs) To help prevent network sniffing and reduce overall security risks, Microsoft IT places VoIP gateways on a virtual LAN (VLAN) separate from the corporate production environment. This makes traffic access possible only for authorized individuals who have physical access to VoIP gateways. Moreover, Unified Messaging servers communicate only with gateways explicitly listed in the dial plan.
In addition to these security measures, Microsoft IT enforces general security practices such as using strong authentication methods and strong passwords.
Best Practices for Unified Messaging
Exchange Server 2010 Unified Messaging servers include various configuration options, such as dial plans, VoIP gateway communication partners, hunt groups, and voice-mail protection. Some configuration options represent default configurations or require entering the necessary values, such as the IP addresses for VoIP gateways. For other options, such as dial plans and hunt groups, Microsoft IT considered the feature set necessary to meet business requirements and configured settings accordingly.
The major considerations that Microsoft IT took into account during the migration included:
Voice-mail preview Exchange Server 2010 includes a new feature that provides transcription of voice-mail messages and includes the transcribed text in the e-mail message along with the audio recording of the voice mail. Enabling this new feature was a large benefit to user productivity. However, implementing this feature required additional processing power on all of the servers that held the Unified Messaging server role.
Language packs Exchange Server 2010 includes 10 more language packs than what Exchange Server 2007 offered. Microsoft IT attempted to deploy the additional language packs during the Exchange Server 2007 deployment and found that the amount of processing time on the domain controllers to process the grammar was more than its AD DS system could withstand. This limitation was largely due to the unusually large number of objects in the Microsoft AD DS environment. With this in mind, Microsoft IT rolled out only 16 of the 26 language packs and continues to investigate the best option for taking advantage of the feature.
Office Communications Server dial plans Microsoft IT began implementing Office Communications Server 2007 after the deployment of Exchange Server 2007. Office Communications Server 2007 is the primary VoIP system for a subset of users in a subset of locations. Although the deployment continues to progress across the company, Microsoft IT needed to take the existing Office Communications Server 2007 dial plans into account when performing the migration. Microsoft IT could not simply migrate these dial plans because Office Communications Server 2007 does not support redirection. This limitation required Microsoft IT to create new dial plans for the Office Communications Server 2007-enabled users. This process required careful planning to provide enough time to work with the Office Communications Server team during the migration to acquire new numbers and build the dial plans.
Voice-mail redirection The voice-mail redirection feature enables the VoIP gateway to send all voice-mail requests to Exchange Server 2010, where the feature determines whether the mailbox is on an Exchange Server 2010 system or on an Exchange Server 2007 system. If it is on an Exchange Server 2010 system, the voice mail is recorded as normal. If it is on an Exchange Server 2007 system, Exchange redirects the request back to the gateway and includes in the SIP packet the FQDN of the Exchange server to which the voice mail should be delivered. To support the new voice-mail redirection, Microsoft IT needed to update the firmware on most of the VoIP gateways in use and in some cases reconfigure the gateways to point to an internal DNS server.
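The voice-mail redirection decision described in the last item can be sketched in Python as follows. This models only the branching logic; the function name, the dictionary keys, and the server name are hypothetical, and the sketch does not represent the actual SIP message format.

```python
def handle_voicemail_request(mailbox_version, mailbox_server_fqdn):
    """Decide what an Exchange Server 2010 Unified Messaging server does with a call.

    mailbox_version is the Exchange version that hosts the callee's mailbox
    (2007 or 2010 in this sketch).
    """
    if mailbox_version == 2010:
        # Mailbox already migrated: record the voice mail as normal.
        return {"action": "record"}
    if mailbox_version == 2007:
        # Mailbox not yet migrated: send the request back to the gateway and
        # tell it which server should receive the voice mail.
        return {"action": "redirect", "target_fqdn": mailbox_server_fqdn}
    raise ValueError("unsupported mailbox version: %r" % mailbox_version)


if __name__ == "__main__":
    print(handle_voicemail_request(2010, "um2010.example.com"))
    print(handle_voicemail_request(2007, "um2007.example.com"))
```

The gateway must be able to resolve the returned FQDN, which is consistent with the need to point some gateways at an internal DNS server.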
Internet Mail Connectivity
Microsoft IT capitalizes on native Exchange Server 2010 anti-spam and antivirus capabilities along with the rest of the Microsoft protection tools to help protect the company’s Internet mail connectivity points against spammers and attacks at the messaging layer. Specifically, Microsoft IT moved away from Exchange Server 2007 Edge Transport servers and Forefront Security for Exchange Server in the perimeter network in favor of Exchange Server 2010 Hub Transport servers acting as gateways and Forefront Online Protection for Exchange providing the Internet-facing service to help protect the corporate network from outside threats.
Microsoft IT defined the following goals for the design of Internet mail connectivity:
Increase security by using Microsoft Forefront Protection 2010 for Exchange Server in combination with Forefront Online Protection for Exchange.
Adopt robust spam-filtering and virus-filtering methods by using the Forefront Online Protection for Exchange environment.
Adopt Forefront Protection 2010 for Exchange Server running on the Hub Transport role to filter virus messages by using more than one virus engine.
Develop a fault-tolerant system that balances both incoming and outgoing message traffic.
Incoming and Outgoing Message Transfer
In the Exchange Server 2003 and Exchange Server 2007 environment, Microsoft IT used a total of six Internet mail gateway servers in Redmond and Silicon Valley as the main points of contact for incoming and outgoing Internet message transfer and four additional outgoing-only Internet mail gateway servers in Dublin and Singapore. Concentrating the incoming Internet message traffic through the six Internet mail gateway servers in Redmond and Silicon Valley enabled Microsoft IT to limit internal resource exposure, concentrate spam filtering, and centralize security administration. Maintaining four additional outgoing-only Internet mail gateway servers in Dublin and Singapore eliminated the need to transfer messages to Internet recipients from these regions across the Microsoft WAN to an Internet mail gateway server in Redmond or Silicon Valley. To provide good performance and redundancy at each data center, Microsoft IT decided to use three Internet mail gateway servers in Redmond, three in Silicon Valley, two in Dublin, and two in Singapore.
During the migration to Exchange Server 2010, Microsoft IT updated the design and moved the incoming/outgoing e-mail functionality to reside only in the Redmond site. Additionally, Microsoft IT removed the Exchange Edge servers and migrated the outgoing connectors to the Hub Transport servers in the respective sites. This enables greater scalability and reduces management overhead while still maintaining rigorous security due to using Forefront Online Protection for Exchange as the Internet-facing presence. The benefits from this model include:
The outgoing relays are members of the local Active Directory site, allowing for easier administration.
The additional overhead of managing another server type was removed from the environment.
Security is still maintained because the Forefront Online Protection for Exchange layer acts as the Internet presence.
Figure 14 shows the Internet mail connectivity and topology with Exchange Server 2010.
Figure 14. Internet mail connectivity topology
Redundancy and Load Balancing
Because Microsoft IT must meet stringent performance and availability SLAs, balancing the traffic load and providing redundancy is a vital consideration for Internet mail connectivity. Internally, Microsoft IT uses multiple Hub Transport servers in the regions.
Externally, for incoming message transfer from the Internet, Microsoft IT points its mail exchanger (MX) records to Forefront Online Protection for Exchange, whose infrastructure provides the initial tier of e-mail spam filtering. In the Forefront Online Protection for Exchange environment, each accepted domain is entered and configured to have mail routed to a hardware load balancer that sits in front of the Hub Transport servers. This method enables Microsoft IT to take advantage of the Forefront Online Protection for Exchange infrastructure to help ensure high availability of the incoming/outgoing messaging platform.
In addition to redundancy of the transport servers themselves, Exchange Server 2010 introduces a shadow redundancy feature to provide resiliency for messages for the entire time that they are in transit. The solution involves a technique similar to the Transport dumpster. With shadow redundancy, the deletion of a message from the transport databases is delayed until the transport server verifies that all of the next hops for that message have completed delivery. If any of the next hops fail before reporting successful delivery, the message is resubmitted for delivery to that next hop.
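The retention rule behind shadow redundancy can be illustrated with a small in-memory model in Python. This is a simplified sketch of the behavior described above, not how Exchange Server implements it; the ShadowQueue class, its method names, and the hop names are hypothetical.

```python
class ShadowQueue:
    """Keeps a message until every next hop has confirmed delivery."""

    def __init__(self):
        # message_id -> set of next hops that have not yet confirmed delivery
        self._pending = {}

    def send(self, message_id, next_hops):
        """Record a message that was handed to one or more next hops."""
        self._pending[message_id] = set(next_hops)

    def confirm_delivery(self, message_id, hop):
        """Mark one hop as done; drop the message once all hops are done."""
        hops = self._pending.get(message_id)
        if hops is None:
            return
        hops.discard(hop)
        if not hops:
            del self._pending[message_id]

    def hop_failed(self, hop):
        """Return the IDs of retained messages to resubmit to the failed hop."""
        return sorted(mid for mid, hops in self._pending.items() if hop in hops)

    def is_retained(self, message_id):
        return message_id in self._pending


if __name__ == "__main__":
    queue = ShadowQueue()
    queue.send("m1", ["hubA", "hubB"])
    queue.confirm_delivery("m1", "hubA")
    print(queue.is_retained("m1"))   # still waiting for hubB
    print(queue.hop_failed("hubB"))  # messages to resubmit
```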
Note: In addition to DNS host (A) and MX records for round-robin load balancing, Microsoft IT maintains Sender ID (Sender Policy Framework, or SPF) records. The Sender ID framework relies on SPF records to identify messaging hosts that are authorized to send messages for a specific SMTP domain, such as microsoft.com. Internet mail hosts that receive messages from microsoft.com can look up the SPF records for the domain to determine whether the sending host is authorized to send mail for Microsoft users. To enable this feature while using Forefront Online Protection for Exchange, Microsoft has pointed its MX records at mail.messaging.microsoft.com and modified its SPF records to include "v=spf1 include:spf.messaging.microsoft.com -all". This configuration ensures that incoming e-mail goes to Forefront Online Protection for Exchange and that microsoft.com approves outgoing mail from Forefront Online Protection for Exchange.
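An SPF record is a single line of text in which each mechanism is separated by one space, the include mechanism is written without a space after the colon, and the final qualifier uses a plain ASCII hyphen. The following Python sketch assembles such a record; the function name build_spf_record is hypothetical, and only the include and all mechanisms are modeled.

```python
def build_spf_record(include_domains, fail_all=True):
    """Assemble an SPF TXT record value from a list of included domains."""
    parts = ["v=spf1"]
    for domain in include_domains:
        domain = domain.strip()
        if not domain or " " in domain:
            raise ValueError("invalid include domain: %r" % domain)
        # No space is allowed between "include:" and the domain name.
        parts.append("include:" + domain)
    # "-all" rejects every sender that is not listed; "~all" is a soft fail.
    parts.append("-all" if fail_all else "~all")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_spf_record(["spf.messaging.microsoft.com"]))
```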
Increased Perimeter Network Security
Microsoft IT took advantage of the Forefront Online Protection for Exchange infrastructure to increase security of the messaging environment. Using Forefront Online Protection for Exchange enabled Microsoft to move away from the Exchange Server 2007 implementation of an Edge Transport server infrastructure and focus on an environment where internal messaging resources are not exposed to the public Internet. This change, in combination with Forefront Security for Exchange Server running on the Hub servers, helps ensure that e-mail is filtered and free of viruses before it is delivered to mailboxes.
Note: The security of the Forefront Online Protection for Exchange environment is maintained separately by the Forefront Online Protection for Exchange team and is not covered in this paper. More information is available on the Microsoft Online Services Web site at https://www.microsoft.com/online/exchange-hosted-services/filtering.mspx.
Moving the servers that relay mail to Forefront Online Protection for Exchange, and from there to the Internet, inside the network also removed the need to apply a large number of server-hardening measures to those servers. During the Exchange Server 2007 time frame, measures such as separate network interface cards (NICs) and port lockdowns were required because the Exchange Server 2007 Edge servers were in the perimeter network.
With Exchange Server 2010 and the decommissioning of the Edge Transport role, Microsoft IT aligned the hardening requirements for these servers with the requirements for every other server inside the network. The hardening standards for servers inside the network are still significant, but they differ from the requirements for servers outside the network. They include the following:
Services Microsoft IT uses the Security Configuration Wizard to analyze the unnecessary services to disable.
File shares Microsoft IT removed the Everyone group from all shared folders. All shares must have the security groups applied that contain only the users who need access to the shares. Microsoft IT does not apply open security groups to shares, such as Authenticated Users, Domain Users, or Everyone.
Security updates Microsoft IT monitors the installation of security updates and security configurations on server platforms by using Microsoft Baseline Security Analyzer (MBSA). Hub Transport servers must have all current security updates to help ensure security.
Optimization of Spam and Virus Scanning
Using Forefront Online Protection for Exchange enables Microsoft IT to use the most robust spam and virus protection in an environment outside the corporate infrastructure. Microsoft IT configured the Forefront Online Protection for Exchange environment to help protect mail traffic such that 80–90 percent of incoming traffic is caught as spam. This configuration includes filtering messages in the following order:
Connection filtering is configured to stop known malicious IP addresses from sending malicious messages into the system. This helps ensure that only messages that are likely to contain valuable content reach the scanning system.
Directory lookups are configured by synchronizing the Microsoft directory system with the Forefront Online Protection for Exchange environment. After a message has passed the connection filter, it is qualified against the e-mail addresses that are valid within the organization.
The filter for file name extensions evaluates each message that passes the first two filters and, if the message contains an attachment, ensures that the attachment is not one of the 50 file types that Microsoft does not accept.
Virus scanning uses all five scan engines and is configured with maximum certainty. This check blocks only messages that are highly likely to contain a virus.
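The order of the four checks above matters because each stage only sees messages that the earlier, cheaper stages let through. The Python sketch below illustrates that ordering under stated assumptions: the message is a plain dictionary, the block lists are passed in by the caller, and virus_scan is a hypothetical callable that stands in for the scan engines.

```python
import os


def filter_message(msg, blocked_ips, valid_recipients, blocked_extensions, virus_scan):
    """Run a message through the four checks in the order listed above."""
    # 1. Connection filtering: drop known malicious senders first.
    if msg["source_ip"] in blocked_ips:
        return "rejected: connection filter"
    # 2. Directory lookup: the recipient must exist in the organization.
    if msg["recipient"].lower() not in valid_recipients:
        return "rejected: directory lookup"
    # 3. File name extension filter: refuse attachment types that are not accepted.
    for name in msg.get("attachments", []):
        extension = os.path.splitext(name)[1].lower()
        if extension in blocked_extensions:
            return "rejected: file name extension"
    # 4. Virus scanning: the most expensive check runs last.
    if virus_scan(msg):
        return "rejected: virus scan"
    return "accepted"


if __name__ == "__main__":
    message = {
        "source_ip": "198.51.100.7",
        "recipient": "user@example.com",
        "attachments": ["report.docx"],
    }
    result = filter_message(
        message,
        blocked_ips={"192.0.2.10"},
        valid_recipients={"user@example.com"},
        blocked_extensions={".exe"},
        virus_scan=lambda m: False,  # placeholder scanner that finds nothing
    )
    print(result)
```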
After a message passes the Forefront Online Protection for Exchange checks, it moves to the internal Hub Transport servers. Microsoft IT considered the available options and implemented Forefront Server for Exchange to provide a second layer of transport protection. After initial testing of both Forefront Online Protection for Exchange and Forefront Server for Exchange, Microsoft IT decided to use the spam protection in Forefront Online Protection for Exchange and the virus protection in Forefront Server for Exchange.
Security and Optimization of Message Transfers
Hub Transport servers communicate with the Forefront Online Protection for Exchange environment through SMTP send and receive connectors. These connectors offer built-in protection, including a header firewall, connection tarpitting, and SMTP backpressure. The connectors in use on the Hub servers are:
Internet incoming mail For incoming mail from the Internet to the internal Exchange Server organization, Microsoft IT configured the Forefront Online Protection for Exchange environment to send messages to redundant hardware load balancers that sit in front of the Hub Transport servers in Redmond. The Hub Transport servers are configured with one receive connector that accepts anonymous SMTP connections only from specific IP addresses.
Cross-forest incoming mail For incoming mail from the separate forests such as the Dogfood forest, Microsoft IT configured one receive connector that accepts anonymous SMTP connections as trusted from the Hub Transport servers in those Exchange organizations.
All other incoming mail For the remaining incoming e-mail, such as e-mail from clients or other applications, Microsoft IT configured one receive connector to accept authenticated SMTP connections. This connector is open to the IP range of the company internally and is the default connection mechanism for SMTP relay.
Outgoing mail For outgoing mail destined for Internet hosts, Microsoft IT configured two send connectors that use the Redmond Hub Transport servers for relaying outgoing messages to Internet hosts. The first connector connects to partner domains by using TLS to ensure an encrypted connection. If TLS connections are unavailable, this connector will not send the message. The second connector uses opportunistic TLS to send outgoing messages and will resort to non-TLS connections to help ensure that messages are sent if TLS is unavailable.
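The difference between the two send connectors for outgoing mail comes down to what happens when the remote host does not offer TLS. The following Python sketch models only that decision; the function name, the return strings, and the partner domain are hypothetical and do not correspond to connector settings.

```python
def choose_send_behavior(recipient_domain, partner_domains, remote_supports_tls):
    """Return how an outgoing message is handled for the given recipient domain."""
    if recipient_domain.lower() in partner_domains:
        # Partner connector: TLS is mandatory, so the message is held back
        # rather than sent unencrypted.
        if remote_supports_tls:
            return "send with TLS"
        return "do not send"
    # Default connector: opportunistic TLS, falling back to an unencrypted
    # connection so that the message is still delivered.
    if remote_supports_tls:
        return "send with TLS"
    return "send without TLS"


if __name__ == "__main__":
    partners = {"partner.example"}
    print(choose_send_behavior("partner.example", partners, False))
    print(choose_send_behavior("other.example", partners, False))
```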
Best Practices for Internet Mail Connectivity
During the implementation of incoming message flow, Microsoft IT found that moving to a system where connection protection was the primary option and where messages were blocked farther away from the internal messaging system was the most efficient choice. With this in mind, Microsoft IT discovered the following best practices:
Receive connectors These should be built to allow authenticated connections by default. Exceptions to this rule—when exceptions are the only option—should be handled on a case-by-case basis and secured by IP scoping or some other means of protection.
Incoming e-mail This should have a well-defined traffic flow. This traffic flow should include a means to exclude the most messages possible in the earliest stage (or the farthest away from the Hub Transport server).
User Education
User education within Microsoft is an important component of every implementation. While evaluating how employees use software such as Exchange Server in order to guide improvements in future versions, Microsoft IT found that a significant portion of end users simply did not know about features that were available to them. This lack of information negates the cost savings that new features make available to the company. Both end-user productivity and Helpdesk volume correlate directly to user education. For these reasons, Microsoft IT takes user education seriously for every project that it implements.
The team responsible for user education was involved with the planning of the Exchange Server 2010 implementation from the beginning. Initially, the user-education team built a Microsoft SharePoint® site where information such as FAQ and announcements could be posted, and interaction with the employees could occur through feedback forms and user forums. The Web site content was intended for those who wanted a lot of information or were confused by the generalizations made in other forms of communications. The other forms of communication included e-mail messages to all employees, e-mail messages to application owners who connected to Exchange Server, and announcements on the company’s internal portal.
Engaging with application owners early is an important part of the successful migration of existing applications from Web-based Distributed Authoring and Versioning (WebDAV) to Exchange Web Services. The user-education team began communications with the application owners several months before the pilot in production and offered support and deep technical assistance in the support forums on the SharePoint site. During the migration itself, the user-education team handled communications regarding timelines, coordination of test accounts, and eventual coordination of migrating the applications.
Educating end users is equally important and requires careful planning to ensure that the right information is delivered to end users in a relevant time frame. The user-education team sent three e-mail messages to end users. The first message described the project, the timeline, and the new features that would be available (such as Exchange Control Panel). This message also contained links and references to the SharePoint site where more information was available.
The user-education team sent the second e-mail message directly to users before the migration. This message contained more specifics about the features that would be available and what to expect during the migration. This message included details about how access to e-mail would not be interrupted during the process thanks to Online Mailbox Moves, yet users would get a pop-up message at the end of the migration, at which point they should close and re-open Outlook. This level of instruction and explanation likely prevented a significant number of Helpdesk calls regarding what was going to occur during the migration.
The third and final e-mail message contained a welcome to Exchange Server 2010 along with instructions for reviewing the new features (to increase user productivity). It also reinforced the self-service and assisted support options (to reduce the overall cost of the implementation).
The thorough communication effort resulted in fewer Helpdesk calls than previous implementations of Exchange Server. Despite the number of new features that were implemented (such as MailTips, Exchange Control Panel, and Message Tagging), calls related to assistance with new features did not increase.
Deployment planning is a critical element of the Microsoft IT planning and design process. It addresses the question of how to implement the new messaging environment with minimal interference with existing business processes and provides all members of the Exchange Messaging team with a clear understanding of when to perform the required deployment steps.
The high-level deployment plan that the Messaging Engineering team recommended for the transition to Exchange Server 2010 in the corporate production environment included the following phases:
Introduce Exchange Server 2010 into the corporate production environment.
Verify the successful integration of Exchange Server 2010.
Fully deploy Client Access servers in North America.
Fully deploy Hub Transport servers in North America.
Deploy Mailbox servers in North America.
Transition Internet e-mail.
Deploy Exchange Server 2010 in regional data centers.
Note: In addition to business and technical requirements, the Messaging Engineering team had to consider several unique software issues because the designs were based on beta versions of Exchange Server 2010. Important features, such as the new versions of OAB and messaging records management (MRM), in addition to the related software products, such as Forefront Security for Exchange Server, were not yet ready for deployment when Microsoft IT started the production rollout. The Microsoft IT deployment plans reflect these dependencies, which became obsolete with the release of Exchange Server 2010.
Introducing Exchange Server 2010 into the Corporate Production Environment
Microsoft IT introduced Exchange Server 2010 into the Corporate forest in February 2009. This work included the preparation of AD DS, the implementation of the administrative model, and the installation of the first Hub Transport and Client Access servers, as required to integrate Exchange Server 2010 with an existing Exchange Server 2007 environment.
Verifying the Successful Integration of Exchange Server 2010
This initial phase also included the installation of the first set of Mailbox servers in a DAG for approximately 5,000 production mailboxes. Moving 5,000 power users to Exchange Server 2010 enabled Microsoft IT to verify the successful integration of Exchange Server 2010 into the existing messaging environment. Although this first set of migrated mailboxes consisted of people who opted in to the process, a second set of 10,000 mailboxes was quickly migrated to the system. The users in this second set could not opt out and came from many different groups and organizational levels. This commitment to providing an early implementation of Exchange Server 2010 as a true business service ensured a high level of responsiveness and care during implementation from both the product group and Microsoft IT.
Fully Deploying Client Access Servers in North America
With the availability of Exchange Server 2010 Beta, Microsoft IT began the deployment of Client Access servers at full scale in North America. This included building and testing the Client Access server farm and switching the mail.microsoft.com URL namespace from the Exchange Server 2007 Client Access servers to the Exchange Server 2010 Client Access servers. Because mail.microsoft.com moved to the Exchange Server 2010 infrastructure, Exchange Server 2007 required its own namespace. Microsoft IT selected owa.microsoft.com for this purpose.
Figure 15 illustrates how Client Access servers support users with mailboxes on Exchange Server 2010 and Exchange Server 2007.
Figure 15. Coexistence of Exchange Server 2010 and Exchange Server 2007 Mailboxes
As described earlier, redirection is a large portion of the Client Access server strategy for Microsoft IT. To fully make the transition from Exchange Server 2007 to Exchange Server 2010, Microsoft IT gave special consideration to the redirection methodology for Exchange Server 2010. Redirection in Exchange Server 2010 back to Exchange Server 2007 is different from how Exchange Server 2007 handled communication with Exchange Server 2003. This is due to the removal of WebDAV support and the associated earlier virtual directories (that is, /public/* and /exchange/*).
To make the transition, Microsoft IT first built the Exchange Server 2010 URL namespace, including all features (Outlook Anywhere, Autodiscover, Outlook Web App, Exchange Control Panel, Exchange ActiveSync, OAB, Exchange Web Services, and Unified Messaging). The team configured this namespace in the Exchange servers at the desired namespace, mail.microsoft.com. To prevent downtime, the team did not yet update the DNS records. Next, the team updated the URLs on the Exchange Server 2007 infrastructure during a scheduled service outage to a namespace that replaced “mail” with “owa.” This step moved all earlier systems in the Redmond location from mail.microsoft.com to owa.microsoft.com and all earlier systems in the regional locations from mail.<region>.microsoft.com to owa.<region>.microsoft.com. At this point, all of the URLs within the Exchange Server 2010 environment and the Exchange Server 2007 environment were updated to reflect their end state, and all that remained was to update the DNS entries to reflect the same.
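The renaming rule for the earlier namespace (replace the leading "mail" label with "owa") can be expressed as a short helper. This is a minimal Python sketch; the function name legacy_hostname is hypothetical, and the label "region1" used in the usage note stands in for the unspecified <region> placeholder.

```python
def legacy_hostname(hostname: str) -> str:
    """Return the Exchange Server 2007 hostname for a given mail.* hostname."""
    labels = hostname.split(".")
    if labels[0] != "mail":
        raise ValueError(f"expected a hostname starting with 'mail': {hostname}")
    # Only the first label changes; the rest of the namespace is preserved.
    return ".".join(["owa"] + labels[1:])
```

For example, legacy_hostname("mail.microsoft.com") yields "owa.microsoft.com", and a regional name such as "mail.region1.microsoft.com" maps to "owa.region1.microsoft.com".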
These steps are necessary to complete the move from Exchange Server 2007 to Exchange Server 2010 as the public-facing version of Exchange Server. It is important to understand that Exchange Server 2010 Client Access server does not support rendering mailbox data from earlier versions of Exchange.
For Outlook Web App, Exchange 2010 Client Access server follows one of four scenarios depending on the target mailbox's version and/or location:
If the Exchange Server 2007 mailbox is in the same Active Directory site as Exchange Server 2010 Client Access server, Exchange Server 2010 Client Access server will silently redirect the session to the Exchange Server 2007 Client Access server.
If the Exchange Server 2007 mailbox is in another Internet-facing Active Directory site, Exchange Server 2010 Client Access server will manually redirect the user to the Exchange Server 2007 Client Access server.
If the Exchange Server 2007 mailbox is in an Active Directory site that is not Internet facing, Exchange Server 2010 Client Access server will proxy the connection to the Exchange Server 2007 Client Access server.
If the mailbox is Exchange Server 2003, Exchange Server 2010 Client Access server will silently redirect the session to a pre-defined URL.
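The four Outlook Web App scenarios above amount to a small decision table. The Python sketch below restates them; the names OwaAction and owa_action and the boolean parameters are assumptions introduced for illustration, and the function covers only the earlier-version mailboxes described in the list.

```python
from enum import Enum


class OwaAction(Enum):
    SILENT_REDIRECT = "silent redirect"
    MANUAL_REDIRECT = "manual redirect"
    PROXY = "proxy"


def owa_action(mailbox_version: str, same_site: bool,
               site_internet_facing: bool) -> OwaAction:
    """Pick the Exchange Server 2010 Client Access behavior for an earlier mailbox."""
    if mailbox_version == "2003":
        # Exchange Server 2003 mailboxes go to a pre-defined URL.
        return OwaAction.SILENT_REDIRECT
    if mailbox_version != "2007":
        raise ValueError(f"not an earlier mailbox version: {mailbox_version}")
    if same_site:
        return OwaAction.SILENT_REDIRECT
    if site_internet_facing:
        return OwaAction.MANUAL_REDIRECT
    # Mailbox site has no Internet-facing Client Access server.
    return OwaAction.PROXY
```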
For Exchange ActiveSync, Exchange Server 2010 Client Access server does not support rendering mailbox data from earlier versions of Exchange Server. Exchange Server 2010 Client Access server follows one of four scenarios depending on the target mailbox's version and/or location, and device capabilities:
If the Exchange Server 2007 mailbox is in the same Active Directory site as Exchange Server 2010 Client Access server and the device supports Autodiscover, Exchange Server 2010 Client Access server will notify the device to synchronize with Exchange Server 2007 Client Access server.
If the Exchange 2007 mailbox is in the same Active Directory site as Exchange Server 2010 Client Access server and the device does not support Autodiscover, Exchange Server 2010 Client Access server will proxy the connection to Exchange Server 2007 Client Access server.
If the Exchange 2007 mailbox is in an Active Directory site that is not Internet facing, Exchange Server 2010 Client Access server will proxy the connection to the Exchange Server 2007 Client Access server.
If the mailbox is Exchange Server 2003, Exchange Server 2010 Client Access server will proxy the connection to the Exchange Server 2003 Mailbox server.
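The Exchange ActiveSync scenarios can be restated in the same way. In this Python sketch, the function name activesync_action, its parameters, and the string constants are hypothetical. The list above does not describe an Exchange Server 2007 mailbox in another Internet-facing site, so the sketch deliberately refuses that combination rather than guessing.

```python
NOTIFY_2007_CAS = "notify device to synchronize with Exchange Server 2007 Client Access server"
PROXY_2007_CAS = "proxy to Exchange Server 2007 Client Access server"
PROXY_2003_MAILBOX = "proxy to Exchange Server 2003 Mailbox server"


def activesync_action(mailbox_version: str, same_site: bool,
                      site_internet_facing: bool,
                      device_supports_autodiscover: bool) -> str:
    """Pick the Exchange Server 2010 Client Access behavior for an earlier mailbox."""
    if mailbox_version == "2003":
        return PROXY_2003_MAILBOX
    if mailbox_version != "2007":
        raise ValueError(f"not an earlier mailbox version: {mailbox_version}")
    if same_site:
        # Devices that understand Autodiscover are told where to go;
        # other devices are proxied.
        return NOTIFY_2007_CAS if device_supports_autodiscover else PROXY_2007_CAS
    if not site_internet_facing:
        return PROXY_2007_CAS
    # Different, Internet-facing site: not covered by the scenarios listed above.
    raise NotImplementedError("scenario not described in the source list")
```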
For Outlook Anywhere, Exchange Server 2010 Client Access server will always proxy the Outlook MAPI RPC data that is embedded in the RPC-HTTPS packet to the target earlier Mailbox server (regardless of Active Directory site or version) or to the appropriate Exchange Server 2010 Client Access server.
After the Client Access servers were fully deployed, Microsoft IT considered the task of enabling some of the additional features, such as the ability to update the version of Microsoft Office Outlook Mobile on Windows Mobile® 6.1 phones.
Fully Deploying Hub Transport Servers in North America
Before deploying Mailbox servers in large numbers, Microsoft IT deployed Hub Transport servers in all of the sites that already contained an Exchange Server 2007 Hub Transport server to provide the necessary redundancy and scalability for the message-routing functions within the Exchange Server 2010 environment. All messages sent between Mailbox servers must pass through a Hub Transport server. During this transition, the main planning goal was to ensure that no message was returned with a non-delivery report (NDR).
After Microsoft IT decided to move to a model where spam scanning was handled in the Forefront Online Protection for Exchange environment, it was crucial for Microsoft IT to maintain virus-scanning on the internal Hub Transport servers. Accordingly, Microsoft IT began to test Forefront Security for Exchange Server as soon as the software was available in a stable version. The criteria that Microsoft IT defined for the deployment in the corporate production environment included the following points:
The antivirus solution works reliably (that is, without excessive failures) and with acceptable throughput according to the expected message volumes.
The software can find known viruses in various message encodings, formats, and attachments.
If the software fails, message transport must halt so that messages do not pass the Hub Transport servers until they are scanned.
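The third criterion describes fail-closed behavior: when the scanner itself fails, nothing is delivered. The Python sketch below illustrates that idea only; relay, ScannerUnavailable, and the scan and deliver callables are hypothetical names and do not represent the Forefront Security for Exchange Server interface.

```python
class ScannerUnavailable(Exception):
    """Raised when the scan engine fails; transport must stop."""


def relay(message, scan, deliver) -> bool:
    """Deliver a message only after a successful, clean scan.

    scan(message) returns True when the message is clean.
    Returns True if delivered, False if the message was blocked.
    """
    try:
        clean = scan(message)
    except Exception as error:
        # Fail closed: an engine failure halts transport instead of
        # letting an unscanned message through.
        raise ScannerUnavailable("scan engine failed") from error
    if not clean:
        return False
    deliver(message)
    return True
```

The design choice here is that a scanner error is treated differently from an infected message: the infected message is blocked and processing continues, whereas a scanner error stops the flow until the engine is healthy again.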
After Microsoft IT completed these tests successfully, it deployed Forefront Security for Exchange Server. The Exchange Server 2010 Hub Transport servers then superseded the earlier Hub Transport servers running Exchange Server 2007. After Microsoft IT completed the transition of mail routing, it decommissioned the Exchange Server 2007 Hub Transport servers.
Deploying Mailbox Servers in North America
With the Client Access server and Hub Transport server topologies in place, Microsoft IT began to deploy additional Mailbox servers to move more mailboxes to Exchange Server 2010. The rate of mailbox moves to Exchange Server 2010 far surpassed the rate of movement to any previous version or beta version of Exchange Server because of the Online Mailbox Moves process. The Online Mailbox Moves process enabled Microsoft IT to move mailboxes all day instead of scheduling the moves for off hours. In the early beta phases, Microsoft IT moved roughly 200 to 300 mailboxes per day. With the increased load, the servers continued to perform well, and by Beta 2, Microsoft IT moved 1,500 to 2,000 mailboxes per day. This was a significant achievement for both the product group and Microsoft IT. At peak rates, Microsoft IT moved more than 4,000 mailboxes per day.
One of the key factors that Microsoft IT discovered was that the item count in the mailbox affected the data transfer rates for the mailbox moves. On average, Microsoft IT achieved transfer rates of 5 GB to 7 GB per hour on single mailbox moves and transfer rates of around 20 GB per hour when moving multiple mailboxes at a time, as shown in Table 3.
Table 3. Observed Throughput of Online Mailbox Moves (columns: items per minute, items per hour)
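The transfer rates quoted above translate directly into move durations. The following Python sketch shows the arithmetic; the function name move_hours and the constants are assumptions for this example, and the 5 GB mailbox size in the usage note is chosen only to match the typical quota mentioned elsewhere in this paper.

```python
SINGLE_MOVE_RATES_GB_PER_HOUR = (5.0, 7.0)  # observed range for single mailbox moves
CONCURRENT_RATE_GB_PER_HOUR = 20.0          # approximate aggregate rate for multiple moves


def move_hours(data_gb: float, rate_gb_per_hour: float) -> float:
    """Estimate the hours needed to move data_gb at a steady transfer rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hour
```

Under these figures, a single 5 GB mailbox would take about 1 hour at 5 GB per hour and roughly 43 minutes at 7 GB per hour. Because item count also affects the rate, such estimates are a rough guide only.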
Transitioning Internet e-mail
With the transition to Forefront Online Protection for Exchange, the transition of incoming Internet e-mail is a simple configuration change on the Forefront Online Protection for Exchange administration page. The transition of outgoing Internet e-mail is nearly as simple, with an adjustment to the send connectors in Exchange Server 2010.
Deploying Exchange Server 2010 in Regional Data Centers
With Hub Transport servers in North America reliably performing the core of the message-routing functions and virus scanning, Microsoft IT was ready to approach the deployment of Hub Transport servers, Client Access servers, and Mailbox servers in the regional data centers. In general, the deployment processes followed the approach that Microsoft IT had successfully used in North America. Microsoft IT established Hub Transport servers in each region, switched the regional URL namespaces to Client Access servers, began to move mailboxes to Exchange Server 2010, and transitioned the outgoing Internet mail connectors to use the outgoing Hub Transport servers in Redmond.
The final switch to Exchange Server 2010 in the messaging backbone did not require many further changes. One important step was to optimize the message-routing topology by using Exchange Server-specific Active Directory site connectors in order to implement a hub/spoke topology along the physical network links.
While designing a messaging environment with multiple worldwide data centers by using both IP and telephony technologies, Microsoft IT employed design phases during which engineering teams analyzed the existing environment, considered the possible decisions, and arrived at a fitting design for the Exchange Server 2010-based messaging production environment. During the process of considering business needs and choosing the features to address business requirements, Microsoft IT developed the following best practices that supported a solid design and ensured a smooth transition to Exchange Server 2010.
Planning and Design Best Practices
Microsoft IT relied on the following best practices in its planning and design activities:
Clearly define goals Exchange Server 2010 includes roles and configuration options that enable numerous topology and design scenarios. The mix of server roles, enabled options, and settings depends on the business needs and messaging goals of the organization.
Design with production in mind To meet business requirements, Microsoft IT checks design considerations against practical real-world constraints that exist in the production environment. This helps produce a smooth transition to the new environment after implementation of the design.
Design for peak load days Microsoft IT uses the concept of peak load days, or snow days, to plan for the event when a large number of people use the messaging infrastructure from outside the corporate network. The messaging design takes into account the possibility of some days when the majority of users work from home or remotely.
Test in lab environment With the many options to meet business requirements, Microsoft IT validates the chosen designs in a test environment. This enables Microsoft IT to determine stability and finish design plans before rolling out a planned infrastructure in the production environment.
Identify key risks Microsoft IT follows sound project management practices as part of the MSF processes. These practices include identifying risks present with design decisions as well as overall system risks. By identifying risks early on, Microsoft IT can develop mitigation strategies to address the risks.
Develop rollback and mitigation procedures It is important for Microsoft IT to have rollback procedures when designing the Exchange Server 2010-based messaging environment. Microsoft IT accomplished this through a period of coexistence where both Exchange Server 2007 and Exchange Server 2010 processed mail. Microsoft IT later decommissioned the Exchange Server 2007 servers after verifying functionality of the new environment.
Server Design Best Practices
Microsoft IT relied on the following best practices in its server designs:
Use multiple-core processors and design storage based on both capacity and I/O performance During testing, Microsoft IT determined that multi-core processors provide substantial performance benefits. However, processing power is not the only factor that determines Mailbox server performance. It is also important to design the storage subsystem according to I/O load and capacity requirements based on the desired number of users per Mailbox server.
Build quickly deployable systems Automation and planning around where the messaging systems reside in the network are key factors in ensuring that systems can be easily changed on an ongoing basis. By using the scripting capabilities in the Exchange Management Shell and Group Policy, Microsoft IT built a system that can scale up and down as business requirements change over time.
Eliminate single points of failure When designing an Exchange environment, Microsoft IT needed to create redundancy at all points possible. Microsoft IT relied on multiple data centers, multi-homed NICs, hardware load balancers, redundant Hub Transport servers, multiple VoIP gateways for unified messaging, multiple Client Access servers, Client Access server arrays, and DAGs on Mailbox servers.
Deployment Best Practices
Microsoft IT relied on the following best practices during the transition to Exchange Server 2010:
Establish a flexible and scalable messaging infrastructure Microsoft IT focused on planning where multi-role server and single-role server deployments made sense. Using this combination enabled Microsoft IT to both fine-tune the server designs and place the necessary number of servers to handle the load for certain locations and roles. It also enabled Microsoft IT to reduce the risk that server failures at a smaller site would make the messaging services at that site unavailable.
Carefully plan URL namespaces At Microsoft, Client Access servers handle approximately 150,000 mobile user sessions per month. To distribute this load, Microsoft IT uses multiple URL namespaces, where a URL represents access points for clients in specific geographic regions. Microsoft IT chose to preserve these namespaces to provide a seamless transition for mobile users.
Manage permissions through security groups Deploying Exchange Server 2010 in an existing organization does not affect any existing rights granted on Exchange Server 2007 resources through user accounts or security groups. This provides Microsoft IT with the opportunity to change earlier permission assignments and manage permissions through security groups.
Use the fewest permissions necessary Microsoft IT grants only necessary rights to administrators, opting to grant full Domain or Enterprise Admin rights only when a business or technical reason exists. Even in these cases, Microsoft IT grants the rights temporarily to accomplish a specific task. Managing rights in this way helps maintain network security.
Use Forefront and multiple layers of protection Microsoft IT designed the messaging environment to provide many layers of protection against viruses, spam, and other unwanted e-mail. Microsoft IT deploys Forefront Security for Exchange Server on Hub Transport servers to enable bidirectional scanning of e-mail messages and enforce protection at multiple organizational levels.
Build a scalable mail-routing platform Microsoft IT uses the robust Forefront Online Protection for Exchange infrastructure in addition to two layers of Hub Transport servers inside the network. This configuration blocks spam messages before they enter the network and blocks virus messages before they reach the core Hub Transport system. Removing unnecessary client receive connectors and enabling transport encryption help ensure that the internal connection points to the mail routing platform are just as secure and robust as the external connection points.
Use hardware load balancers to publish Client Access servers Microsoft IT capitalizes on a single load-balancing infrastructure to provide load balancing for both internal and external access to Client Access server resources. This ensures a uniform load across the Client Access servers and a single point of security control.
With the completion of the production rollout at the RTM date of the product, Microsoft IT demonstrated the enterprise readiness of Exchange Server 2010. The Microsoft messaging environment hosts 180,000 mailboxes with a typical quota of 5 GB in four data centers on 10 database availability groups with 99.99+ percent availability, achieving SLA targets. The corporate production environment also includes 72 Client Access servers, 29 Hub Transport servers, and 16 Unified Messaging servers.
The transition to Exchange Server 2010 enabled Microsoft IT to reduce operational costs, including costs for server hardware, storage, and backup. For example, Microsoft IT replaced a cost-efficient direct access storage device (DASD)–based solution with an even more cost-effective JBOD-based storage solution, and eliminated backups.
By increasing mailbox quotas to 5 GB by using thin provisioning of low-cost storage and deploying new productivity features that are readily available in Exchange Server 2010, such as MRM (the new version), litigation hold, and single item recovery, Microsoft IT helped increase the productivity of Microsoft employees. Users can store all messages on the server, including e-mail, voice mail, and fax messages, and they can access these messages from any suitable stationary or portable client, including standard telephones. Outlook 2010 also helps employees to increase productivity. By using Outlook 2010 as the primary messaging client in the corporate production environment, employees can benefit from new and advanced information management features, such as instant search, managed folders, and more.
Increasing user productivity also entails maintaining availability levels for messaging services according to business requirements and SLAs. To accomplish this goal, Microsoft IT focuses heavily on single-role Mailbox server deployments with Exchange Server 2010. Dedicated Mailbox servers support database availability groups. Microsoft IT uses database availability groups to increase resiliency against storage-level failures, and uses both single-role and multi-role server deployments in smaller sites to maintain resiliency during unexpected outages.
Another important aspect of the Exchange Server 2010 deployment is messaging protection and security. To achieve the highest security and protection levels while maintaining a flexible environment, Microsoft IT encrypts all server-to-server message traffic by using TLS to help prevent spoofing and help protect confidentiality for messages. Microsoft IT also uses Forefront Protection 2010 for Exchange Server for incoming and outgoing message relay, reducing the number of legitimate messages incorrectly identified as spam. For antivirus protection, Microsoft IT deployed Forefront Security for Exchange Server on all Hub Transport servers.
Exchange Server 2010 enabled Microsoft IT to capitalize on 64-bit technologies and cost-efficient storage solutions to increase the level of messaging services in the corporate environment. The Exchange Messaging team is now in a better position to respond to new messaging trends and accommodate emerging needs as Microsoft continues to grow.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information through the World Wide Web, go to:
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.
© 2010 Microsoft Corporation. All rights reserved.
Microsoft, Active Directory, ActiveSync, Forefront, MSN, Outlook, SharePoint, Windows, Windows Mobile, Windows PowerShell, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
All other trademarks are property of their respective owners.