This section of the Microsoft® Hotmail® Migration Technical Case Study covers the overall planning aspects of successfully migrating FreeBSD and Apache servers to Windows 2000 and Internet Information Services.
When planning an undertaking of this magnitude, it is important to understand several key factors; specifically:
Understand the business drivers. To effectively manage scope, meet deadlines, remain within budget, and still succeed at the undertaking, it is imperative that you understand:
Why the task is being undertaken.
What the critical success factors are.
These two elements will aid in making decisions as the project unfolds.
Understand the project constraints.
Understand and fully document the technical architecture of the existing environment from both an infrastructure and application services perspective.
Understand and compile documentation of the existing operational processes and tools. This aids in assessing the impact of the migration on the operational areas of the organization.
Understand and document the end-state technical and operational environment.
Equipped with this knowledge, you are then ready to commence undertaking the technical aspects of the migration.
Business Justification and Project Constraints
When making decisions about the migration, and consequently about what is in or out of scope for a particular phase of the project, it is important to understand the business drivers, priorities, and constraints of the project. Windows 2000 has a myriad of features that can be implemented. The challenge for any organization undertaking a Windows 2000 implementation is to determine what functionality will be implemented first. Typically, this is driven by the business factors that justified the project.
Additional features can easily be implemented later as applications are completely migrated to Windows 2000 and enhancements to operational procedures and support are made as a result of improved tools in Windows 2000.
The reasons for converting to Windows 2000 were:
Performance (and therefore cost). The CGI "one-process-per-socket" model, under FreeBSD, is very inefficient. Per-machine throughput can be dramatically improved by moving to a multi-threaded application model. This results in one of the following conditions:
Fewer servers required to support the site
Support for a greater number of users by using the servers already deployed at the site
Globalization and foreign language support. Hotmail needed to launch in new markets and did not want to continue investing in keeping the FreeBSD locale tables up to date or in other maintenance activities. China and Japan are two important growing markets for MSN, so multibyte character sets had to be supported. FreeBSD lacked the necessary Unicode support.
Shorter development cycles. Better tools for development and debugging would allow for more rapid feature development and more rapid detection of performance bottlenecks in the code.
The following is a list of the operational constraints that shaped the ultimate feature set implemented in phase 1:
Zero user impact or downtime. To maintain the availability of the Hotmail service, zero user impact and zero downtime were considered the key requirement of the migration.
Rapid deployment. The migration included nearly 3,800 servers and was limited to a 4-week conversion cycle. Remaining competitive in the Internet space requires rapid response to competitive situations, which had resulted in new releases of the Hotmail application suite approximately every eight weeks. It was important that the migration to Windows 2000 be accomplished within that window so that the development staff would not have to maintain the follow-on release of the application for both FreeBSD and Windows 2000. The same model that was used for the initial deployment will also be used for new software releases, which generally occur once every two months.
Use existing staff. All staff should remain effective regardless of operating system. This includes staff in system development, QA, and Operations.
Use existing site hardware and network. The migration would quickly "reimage" each piece of equipment and return it to service. No physical hardware or network topology changes should be necessary.
Minimize operations impact. Operations should be notified which systems will be migrated, and when those systems are again put into production. Feedback on the success/failure of each migration should be sent to operations at the end of each migration cycle.
Use existing monitoring infrastructure. Operations staff generally monitor trouble and inform technicians when problems occur, and the same monitoring tools should be available for either operating system.
Use/adapt existing operational processes and utilities. Changing the operating system on each server should have zero impact on day-to-day operations. Existing operating procedures should not be changed or modified to accommodate software changes.
Totally remote and unattended. The migration should not require any physical human interaction to migrate the servers. All updates should occur through software scheduling and should report success/failure to a central console so migration can be verified.
Stability and performance. The entire objective of migrating to other platforms is to improve stability and performance and reduce costs.
Continued hybrid operating system environment. To facilitate an incremental migration, the system must fully support multiple operating systems running simultaneously within a cluster. This also mitigates risk, because at any time a server can be "toggled" quickly (in 75 minutes) from one operating system to the other (see footnote F1 at the end of this page).
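The "totally remote and unattended" and "minimize operations impact" constraints both imply some central record of per-server outcomes for each migration cycle. The following is only a minimal sketch of such a tally, assuming a hypothetical report format (server name, target operating system, success flag). The class, function, and server names are invented for illustration and do not come from the Hotmail tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationReport:
    """One hypothetical status report sent by a server after a re-image attempt."""
    server: str
    target_os: str   # for example "windows2000" or "freebsd"
    succeeded: bool


def summarize_cycle(reports):
    """Tally outcomes for one migration cycle and list servers needing follow-up."""
    tally = Counter("success" if r.succeeded else "failure" for r in reports)
    failed = sorted(r.server for r in reports if not r.succeeded)
    return {
        "success": tally["success"],
        "failure": tally["failure"],
        "failed_servers": failed,
    }


# Illustrative data only; the server names are invented.
reports = [
    MigrationReport("web001", "windows2000", True),
    MigrationReport("web002", "windows2000", False),
    MigrationReport("web003", "windows2000", True),
]
summary = summarize_cycle(reports)
print(summary)
```

A summary of this kind is what operations would receive at the end of each cycle; any server in the failure list could be retried or toggled back to its previous operating system.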
Given the tight deployment window, coupled with the requirement to use and not increase existing staff, it became imperative that the operational impact be kept to a minimum. A very effective approach to attaining this goal is to use or adapt existing operational processes and tools wherever possible and to automate the deployment of the technology as much as possible. This is why, from an operational perspective, the migrated environment is managed in much the same way as the pre-migration environment. The idea was to evaluate alternative and enhanced methods for managing the environment after the migration was complete.
The following example shows how the project balanced maximizing the capabilities of the new technology against keeping existing operational processes:
In the FreeBSD\Apache environment, configuration management was performed through a combination of using a standard automated build process coupled with a scripted software deployment process (RDIST for distribution, Perl scripts executed through "cron" jobs for implementation).
The automated build process consisted of making a reference, or master, server image of the desired server configuration. This image was deployed to the front-end machines through the remote boot server, so every machine had an identical image. This is very effective. The one downside was that there was no mechanism to enforce conformance to the master image after it was distributed. Operations personnel could change the configuration in the course of troubleshooting a problem (or performing some other operational process), and nothing automatically returned the server to its original state. If a server was "misbehaving," you could always re-image the box to get it back to the original state, or use RDIST or Telnet/script files to reapply a particular configuration, but this took intervention from operations personnel. In other words, there was no automated, self-enforcing configuration management mechanism once the FreeBSD\Apache server was built.
Windows 2000 provides Group Policy objects (GPOs), an automated mechanism for reapplying security settings and software configuration without operations involvement.
However, the pre-migration process was working well and was efficient. So, given the phase 1 project timeline, GPOs beyond the built-in ones (Domain Controllers and Default Domain Policy) were not leveraged.
Now that the migration is complete, expanded utilization of group policy can be investigated, especially in the area of enforcing security settings.
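The gap described above, that nothing checked a server against the master image after deployment, can be illustrated with a small drift check. This is a sketch under stated assumptions, not the Hotmail process or how Group Policy works internally: it assumes configuration files can be read as bytes and compared by SHA-256 digest, and the file paths and contents are invented examples.

```python
import hashlib


def fingerprint(files):
    """Map each path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def find_drift(master, server):
    """Return paths that are changed, missing, or unexpected relative to the master image."""
    master_fp, server_fp = fingerprint(master), fingerprint(server)
    changed = sorted(
        p for p in master_fp if p in server_fp and master_fp[p] != server_fp[p]
    )
    missing = sorted(p for p in master_fp if p not in server_fp)
    extra = sorted(p for p in server_fp if p not in master_fp)
    return {"changed": changed, "missing": missing, "extra": extra}


# Hypothetical file sets; paths and contents are illustrative only.
master_image = {
    "/etc/httpd.conf": b"MaxClients 150\n",
    "/etc/syslog.conf": b"*.info @loghost\n",
}
live_server = {
    "/etc/httpd.conf": b"MaxClients 400\n",   # edited during troubleshooting
    "/tmp/debug.sh": b"#!/bin/sh\n",          # left behind by an operator
}
drift = find_drift(master_image, live_server)
print(drift)
```

A self-enforcing mechanism would run a comparison like this on a schedule and reapply the master configuration whenever any of the three lists is non-empty.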
The Pre-Windows 2000 Migration Environment
As mentioned earlier, the original architects of the Hotmail service created a two-tier architecture built around various UNIX systems. The Hotmail configuration consisted of clusters of servers, each cluster consisting of approximately 300 Web servers and 8 storage machines. Each cluster supported between 2 million and 20 million users. The system grows by adding clusters.
All of the Web servers were running FreeBSD with Apache. There were three classes of front-end Web servers to be migrated during phase 1: Login Servers (which process login and other SSL transactions), Outlook Express Distributed Authoring and Versioning (DAV) servers and Web servers that hosted the Hotmail experience, that is, reading, composing, deleting mail, and so on.
Redundancy is built into the Web services through network-based load balancing. Essentially, the cluster advertises virtual host names and IP addresses in the Domain Name System (DNS), which map back to real server names and IPs through the load balancer. The virtual host names (and IPs) are given out through round-robin DNS (RRDNS). See the following diagram for further detail.
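The two-level lookup described here (round-robin DNS hands out a virtual IP, and the load balancer maps that virtual IP to a real server) can be sketched as follows. This is an illustrative model only. The class names and addresses are invented, and the load balancer is assumed to rotate through its pool in simple round-robin order, which the case study does not state.

```python
from itertools import cycle


class RoundRobinDNS:
    """Hands out the advertised virtual IPs in rotation."""

    def __init__(self, virtual_ips):
        self._rotation = cycle(virtual_ips)

    def resolve(self):
        return next(self._rotation)


class LoadBalancer:
    """Maps each virtual IP to a pool of real servers (round-robin is an assumption)."""

    def __init__(self, pools):
        self._pools = {vip: cycle(servers) for vip, servers in pools.items()}

    def route(self, virtual_ip):
        return next(self._pools[virtual_ip])


# Illustrative addresses only (documentation and private ranges).
dns = RoundRobinDNS(["192.0.2.10", "192.0.2.11"])
balancer = LoadBalancer({
    "192.0.2.10": ["10.0.0.1", "10.0.0.2"],
    "192.0.2.11": ["10.0.1.1", "10.0.1.2"],
})

routes = []
for _ in range(4):
    vip = dns.resolve()                      # step 1: client resolves the host name
    routes.append((vip, balancer.route(vip)))  # step 2: balancer picks a real server
print(routes)
```

Because clients only ever see virtual addresses, a real server can be taken out of a pool for re-imaging without any change to what DNS advertises.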
Hardware SSL accelerators were used to offload the FreeBSD\Apache Servers from processing SSL transactions.
The entire (front-end) Hotmail application was implemented as multiple CGI binaries, each statically linked and compiled with gcc. This created a performance and scalability challenge because the standard Common Gateway Interface (CGI) model creates one new process per connection. To maintain throughput as the service grows, this limitation requires adding more servers before the existing servers are fully used in terms of transactions per second (TPS). This is a costly scalability solution (see footnote F2 at the end of this page).
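The difference between the two models can be shown in miniature. The sketch below is not Hotmail code: the "request handler" is a trivial stand-in that upper-cases a string, and the function names are invented. The first function starts a fresh interpreter process for every request, as a CGI server would start a new binary; the second runs the same work on a pool of threads inside one long-lived process.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a CGI binary: a tiny program that handles exactly one request.
HANDLER_SOURCE = "import sys; print(sys.argv[1].upper())"


def handle_in_new_process(request):
    """CGI style: pay process startup cost for every single request."""
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SOURCE, request],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def handle_in_thread(request):
    """Threaded style: the work runs inside the already-running server process."""
    return request.upper()


requests = ["inbox", "compose", "delete"]

cgi_results = [handle_in_new_process(r) for r in requests]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded_results = list(pool.map(handle_in_thread, requests))

print(cgi_results, threaded_results)
```

Both paths return the same answers. What differs is the per-request overhead: each call to `handle_in_new_process` creates and tears down a whole process, which is the cost the multi-threaded model avoids.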
The following diagram illustrates the server topology for a typical cluster.
Each Web server had its own local FreeBSD administrator account, so there were thousands of individual administrator accounts that required synchronization.
The Windows 2000 End-State (target) Environment
Phase 1 of the Windows 2000 migration project for Hotmail involved migrating all the existing FreeBSD\Apache Web servers to Windows 2000 and Internet Information Services (IIS). In total, this included migrating approximately 3,600 servers within a two-month period. The process used for the migration was able to scale to as many as 400 servers per 24-hour period. The bottleneck was LAN segment bandwidth. If there were a requirement to migrate more servers per period, you could spread the target machines across more subnets to mitigate the LAN bandwidth issues.
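The figures above give a quick lower bound on the calendar time the migration needs at its peak rate. The helper below is a small illustrative calculation, with a function name chosen for this sketch; the inputs are the server count and peak daily rate quoted in the text.

```python
import math


def min_migration_days(total_servers, servers_per_day):
    """Smallest whole number of 24-hour periods needed at a constant peak rate."""
    return math.ceil(total_servers / servers_per_day)


days = min_migration_days(3600, 400)
print(days)  # 9
```

At 400 servers per day, 3,600 servers need at least 9 days of continuous migration, which leaves room inside the two-month period for scheduling around production load and for retries.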
Details of the application port are included in the deploying section.
Microsoft Services for UNIX (SFU) was implemented to provide the Hotmail team with standard UNIX utilities like grep and du.
Microsoft Interix provides an environment for running UNIX-based applications and scripts on Windows 2000; in effect, it provides a UNIX subsystem on Windows 2000. The Hotmail team used the syslog functionality that Microsoft Interix provided. This was key to minimizing the impact on operations, because it allowed the replacement of the technology to be largely decoupled from retooling the operations and management of the environment. This is not meant to imply that Microsoft Interix allows all scripts to be reused without modification. As with any application port from one operating system to another, testing is a critical component of the migration. However, Interix helped to maximize the reuse of previously developed scripts.
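Part of what makes syslog useful in a hybrid environment is that the message format is the same whichever operating system sends it. As a rough illustration, the sketch below builds a BSD-style syslog line, in which the priority value is the facility multiplied by 8 plus the severity. The function name, host name, tag, and message are invented for this example and are not taken from the Hotmail scripts.

```python
FACILITY_LOCAL0 = 16  # standard syslog facility code for local0
SEVERITY = {"error": 3, "warning": 4, "info": 6}  # standard syslog severity codes


def syslog_line(host, tag, severity, message, timestamp):
    """Build a BSD-style syslog line: <PRI>TIMESTAMP HOST TAG: MESSAGE."""
    priority = FACILITY_LOCAL0 * 8 + SEVERITY[severity]
    return f"<{priority}>{timestamp} {host} {tag}: {message}"


# Hypothetical host, tag, and message.
line = syslog_line("web001", "migrate", "info", "re-image complete", "Jan  5 14:03:22")
print(line)  # <134>Jan  5 14:03:22 web001 migrate: re-image complete
```

A monitoring console that parses lines in this shape does not need to know whether the sender is running FreeBSD or Windows 2000.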
The SSL hardware accelerators were no longer required. Onboard SSL processing by Microsoft Internet Information Services was found to be more efficient, and to provide greater throughput, than the external hardware solution with FreeBSD.
Windows 2000 Active Directory Service. Appendixes 2 and 3 have pictorial representations of the Windows 2000 domain and site topology and a brief explanation of the rationale for the topology. In Phase 1, the use of Active Directory was limited to:
Centralized account database for storing the system administration and server accounts
Storing Group Policy Objects (GPOs) for configuration management (Built-in GPOs).
Implementation of basic organizational units (OUs) by administrative group; that is, the OUs in the root domain mapped to Hotmail administrative groups.
Implementation of sites by data center.
One important item to note here is that you do not need to construct an elaborate Windows 2000 Active Directory design in order to proceed with a Windows 2000 deployment. The extent of the design, or lack thereof, depends on the business drivers. The key is to keep it simple. Then, when business drivers surface that call for directory-enabled applications or business logic that queries the Windows 2000 Active Directory store, you can further partition the directory through organizational units or extend the schema if required.
The following diagram depicts the high-level server topology, as it exists at the time of this writing.
Hotmail chose Microsoft interoperability solutions that allowed them to progressively benefit from their staged Windows 2000 migration, while retaining full control of their new, hybrid application. In this way, the project business objectives were attained and a richer development environment was created, while minimizing the impact to operations and the day-to-day running of the business.
F1 - 75 minutes = The time required to re-image a system, measured from when the restart command is sent to the server until it is back online supporting the other operating system. See the deployment section for additional detail.
F2 - There are multi-tasking/multi-process solutions that Hotmail could have leveraged under FreeBSD. However, they would have required application modifications and rework to implement, so this was an optimum opportunity to examine other options and platforms.