White Paper: Understanding the Relative Costs of Client Access Server Workloads in Exchange Server 2010

 

Greg Smith, Software Development Engineer in Test, Microsoft Exchange Server

July 2010

Summary

Estimating your Exchange Server 2010 Client Access server capacity needs is a critical setup task. The Client Access server is the entry point for all users. In addition, the Client Access server hosts important services used by the other Exchange server roles. This white paper presents an estimate of the relative CPU weights of the different protocols on the Client Access server. You can use these weights to produce a more detailed estimate of hardware needs when you design a new Exchange 2010 deployment or expand an existing one. As part of the testing performed while researching this white paper, we also measured the effect of MailTips and compared the cost of NTLM authentication with that of Basic authentication. In addition to reporting measurements performed on Exchange 2010, this paper reports on Outlook Web App in Exchange 2010 SP1, which added a feature that improves the user experience at an added cost to the server.

Applies To

Exchange Server 2010

Introduction

The Exchange Scalability and Capacity Planning team investigated the costs associated with common workloads that run against the Client Access server role. This investigation provided additional data points that customers and partners can use in their pre-deployment planning. As always, it's recommended that you validate your conclusions by simulating your expected loads before deployment.

Test Methodology

Testing was performed on a test topology in the Exchange Performance and Scalability lab using the Exchange Load Generator tool. Wherever possible, we kept the user profile for a particular workload the same. The tests targeted our "100 profile", which represents 80 messages received and 20 messages sent in an average day. Although it's conceivable that users of Exchange ActiveSync and Outlook Anywhere might, on average, send more mail than an Outlook user (by using POP3 or IMAP4 and SMTP), those differences are ignored here.

Our reference Client Access server platform has two processor sockets on the motherboard populated with Intel Xeon L5335 four-core processors running at 2.00 gigahertz (GHz) for a total of eight physical cores. Hyper-threading is disabled on this platform to allow for more accurate computation of CPU costs. The platform runs with 16 gigabytes (GB) of RAM. The test topology also contains two servers that have the Exchange Mailbox server role installed. The Mailbox servers use the same processor configuration and are populated with 16 GB and 32 GB of RAM. An additional server is configured with the Hub Transport server role using a single Intel Xeon 5148 four-core processor running at 2.33 GHz and 16 GB of RAM.

User Profiles

The following profiles were used to represent equivalent work across the various client types. For protocols whose users aren't connected for the entire day, we calculated the effective number of 100-profile users by counting the messages that would be "read" (for POP3 and IMAP4) or "sent" (for Outlook Web App) in 8 hours. We divided that count by the number of messages a 100-profile user would receive per day (80 messages, for POP3 and IMAP4) or send per day (20 messages, for Outlook Web App). We then divided by 2 to account for the peak load period of the day.

Summary of profiles used in testing

Client                             LoadGen 2010 module   Profile/script        Conversion to 100-profile users
Outlook Anywhere                   Outlook 2007 Cached   100 profile (heavy)   RPC/HTTP Proxy\Current number of Unique Users
Outlook                            Outlook 2007 Cached   100 profile (heavy)   MSExchange RpcClientAccess\User count
Outlook Web App                    OwaModule             OwaHeavyUser.txt      (Messages sent in 8 hrs/20)/2
IMAP4                              ImapModule            Imap-oe.txt           (Fetches in 8 hrs/80)/2
POP3                               PopModule             POP-10sec.txt         (RETRs in 8 hrs/80)/2
Exchange ActiveSync                ActiveSync            V120DirectPush        MSExchange ActiveSync\Ping Commands Pending
Exchange Web Services (Entourage)  (internal tool)       (internal tool)       MSExchangeWS\Active Subscriptions
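The conversion used for Outlook Web App, IMAP4, and POP3 in the table above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are ours, and the event count in the example is a made-up placeholder rather than a measurement from these tests.

```python
SECONDS_PER_HOUR = 3600


def events_in_window(rate_per_sec, window_hours=8.0):
    """Convert a per-second counter rate into a count over the test window."""
    return rate_per_sec * window_hours * SECONDS_PER_HOUR


def effective_100_profile_users(event_count, events_per_user_day, peak_factor=2.0):
    """Apply the table's formula: (events in 8 hrs / events per user-day) / 2.

    events_per_user_day is 80 for POP3 RETRs and IMAP4 fetches,
    and 20 for Outlook Web App messages sent.
    """
    if events_per_user_day <= 0 or peak_factor <= 0:
        raise ValueError("events_per_user_day and peak_factor must be positive")
    return event_count / events_per_user_day / peak_factor


# Hypothetical example: 1,600,000 POP3 RETRs observed during an 8-hour run.
pop3_users = effective_100_profile_users(1_600_000, 80)
print(pop3_users)  # 10000.0
```

The peak factor is kept as a parameter so that a different peak-to-average ratio can be substituted if your own usage pattern differs from the factor of 2 assumed in this paper.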

Summarized Data

The following table summarizes the effect of a 100-profile (heavy) user for the protocols that were tested. When the CPU-per-user ratio increased with the number of users, we reported the value where our test server would be running at 75 percent total CPU (12 GHz consumed). For IMAP4, where the costs increased with the size of the mailbox, we chose the value for 100-MB mailboxes. Notice that for Outlook Web App, IMAP4, and POP3, where users aren't continuously connected, these values aren't per concurrent user, but per calculated 100-profile user. Also notice that Exchange ActiveSync reports the additional cost of adding this feature to existing Outlook users. There are no Hub Transport server values reported for IMAP4 or POP3 because these users generally use the SMTP protocol to send mail, which has no effect on the Client Access server. Therefore, sending mail wasn't included in the IMAP4 and POP3 runs.

CPU and network usage for Exchange 2010 server roles and protocols

                                      Client Access                 Hub Transport  Mailbox        Active Directory
Protocol                              CPU            Network        CPU            CPU            CPU            LDAP searches/
                                      (MHz/user)     (KB/sec/user)  (MHz/user)     (MHz/user)     (MHz/user)     user/sec
Exchange ActiveSync (delta)           1.60           1.04           0.22           1.25           0.05           0.024
Exchange Web Services (Entourage)     0.71           0.54           0.18           0.32           0.18           0.037
IMAP4                                 0.86           0.14           -              0.90           0.03           0.007
Outlook                               0.35           0.37           0.24           0.59           0.05           0.014
Outlook Anywhere                      0.80           0.44           0.20           0.60           0.06           0.016
Outlook Web App                       0.86           0.88           0.25           1.16           0.10           0.025
Outlook Web App in Exchange 2010 SP1  1.17           1.35           0.25           1.19           0.10           0.025
POP3                                  0.33           0.79           -              0.09           0.05           0.010

There are many sources of data and clients not covered here, including BlackBerry devices and custom applications that use Exchange Web Services. We recommend that customers perform independent testing for these scenarios.
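As a rough illustration of how the Client Access CPU column can feed a sizing estimate, the following Python sketch divides a CPU budget by a per-user cost. The budget matches the reference server described earlier (eight cores at 2 GHz, 75 percent target utilization). The function names are ours, and two simplifications are assumptions rather than findings of this paper: per-user cost is treated as constant, and the costs of different protocols are treated as additive. Treat the output as a starting point for load simulation, not as a guarantee.

```python
# Client Access server CPU cost in MHz per 100-profile user, from the table above.
CAS_MHZ_PER_USER = {
    "Exchange ActiveSync (delta)": 1.60,  # added on top of the Outlook cost
    "Exchange Web Services (Entourage)": 0.71,
    "IMAP4": 0.86,  # 100-MB mailboxes
    "Outlook": 0.35,
    "Outlook Anywhere": 0.80,
    "Outlook Web App": 0.86,
    "Outlook Web App in Exchange 2010 SP1": 1.17,
    "POP3": 0.33,
}


def cpu_budget_mhz(cores=8, core_ghz=2.0, target_utilization=0.75):
    """CPU available at the target utilization; 12,000 MHz for the reference server."""
    return cores * core_ghz * 1000.0 * target_utilization


def users_supported(mhz_per_user, budget_mhz=None):
    """Users of a single protocol that fit in the budget (constant-cost assumption)."""
    if mhz_per_user <= 0:
        raise ValueError("mhz_per_user must be positive")
    if budget_mhz is None:
        budget_mhz = cpu_budget_mhz()
    return budget_mhz / mhz_per_user


def cpu_for_mix(user_counts):
    """Estimated Client Access CPU (MHz) for a protocol mix (additivity assumption)."""
    return sum(CAS_MHZ_PER_USER[protocol] * count
               for protocol, count in user_counts.items())


print(round(users_supported(CAS_MHZ_PER_USER["Outlook"])))  # 34286
print(cpu_for_mix({"Outlook": 10000, "POP3": 5000}))
```

For a server with a different core count or clock speed, pass your own values to cpu_budget_mhz. Keep in mind that MHz on a different processor generation isn't directly comparable to MHz on the reference platform.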

Outlook Anywhere

Outlook Anywhere testing was performed on a 2007 CPU design that uses two L5335 Xeon 4-core processors running at 2 GHz. On Windows Server 2008 R2, we project a CPU scaling limit of approximately 15,000 heavy Outlook Anywhere users per Client Access server. We performed our tests using Basic authentication. When we performed the same tests with NTLM authentication, we observed a 2 percent reduction in the number of Outlook Anywhere users per Client Access server.

As the number of users increases, the CPU cost in MHz per user increases linearly. However, Windows Server 2008 SP2 shows a significantly steeper slope of increase than Windows Server 2008 R2. The following graph also shows the relative CPU consumption of the Microsoft.Exchange.RPCClientAccess.Service process.

[Figure: Client Access server CPU consumption per user]

There are two main components of the cost of supporting Outlook Anywhere users. The first is the cost for the W3wp process, and the second is the cost for the MAPI processes, including the Microsoft.Exchange.RPCClientAccess.Service process. The costs for the MAPI processes were the same for both Windows Server 2008 R2 and Windows Server 2008 SP2 and increased slightly with increasing numbers of users. The costs in the W3wp process also increased with additional users.

[Figure: Total CPU consumption for the Client Access server]

The previous graph indicates that if up to 75 percent of the total CPU is used, this server could support about 14,000 users on Windows Server 2008 R2.

The following graph shows that, for other server roles, there's no significant increase in cost as the number of users increases.

[Figure: Total CPU consumption for other server roles]

Network traffic per user is virtually unchanged at 0.45 KB/sec as the number of users increases.

The Impact of NTLM Authentication

The initial tests were performed using Basic authentication. We then repeated the same tests using NTLM authentication and found an increase in CPU consumption. When the tests were projected out to 75 percent total CPU (12 GHz), this resulted in a 2 percent reduction in the number of users supported.

[Figure: Differences between NTLM and Basic authentication]

We recommend that you deploy on Windows Server 2008 R2 if you plan to support Outlook Anywhere users.

Cached Mode Outlook Testing

In Exchange 2010, the Client Access server role now supports direct connections from Outlook 2007 and Outlook 2010. We performed tests with Outlook 2007 and Outlook 2010 connecting to the Client Access server with cached mode profiles. These tests were performed on Windows Server 2008 R2 with a 2007 CPU design that uses two L5335 Xeon 4-core processors running at 2 GHz. The results of these tests project a CPU-scaling limit of approximately 34,000 Outlook users per Client Access server.

For Outlook, the CPU consumption per user increases as the total number of users increases.

[Figure: CPU consumption per user for Outlook]

Applying a second-order polynomial to this cost per user, we predict that approximately 34,000 Outlook users would consume 75 percent of this server's CPU, as shown in the following graph.

[Figure: Total CPU consumption for Outlook]
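The kind of projection described above can be sketched as follows. This sketch assumes that the per-user cost grows linearly with the number of users, so that total CPU is a second-order polynomial in the user count. The function names are ours, and the two coefficients in the example are placeholders chosen for illustration. They are not the fitted values behind the graphs in this paper.

```python
import math


def total_cpu_mhz(users, base_mhz_per_user, slope):
    """Total CPU when per-user cost is base + slope * users (MHz)."""
    return users * (base_mhz_per_user + slope * users)


def users_at_budget(budget_mhz, base_mhz_per_user, slope):
    """Solve base*n + slope*n^2 = budget for the positive root n."""
    if budget_mhz < 0 or base_mhz_per_user <= 0 or slope < 0:
        raise ValueError("budget and slope must be >= 0, and base must be > 0")
    if slope == 0:
        # Constant per-user cost: the relationship is linear.
        return budget_mhz / base_mhz_per_user
    discriminant = base_mhz_per_user ** 2 + 4.0 * slope * budget_mhz
    return (-base_mhz_per_user + math.sqrt(discriminant)) / (2.0 * slope)


# Hypothetical coefficients: 0.2 MHz per user at low load, rising by
# 0.00001 MHz per user for each additional user on the server.
limit = users_at_budget(12000.0, 0.2, 1e-5)
print(round(limit), round(total_cpu_mhz(limit, 0.2, 1e-5)))
```

To use this with your own measurements, fit a straight line to the measured MHz per user at several user counts, and pass the intercept and slope to users_at_budget.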

Other server roles showed a fairly consistent CPU consumption per user, except for the Mailbox server role, which showed an increase at high user numbers, likely because the Jet Database Cache per user was reduced.

[Figure: CPU consumption of other server roles for Outlook]

Outlook Web App

Using the same design as previous tests (Windows Server 2008 R2 on a 2007 CPU design that uses two L5335 Xeon 4-core processors running at 2 GHz), our results project a scaling limit of approximately 14,000 heavy Outlook Web App users per Client Access server. There was no significant difference in performance between Windows Server 2008 SP2 and Windows Server 2008 R2. Enabling MailTips also had no significant effect, as shown in the following graph.

[Figure: CPU consumption per user for Outlook Web App]

For our tests, we define a heavy Outlook Web App user as one who sends 20 messages per day (7 × 10^-4 messages per second over an 8-hour day). To calculate the total number of Outlook Web App users, we took the messages sent per second, as calculated from the MSExchange OWA\Messages Sent counter, and divided by the messages sent per second per heavy user. Finally, we divided by two as a typical 'peak load' factor. The trend line is based on the MailTips-enabled Windows Server 2008 R2 test.

[Figure: Total CPU consumption for Outlook Web App]

Each simulated Outlook Web App user generated about 0.45 KB/sec of network traffic, and enabling MailTips added approximately 0.03 KB/sec per user.

Much like previous tests, the other server roles didn't show an increase in cost per user with additional users.

[Figure: CPU consumption of other roles for Outlook Web App]

Outlook Web App in Exchange 2010 SP1

Exchange 2010 SP1 added a Prefetch feature to Outlook Web App. This feature downloads some e-mail messages to the client computer in the background without any action by the user. This improves the user experience if the user chooses to open those messages, but at an increased cost to Exchange. The increased cost depends on the number of messages that are downloaded but not opened by the user. In a separate topology, we simulated users who downloaded twice as many messages as they opened, and found that the CPU cost per user on the Client Access server increased by about 36 percent. On the Mailbox server, however, the CPU cost per user went up by only about 3 percent and disk IOPS per user by about 4 percent. With the Prefetch feature in use, the projected number of heavy Outlook Web App users per Client Access server drops to about 9,000.

Exchange ActiveSync

Using our standard hardware, we calculated a scaling limit of approximately 7,500 heavy Outlook users who have Exchange ActiveSync enabled per Client Access server. There was no significant capacity difference between Windows Server 2008 SP2 and Windows Server 2008 R2.

For the tests, between 5,000 and 7,800 Outlook users inside the firewall were used. The number of users that had Exchange ActiveSync enabled was then adjusted. The Exchange ActiveSync cost was calculated by subtracting the baseline values recorded when none of the Outlook users were using Exchange ActiveSync. The following graph shows that Exchange ActiveSync adds about 1.1 Mcycles per second per user to the Client Access server.

[Figure: CPU consumption per user for Exchange ActiveSync]

The costs in the Microsoft.Exchange.RPCClientAccess.Service process shown in the following graph are higher than in the Outlook case described earlier. This is because the Exchange ActiveSync users also reply and send from their mobile phones or devices, so they now send and receive about 120 messages per day. In the following graph, we added the Exchange ActiveSync CPU per user to the RPCClientAccess.Service CPU per user to see the total effect on the Client Access server. We predict that 7,500 users will consume 75 percent total CPU.

[Figure: CPU consumption for Exchange ActiveSync and the RPC Client Access service]

Exchange ActiveSync usage increased the network traffic by about 1.1 KB per second per Exchange ActiveSync user.

Internet Message Access Protocol version 4 (IMAP4)

Using our standard test hardware and software, we project a scaling limit of approximately 500 concurrent IMAP4 users with 100-MB mailboxes per Client Access server. This translates to about 14,000 heavy users during the business day with a peak load factor of 2. We found no significant difference between Windows Server 2008 SP2 and Windows Server 2008 R2. We do see a dramatic increase in costs with mailbox size for IMAP4 users. We measured a 40 percent improvement in performance for IMAP4 users compared to Exchange 2007.

Although the cost per IMAP4 user showed an increasing trend with the number of users (with 2-MB mailboxes), this behavior was identical on both Windows Server 2008 SP2 and Windows Server 2008 R2.

[Figure: CPU consumption per user for IMAP4]

As with the Outlook Anywhere data, because the cost per user increases with the number of users, the CPU needed to support users grows with the square of the number of users.

A 100-profile (heavy) user receives 80 messages per day, or, on average, 0.00278 messages per second in an 8-hour work day. To obtain the Daily IMAP4 User value, we divided the MSExchangeIMAP4\Fetch Rate by the messages received per second per heavy user, and then divided by two as a typical 'peak load' factor. Projecting 75 percent total CPU on this Client Access server, or 12 GHz consumed, yields about 62,000 daily IMAP4 users.

[Figure: Total CPU consumption for IMAP4]

IMAP4 Clients and Mailbox Size

The IMAP4 load typically depends on the size of the user's mailbox, because many clients retrieve the headers from the entire folder when they open it. Comparing 2-MB, 100-MB, and 220-MB mailboxes, we saw the slope of the Client Access server MHz-per-user value increase significantly as the mailbox size increased.

[Figure: MHz per user for IMAP4]

We project a factor of 10 drop in the limit of the concurrent number of IMAP4 users from a mailbox size of 2 MB to a mailbox size of 100 MB, and another factor of two drop between a mailbox size of 100 MB and 220 MB. This suggests that mailbox size will play an important role in planning for IMAP4 Client Access server capacity.

[Figure: CPU consumption by mailbox size]

The larger mailboxes resulted in an expected increase in the RPC operations per user to retrieve the additional data, which increased the costs on the Mailbox server. In this case, the RPC operations per user and the MHz per user both doubled. The number of LDAP searches per user, and therefore the Active Directory costs, also increased because of increased recipient resolution.

[Figure: CPU consumption of other roles]

Clearly, knowing the average mailbox size will play an important role in planning for IMAP4 Client Access server capacity.

IMAP4 Comparison with Exchange 2007

We've made significant improvements in the performance of the IMAP4 protocol for Exchange 2010. We've reduced the total Client Access server CPU per user by 40 percent while reducing the memory footprint by 30 percent. These results were obtained with MAPI users, which is typical of enterprise environments with a large percentage of Outlook users who have 100-MB mailboxes. We haven't included the improvements we made to further reduce our CPU costs with non-MAPI (MIME) messages.

In the following table, the data was generated from users who performed the same set of actions, but the Exchange 2010 system responded more quickly to requests. As a result, the number of concurrent connections dropped even while the fetch rate increased. We used IMAP4 fetch commands to normalize our results. The fetch represented a unit of work that included, for example, a portion of logon and logoff actions. The results here are for a stand-alone Client Access server (with two dual-core Xeon processors at 2.33 GHz) in a multi-computer topology.

Response time of various versions of Exchange Server

Counter                                      Exchange 2007 SP1   Exchange 2010   % improvement
IMAP4
  .NET CLR Memory\# Bytes in all Heaps (MB)  885                 592             33%
  Current Connections                        1222.10             974.84
  Fetch rate                                 9.435               10.468
  % process time                             124.69              71.18
  IMAP4 Mcycles/fetch                        308.05              158.50          49%
Client Access server
  % Processor CPU                            35.95               22.68
  Total CPU/fetch                            355.30              202.03          43%

% improvement = (Exchange 2007 value - Exchange 2010 value)/Exchange 2007 value
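The derived rows in the table above can be reproduced, approximately, from the raw counters. The following Python sketch shows one way to do so. The interpretation behind it is our assumption, not something stated in the test notes: "% process time" is taken relative to a single core (which is why it can exceed 100), "% Processor CPU" is taken across all four cores of the two dual-core processors, and each core delivers 2,330 Mcycles per second. Under those assumptions, the results agree with the table to within about 0.1 percent.

```python
CORE_MHZ = 2330.0  # 2.33 GHz per core on the comparison server
TOTAL_CORES = 4    # two dual-core processors


def percent_improvement(exchange_2007, exchange_2010):
    """(Exchange 2007 - Exchange 2010) / Exchange 2007, as a percentage."""
    return (exchange_2007 - exchange_2010) / exchange_2007 * 100.0


def mcycles_per_fetch(percent_cpu, cores, fetches_per_sec, core_mhz=CORE_MHZ):
    """Megacycles consumed per IMAP4 fetch.

    percent_cpu is relative to `cores` cores: use cores=1 for a process
    counter and cores=TOTAL_CORES for the whole-server counter (assumption).
    """
    return percent_cpu / 100.0 * cores * core_mhz / fetches_per_sec


# IMAP4 process cost per fetch on Exchange 2007 SP1 and Exchange 2010.
imap_2007 = mcycles_per_fetch(124.69, 1, 9.435)
imap_2010 = mcycles_per_fetch(71.18, 1, 10.468)

# Whole-server cost per fetch.
total_2007 = mcycles_per_fetch(35.95, TOTAL_CORES, 9.435)
total_2010 = mcycles_per_fetch(22.68, TOTAL_CORES, 10.468)

print(round(percent_improvement(imap_2007, imap_2010)))    # 49
print(round(percent_improvement(total_2007, total_2010)))  # 43
print(round(percent_improvement(885, 592)))                # 33
```

Normalizing by fetches rather than by connections matters here because the faster Exchange 2010 server held fewer concurrent connections while completing more fetches per second.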

Post Office Protocol version 3 (POP3)

The POP3 results that follow map to about 36,000 heavy users if the load is spread over the 8-hour business day with a peak-load factor of 2. We found no significant difference between Windows Server 2008 SP2 and Windows Server 2008 R2.

The POP3 protocol verbs are independent of operating system and of load. In the following graph, we increased the number of concurrent POP3 sessions, each doing the same work per connection. Because the number of RETRs (messages retrieved by the client) per connection remained the same, we used it as a unit of work. In this case, this included a portion of the other actions, like the cost of logon.

[Figure: CPU consumption per RETR]

Based on the 80 RETR per user/(8-hour day) values, we can convert this to GHz per heavy user (assuming a peak load factor of 2) and project that 36,000 daily POP3 users would consume 12 GHz of CPU (75 percent of this server).

[Figure: Total CPU consumed per heavy user]
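The conversion from a per-RETR cost to a per-user cost can be sketched as follows. The function name is ours, and the 60 Mcycles per RETR used in the example is a placeholder for illustration, not a value read from the graphs in this paper. The sketch also assumes that the peak load factor doubles the per-user cost, because it halves the number of daily users credited to a given load.

```python
SECONDS_PER_HOUR = 3600


def mhz_per_heavy_user(mcycles_per_event, events_per_user_day,
                       window_hours=8.0, peak_factor=2.0):
    """Convert a cost per protocol event into MHz per 100-profile user.

    events_per_user_day is 80 RETRs for a heavy POP3 user, spread over
    the 8-hour business day and scaled by the peak load factor.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    events_per_sec = events_per_user_day / (window_hours * SECONDS_PER_HOUR)
    return mcycles_per_event * events_per_sec * peak_factor


# Hypothetical cost of 60 Mcycles per RETR (placeholder value).
cost = mhz_per_heavy_user(60.0, 80)
print(round(cost, 3))          # 0.333
print(round(12000.0 / cost))   # users that fit in a 12 GHz budget
```

Because the POP3 cost per RETR is independent of load, the per-user cost is constant, and the user limit is simply the CPU budget divided by that cost.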

There are no Hub Transport server costs associated with the POP3 protocol, and the Mailbox server and Active Directory costs are flat, as shown in the following graph.

[Figure: CPU consumption of other server roles]

The mailbox database IOPS per heavy POP3 user actually fell as the number of users increased, due to increasing efficiencies in the database cache and pre-reading behavior.

[Figure: Mailbox database IOPS per heavy user]

Exchange Web Services (Entourage)

Using our standard testing hardware environment, we project a scaling limit of approximately 17,000 Entourage users. There is no significant difference between Windows Server 2008 R2 and Windows Server 2008 SP2.

The cost per user also increases with the number of users in our Entourage simulation.

[Figure: CPU consumption per user for Exchange Web Services]

As with the Outlook Anywhere data, because the cost per user increases with the number of users, the CPU needed to support users grows with the square of the number of users. The following graph is based on the R2 data.

[Figure: Total CPU consumption by number of users]

As expected, the cost of the other roles with respect to Entourage users is basically flat. We do see an increase in the Mailbox server role CPU above 10,000 Entourage users, due to the reduced database cache per user.

[Figure: CPU consumption of other roles]

Conclusion

We've provided information about how the relative CPU costs of each protocol compare when the Client Access server role is being used. The important findings are as follows:

  • Windows Server 2008 R2 is a much better choice if you have significant numbers of Outlook Anywhere users.

  • IMAP4 costs grow with the user's mailbox size for some types of IMAP4 clients.

We can group the costs per user into the following categories.

Costs per user for Exchange protocols

MHz/user  Protocol
0.4       POP3
0.4       Outlook
0.8       Exchange Web Services (Entourage)
0.8       IMAP4 with 100-MB mailboxes
0.8       Outlook Anywhere
0.8       Outlook Web App
1.2       Outlook Web App in Exchange 2010 SP1
1.6       Exchange ActiveSync (delta)

Additional Information

For the complete Exchange Server 2010 documentation, see Exchange Server 2010 Library.