Transport Server Storage Design

[This is pre-release documentation and subject to change in future releases. This topic's current status is: Writing Not Started.]

Edge Transport and Hub Transport servers are the server roles that deliver:

  • Mail into and out of the organization.
  • Mail into and out of Mailbox servers.
  • Voice mail messages submitted by Unified Messaging servers.

To ensure efficient mail flow and delivery throughout your Exchange organization, Edge Transport and Hub Transport servers should have a correctly designed storage solution.

This topic provides information and examples to help you determine the capacity and input/output (I/O) requirements for Edge Transport and Hub Transport servers.

Edge Transport Server Capacity and I/O Requirements

Edge Transport servers must be designed to meet the capacity and transactional I/O requirements of each organization. It is critical to manage queue growth and to route mail as fast as possible, so that service level agreements (SLAs) are not adversely affected. Several factors affect the overall capacity of an Edge Transport server:

  • Message tracking logs
  • Protocol logs
  • Mail database
  • Connectivity logs
  • Agent logs

A minimum of 4 gigabytes (GB) of free space and free database space must exist on the drive containing the message queue database, or the transport system will activate back pressure, a system resource monitoring feature of the Microsoft Exchange Server 2007 transport service. The default value for back pressure is controlled by the PercentageDatabaseDiskSpaceUsedHighThreshold parameter, which can be modified if necessary. For more information about back pressure, and the options to configure back pressure, see Understanding Back Pressure.

If message tracking logs are enabled, additional capacity is required. Message tracking capacity requirements depend on the number of messages received by the transport server. If your organization currently uses Microsoft Exchange Server 2003, you can determine your current log generation rate, and set a hard limit for the number of days to keep data, such as 10 days. Microsoft generates 220 megabytes (MB) of message tracking logs each business day (less on the weekend) and ensures enough capacity for a week of logs (approximately 1.3 GB). Protocol, connectivity, and agent log sizes vary depending on the activity. As a reference point, the production transport servers at Microsoft generate:

  • From 5 to 15 GB of protocol logs per day on the Edge Transport servers. Microsoft provisions enough capacity for the protocol log quota, which is 15 GB.
  • 100 MB of connectivity logs per day on the Edge Transport servers. Microsoft provisions enough capacity for a week of logs, which is approximately 600 MB.
  • 250 MB of agent logs per day on the Edge Transport servers. Microsoft provisions enough capacity for a week of logs, which is approximately 1.5 GB.
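
The weekly figures above can be reproduced with a short calculation. The following Python sketch is illustrative only. The function name weekly_log_capacity_mb is hypothetical, and the assumption that each weekend day produces half of a business day's volume is not stated in this topic; it is chosen here because it matches the approximate weekly totals quoted above. Measure your own weekend traffic before relying on it.

```python
def weekly_log_capacity_mb(daily_mb, weekend_fraction=0.5):
    """Estimate one week of log volume in MB.

    daily_mb: log volume generated on a business day.
    weekend_fraction: ASSUMPTION (not from this topic), the share of a
        business day's volume that each weekend day generates.
    """
    business_days = 5
    weekend_days = 2
    return daily_mb * (business_days + weekend_days * weekend_fraction)


# Daily rates quoted in this topic for Edge Transport servers at Microsoft.
daily_rates_mb = {
    "message tracking": 220,
    "connectivity": 100,
    "agent": 250,
}

for log_type, daily_mb in daily_rates_mb.items():
    weekly_mb = weekly_log_capacity_mb(daily_mb)
    print(f"{log_type}: about {weekly_mb:.0f} MB per week")
```

Protocol logs are left out of the sketch because their capacity is bounded by the 15 GB quota rather than by a retention period.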

Transaction logs do not require much disk capacity because normal log creation is limited by the use of circular logging. As a result, transaction logs can be placed on the logical unit number (LUN) containing the operating system. Microsoft uses a two-disk mirror for this LUN.

The database (mail.que) does not store items indefinitely, and the capacity reserved should be the average message size multiplied by the maximum queue, in the case where the queue is at maximum and the server is shut down. A 500,000 item queue with an average message size of 50 kilobytes (KB) is approximately 25 GB of data in the database.

Edge Transport servers that run antivirus scans on incoming mail need enough space for the antivirus quarantine. The disk I/O resource requirements depend on the percentage of incoming mail that is infected with viruses, which is typically small. The quantity of infected messages and attachments and how long they remain in quarantine dictate the amount of space that quarantine requires. One GB of disk space is a good starting point, although each organization's actual needs are different.

For most Edge Transport server deployments, we recommend that you add an overhead factor of 20 percent to the database size (after all other factors have been considered). This value will account for internal structures within the database and ensure adequate space if a spike or change in mail flow results in database size growth.

Capacity Example for an Edge Transport Server

In this example, the transaction logs are stored on the operating system partition (C:), which is hosted by a battery-backed, caching redundant array of independent disks (RAID) controller. The capacity requirements are small (in the range of several megabytes).

Determining the capacity of an Edge Transport server is a two-step process. First, calculate the database size, and then determine the transaction log size.

Step 1: Database Size

Consider an Edge Transport server that receives an average of 5 messages per second (with an average size of 50 KB) over a 24-hour period, with a maximum queue of 500,000 items. After all other factors have been added, an additional 20 percent overhead is included, and the total size on disk is 58 GB, as shown in the following table.

Database size

  • Queue maximum: 500,000 items
  • Queue capacity: approximately 25 GB (500,000 × 50 KB)
  • Protocol logs: 15 GB
  • Message tracking logs: 1.3 GB
  • Antivirus quarantine: 1 GB
  • Connectivity logs: 600 MB
  • Agent logs: 1.5 GB
  • Free space: 4 GB
  • Total size on disk: 58 GB (48 GB + 20%)
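
The arithmetic behind the total can be sketched in Python as follows. This is a minimal illustration, not a sizing tool: the function and variable names are hypothetical, and decimal units (1 GB = 1,000,000 KB) are assumed because they match the 500,000 × 50 KB = 25 GB figure used in this topic.

```python
KB_PER_GB = 1_000_000  # ASSUMPTION: decimal units, matching the 25 GB figure


def queue_capacity_gb(max_queue_items, avg_message_kb):
    """Space needed if the queue is at its maximum when the server stops."""
    return max_queue_items * avg_message_kb / KB_PER_GB


def total_size_on_disk_gb(components_gb, overhead_percent=20):
    """Sum all components, then add the recommended overhead factor."""
    subtotal = sum(components_gb.values())
    return subtotal * (100 + overhead_percent) / 100


# Component values taken from the Edge Transport example in this topic.
edge_components_gb = {
    "queue": queue_capacity_gb(500_000, 50),
    "protocol_logs": 15,
    "message_tracking_logs": 1.3,
    "antivirus_quarantine": 1,
    "connectivity_logs": 0.6,
    "agent_logs": 1.5,
    "free_space": 4,
}

total_gb = total_size_on_disk_gb(edge_components_gb)
print(f"Total size on disk: about {total_gb:.0f} GB")
```

Swapping in your own queue maximum, average message size, and log volumes gives a first estimate for a different deployment.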

Step 2: Transaction Log Size

To determine transaction log size, you must consider transactional I/O, other disk I/O, and database I/O per second (IOPS) per message.

Transactional I/O

If the server has enough available memory, incoming mail will be stored in RAM and in the transaction log, minimizing disk impact. When memory resources are low, only the first 128 KB of the message is stored in memory and the transaction log. The rest of the message is stored in the database. During content conversion, data is streamed to the temp directory (%TEMP%). It is therefore important to place the temp directory on the same LUN as the database. It is also important to set your storage controller cache to 50 percent read and 50 percent write. When there is not a large growing queue, few of the disk I/Os will be read operations. When a queue is present, the message may not be in the database cache, therefore requiring more disk I/Os.

Other Disk I/O

In addition to transactional I/O, there may be other disk I/O on the system. For example:

  • Enabling message tracking logs requires an additional 2–5 percent overhead on disk I/O.
  • Enabling protocol and connectivity logs has a small overhead on disk I/O that depends on the amount of incoming mail.
  • Enabling the default agent logs has a small overhead on disk I/O, although if custom agents are in use, more disk resources may be required.
  • Anti-spam and antivirus operations occur in memory, requiring more CPU resources.

Be sure to test your Edge Transport servers with all of the services running during the test that you expect to use in production.

Database IOPS per Message

During internal testing at Microsoft, an average message size of 60 KB was used. Many organizations size their transport servers with a particular message rate in mind, for example, 20 messages per second. This message rate would require 140 database I/Os per second (20 × (4.5 + 2.5)) and 220 log I/Os per second (20 × 11).

When a queue forms, more reads are required, particularly in the case of RAID10, because every physical disk responds to the read requests, as shown in the following table.

Database IOPS per message

Edge Transport database I/O (steady state), approximate values:

  • Total IOPS per message (approximately 60 KB): 18
  • Log write I/Os per message (sequential): 11
  • Database write I/Os per message (random): 4.5
  • Database read I/Os per message (random): 2.5

Note

The numbers in the preceding table are averages of many servers in production with variances up to plus or minus 30 percent. Extra features, such as journaling and transport rules, also affect the expected I/O per message, and these features would affect the example production numbers provided in this topic.
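
To apply the per-message values above to a target message rate, a calculation along the following lines may help. It is a sketch under the steady-state figures quoted in this topic; the names EDGE_IO_PER_MESSAGE, required_iops, and variance_range are hypothetical, and the variance helper simply applies the plus or minus 30 percent mentioned in the note.

```python
# Steady-state I/Os per message for an Edge Transport server (from this topic).
EDGE_IO_PER_MESSAGE = {
    "log_write": 11,   # sequential
    "db_write": 4.5,   # random
    "db_read": 2.5,    # random
}


def required_iops(messages_per_second, io_per_message):
    """Return (database IOPS, log IOPS) for a given message rate."""
    database = messages_per_second * (
        io_per_message["db_write"] + io_per_message["db_read"]
    )
    log = messages_per_second * io_per_message["log_write"]
    return database, log


def variance_range(value, percent=30):
    """Return the (low, high) band for a value with +/- percent variance."""
    return value * (100 - percent) / 100, value * (100 + percent) / 100


db_iops, log_iops = required_iops(20, EDGE_IO_PER_MESSAGE)
print(f"Database IOPS: {db_iops:.0f}, expected range {variance_range(db_iops)}")
print(f"Log IOPS: {log_iops:.0f}, expected range {variance_range(log_iops)}")
```

Because journaling, transport rules, and queue buildup change the per-message cost, treat the result as a starting point to validate with testing.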

Applying Sizing Guidelines to Your Hardware Design for an Edge Transport Server

After you have your capacity and transactional I/O requirements for an Edge Transport server, you can apply them to a proposed hardware design. For processor and memory configurations, see Planning Processor Configurations and Planning Memory Configurations. When designing an Edge Transport server, it is important to have enough RAM (each message needs 8 or 9 KB of memory) in the system to prevent the temporary caching of queued message bodies to disk.

An Edge Transport server uses an Extensible Storage Engine (ESE) database. It is important for resiliency and best performance to separate the log and database files on their own physical disks in environments where there will be a large queue. In smaller deployments that have lower disk I/O requirements, it may be feasible to place both the transaction logs and the database on the same LUN. The Edge Transport server, like the Mailbox server, requires I/O response times that are less than 20 milliseconds.

It is important to use battery-backed, caching RAID controllers and to run database maintenance nightly. Also make sure that the chosen disk type will provide the right balance of capacity and performance.

Hardware Design Sizing Example for an Edge Transport Server

This example illustrates how to design your storage around the expected messages per second. In this example, there is an Edge Transport server that handles 20 messages per second, requiring 140 IOPS for the database LUN and 220 IOPS for the log LUN. Always add a 20 percent growth factor for disk I/O performance to handle heavier than normal days. The disk layout is RAID10. For the hardware sizing results, see the following table.

Hardware sizing

  • Disks (1) and (2), RAID1 layout: operating system and transaction logs, 220 + 20% = 264 IOPS
  • Disks (3), (4), (5), and (6), RAID10 layout: database, protocol and message tracking logs, and antivirus quarantine, 140 + 20% = 168 IOPS

This example has a database LUN capacity requirement of approximately 70 GB for a week of data. You should double the capacity requirement to 140 GB if you require two weeks of data. Using 146-GB physical disks would allow a LUN of 292 GB in a RAID10 configuration.
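
The growth factor and the LUN capacity in this example can be checked with a small Python sketch. The helper names are hypothetical. The capacity helper assumes a simple mirrored layout (RAID1 or RAID10) in which half of the raw capacity is usable, and it ignores file system and formatting overhead, so real usable space will be somewhat lower.

```python
def with_growth(iops, growth_percent=20):
    """Add the recommended growth factor to a steady-state IOPS figure."""
    return iops * (100 + growth_percent) / 100


def mirrored_usable_capacity_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID1 or RAID10 set: half of the raw capacity.

    ASSUMPTION: ignores formatting and file system overhead.
    """
    if disk_count < 2 or disk_count % 2 != 0:
        raise ValueError("a mirrored layout needs an even number of disks (2 or more)")
    return disk_count // 2 * disk_size_gb


log_lun_iops = with_growth(220)       # operating system and transaction logs
database_lun_iops = with_growth(140)  # database and log files on the RAID10 set
database_lun_gb = mirrored_usable_capacity_gb(4, 146)

two_weeks_required_gb = 140  # capacity quoted in this example for two weeks of data
print(f"Log LUN: {log_lun_iops:.0f} IOPS")
print(f"Database LUN: {database_lun_iops:.0f} IOPS, {database_lun_gb} GB usable")
print(f"Fits two weeks of data: {database_lun_gb >= two_weeks_required_gb}")
```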

Hub Transport Server Capacity and I/O Requirements

Hub Transport servers must also be designed to meet the capacity and transactional I/O requirements of the organization. As with the Edge Transport server, a minimum of 4 GB of free disk space and free database space must exist on the drive containing the message queue database, or the transport system will activate back pressure. You can modify the default value for the PercentageDatabaseDiskSpaceUsedHighThreshold parameter on Hub Transport servers.

Message tracking log capacity depends on the number of messages received by the transport server. If your organization currently uses Exchange 2003, you can determine your current log generation rate, and set a hard limit for the number of days to keep data, such as 10 days. Microsoft generates 700 MB of message tracking logs each business day (less on the weekend) on the Hub Transport servers, and ensures enough capacity for a week of logs, which is approximately 4.5 GB.

Protocol log sizes vary depending on the activity. Microsoft generates 2.7 GB of protocol logs per day on the Hub Transport servers, and ensures that there is enough capacity for a week of logs, which is approximately 16 GB.

Transaction logs do not require much disk capacity because normal log creation is limited by the use of circular logging. As a result, the transaction logs can be placed on the operating system LUN. Microsoft uses a two-disk mirror for this LUN.

The database (mail.que) does not store items indefinitely, and the capacity reserved should be the average message size multiplied by the maximum queue, in the case where the queue is at maximum and the server is shut down. A 500,000 item queue at an average message size of 50 KB is approximately 25 GB of data in the database.

For most Hub Transport server deployments, we recommend that you also add an extra 20 percent overhead to the database size after all other factors have been considered.

Transport Dumpster

Special consideration is necessary for Hub Transport servers in sites that contain:

  • Clustered mailbox servers deployed in a cluster continuous replication (CCR) environment using either the release to manufacturing (RTM) version of Microsoft Exchange Server 2007 or Microsoft Exchange Server 2007 Service Pack 1 (SP1).
  • Mailbox servers running Exchange 2007 (SP1) that have one or more storage groups enabled for local continuous replication (LCR).

When deploying either of the preceding environments, make sure that you design your Hub Transport server with enough capacity to store mail long enough for all storage groups in its site, so that messages can be recovered in the event of an unscheduled outage of the active node. This feature is known as the transport dumpster.

The I/O overhead of the transport dumpster is similar to that of a growing queue. There are two parameters you can use to control how long a message stays in the transport dumpster: MaxDumpsterSizePerStorageGroup and MaxDumpsterTime. The default value for MaxDumpsterSizePerStorageGroup is 18 MB. To size the transport dumpster properly for your environment, take your largest acceptable message size and increase that size by 50 percent. For example, if the maximum message size is 10 MB, set MaxDumpsterSizePerStorageGroup to 15 MB.

If there is more than one Hub Transport server in the same Active Directory directory service site as the clustered mailbox server in the CCR environment, or an LCR environment running Exchange 2007 SP1, the aggregate storage for the storage groups on that clustered mailbox server is spread across all Hub Transport servers. For example, if you have four Hub Transport servers with a 15-MB transport dumpster, there would be a 60-MB transport dumpster for that storage group.

For organizations without message size limits, we recommend that you set MaxDumpsterSizePerStorageGroup to a value that is 1.5 times the average size of messages sent within the organization. Also, if a maximum message size is not set, you cannot guarantee to get that message back after an unscheduled failover in a CCR environment, or after activation of the passive copy in an LCR environment that is running Exchange 2007 SP1.

We recommend that MaxDumpsterTime be set to 7 days, which is the default value.

The capacity consumed by the transport dumpster should be the number of storage groups multiplied by the maximum transport dumpster size. If the maximum transport dumpster size is 15 MB, and the Hub Transport server services 100 storage groups in an LCR (Exchange 2007 SP1) or CCR (Exchange 2007 RTM) environment, 1.5 GB should be allocated for the transport dumpster.
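
The transport dumpster guidance in this section reduces to three small calculations, sketched below in Python. The function names are hypothetical and are not Exchange cmdlets or parameters, and decimal units (1 GB = 1,000 MB) are assumed to match the 1.5 GB figure above.

```python
def dumpster_size_mb(max_message_mb, factor=1.5):
    """Suggested MaxDumpsterSizePerStorageGroup: largest message plus 50 percent."""
    return max_message_mb * factor


def dumpster_capacity_gb(storage_groups, size_per_group_mb):
    """Disk capacity one Hub Transport server should reserve for the dumpster."""
    return storage_groups * size_per_group_mb / 1000  # ASSUMPTION: decimal GB


def aggregate_dumpster_mb(hub_servers_in_site, size_per_group_mb):
    """Total dumpster available to one storage group across all Hub servers."""
    return hub_servers_in_site * size_per_group_mb


size_mb = dumpster_size_mb(10)
print(f"Dumpster size per storage group: {size_mb:.0f} MB")
print(f"Capacity for 100 storage groups: {dumpster_capacity_gb(100, size_mb)} GB")
print(f"Aggregate across 4 Hub servers: {aggregate_dumpster_mb(4, size_mb):.0f} MB")
```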

Transport Dumpster Sizing Example

In this example, the transaction logs are on the disk containing the operating system partition (C:), which is hosted by a battery-backed, caching RAID controller. The capacity requirements will be small (in the range of megabytes). For the sizing results, see the following tables.

Determining the capacity required by the transport dumpster feature is a two-step process. First, calculate the database size, and then determine the transaction log size.

Step 1: Database Size

Consider a Hub Transport server that receives an average of 5 messages per second over a 24-hour period, with a maximum queue of 500,000 items.

Transport dumpster sizing

  • Queue maximum: 500,000 items
  • Queue capacity: 25 GB (500,000 × 50 KB)
  • Protocol logs: 15 GB
  • Message tracking logs: 4.5 GB
  • Transport dumpster: 1.5 GB
  • Total size on disk: 55 GB (46 GB + 20%)

Step 2: Transaction Log Size

To determine transaction log size, you must consider transactional I/O, other disk I/O, and database IOPS per message.

Transactional I/O

The same guidance on transactional I/O listed earlier for Edge Transport servers applies to Hub Transport servers. As mentioned previously, it is especially important to configure the cache settings on your storage controller as follows: 50 percent read, 50 percent write.

Transport Dumpster I/O

When the transport dumpster is enabled, disk I/O increases. Database writes increase, and database reads now also occur. On Microsoft production servers, these reads average approximately three per message.

Other Disk I/O

The same guidance on other disk I/O listed previously for Edge Transport servers applies to Hub Transport servers. It is especially important to test your Hub Transport servers with all of the services running during the test that you expect to use in production.

Database IOPS per Message

Internal testing at Microsoft, using an average message size of 40 KB, showed that enabling the transport dumpster requires more disk resources on the Hub Transport server. Many enterprises size their transport servers with a particular message rate in mind, for example, 20 messages per second. At that rate, with the transport dumpster enabled, the server requires 200 database I/Os per second (20 × (7 + 3)) and 140 log I/Os per second (20 × 7). With the transport dumpster disabled, it requires 40 database I/Os per second (20 × 2) and 40 log I/Os per second (20 × 2).

When a queue forms, more reads are required, particularly in the case of RAID10 because every physical disk responds to the read requests. For more information, see the following table.

Transaction log sizing

Hub Transport server database I/O (steady state), with the transport dumpster enabled and disabled:

  • Total IOPS per message (approximately 40 KB): 17 enabled, 4 disabled
  • Log write I/Os per message (sequential): 7 enabled, 2 disabled
  • Database write I/Os per message (random): 7 enabled, 2 disabled
  • Database read I/Os per message (random): 3 enabled, 0 disabled

Note

The numbers in the preceding table are averages of many servers in production with variances up to plus or minus 30 percent. Extra features, such as journaling and transport rules, also affect the expected I/O per message, and these features would affect the values in this example.
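
The cost of enabling the transport dumpster can be compared side by side with a short Python sketch. It uses only the steady-state per-message values from the table above; the dictionary and function names are hypothetical.

```python
# Steady-state I/Os per message on a Hub Transport server (from this topic).
HUB_IO_PER_MESSAGE = {
    "dumpster enabled": {"log_write": 7, "db_write": 7, "db_read": 3},
    "dumpster disabled": {"log_write": 2, "db_write": 2, "db_read": 0},
}


def hub_iops(messages_per_second, profile):
    """Return (database IOPS, log IOPS) for the named dumpster profile."""
    io = HUB_IO_PER_MESSAGE[profile]
    database = messages_per_second * (io["db_write"] + io["db_read"])
    log = messages_per_second * io["log_write"]
    return database, log


for profile in HUB_IO_PER_MESSAGE:
    database, log = hub_iops(20, profile)
    print(f"{profile}: {database} database IOPS, {log} log IOPS")
```

At 20 messages per second, the sketch shows the database LUN carrying five times the I/O load once the transport dumpster is enabled, which is why the two hardware examples that follow use different disk layouts.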

Applying Sizing Guidelines to Your Hardware Design for a Hub Transport Server

After you have your capacity and transactional I/O requirements for a Hub Transport server, you can apply them to a proposed hardware design. For processor and memory configurations for Hub Transport servers, see Planning Processor Configurations and Planning Memory Configurations. When designing a Hub Transport server, it is important to have enough RAM (each message needs 8 or 9 KB of memory) in the system to prevent the temporary caching of queued message bodies to disk.

A Hub Transport server uses an ESE database. It is important for best performance to separate the log and database files on their own physical disks in environments where there will be a large queue, or when using the transport dumpster. For smaller deployments with lower disk I/O requirements, it may be feasible to place both the transaction logs and the database on the same LUN. The Hub Transport server, like the Edge Transport server, requires I/O response times that are under 20 milliseconds.

Hardware Design Sizing Examples for a Hub Transport Server

It is important to design your storage around the expected messages per second. In this example, a Hub Transport server handles 20 messages per second with the transport dumpster disabled, requiring 40 IOPS for the database LUN and 40 IOPS for the log LUN. Always add a 20 percent growth factor for disk I/O performance to handle heavier than normal days. The disk layout would be RAID1. This example has a database LUN capacity requirement of approximately 55 GB for a week of data. You should double the capacity requirement to 110 GB if you require 2 weeks of data. Using 140-GB physical disks would provide a database LUN of 140 GB in a RAID1 configuration and a log LUN of 140 GB in a RAID1 configuration. For results, see the following table.

Hardware sizing for a Hub Transport server handling 20 messages per second with the transport dumpster disabled

  • Disks (1) and (2), RAID1 layout: operating system and transaction logs, 40 + 20% = 48 IOPS
  • Disks (3) and (4), RAID1 layout: database, protocol, message tracking logs, and antivirus quarantine, 40 + 20% = 48 IOPS

In this next example, there is a Hub Transport server with the transport dumpster enabled that handles 20 messages per second. This configuration requires 200 IOPS for the database LUN and 140 IOPS for the log LUN, plus the extra 20 percent growth factor. The disk layout is RAID10. This example has a database LUN capacity requirement of approximately 55 GB for a week of data, or 110 GB if two weeks of data is required. Using 140-GB physical disks would provide a database LUN of 280 GB in a RAID10 configuration and a log LUN of 140 GB in a RAID1 configuration.

Hardware sizing for a Hub Transport server handling 20 messages per second with the transport dumpster enabled

  • Disks (1) and (2), RAID1 layout: operating system and transaction logs, 140 + 20% = 168 IOPS
  • Disks (3), (4), (5), and (6), RAID10 layout: database, protocol, message tracking logs, and antivirus quarantine, 200 + 20% = 240 IOPS