Share via


Understanding Mailbox Database and Log Capacity Factors

This topic explains the factors that you should consider when planning mailbox database and log capacity as part of mailbox server storage design in Microsoft Exchange Server 2010.

Mailbox Database Capacity

Many factors influence a sizing capacity plan for Exchange Server 2010 Mailbox databases. This section discusses the following:

  • Mailbox storage quotas
  • Database white space
  • Mailbox database recoverable items
  • Actual mailbox size
  • Content indexing
  • Offline database maintenance
  • Recovery database
  • Database size
  • Database growth overhead

Mailbox Storage Quotas

The first metric to understand is the storage size limit, known as the mailbox storage quota, that's in effect in your organization. Knowing the amount of data that an end user is allowed to store in his or her mailbox allows you to determine how many user mailboxes can be housed on the server. Although mailbox storage quotas can change in response to changing organizational requirements, having a goal for the mailbox storage quota is the first step in determining your needed mailbox database capacity.

For example, if you have a server with 5,000 250-MB user mailboxes on it, you need at least 1.25 TB of disk space, excluding space requirements for recoverable items. If a limit isn't set for mailbox storage quotas, you'll find it difficult to estimate database capacity. Mailbox storage quotas for Exchange 2010 need to include the space for both the primary mailbox and personal archive mailbox (when used). For more information, see Managing Mailbox Servers and Managing Personal Archive.

Database White Space

The database size on the physical disk isn't just the number of users multiplied by the mailbox storage quota. When the majority of users aren't approaching their mailbox storage quota, the databases consume less space and white space isn't a capacity concern. The database itself will always have free pages, or white space, spread throughout. During background database maintenance, items marked for removal from the database are removed, which frees these pages. The percentage of white space is constantly changing due to the efforts of the 24x7 online defragmentation process.

You can estimate the amount of white space in the database by knowing the amount of mail sent and received by the users with mailboxes in the database. For example, if you have 100 2-GB mailboxes (total of 200 GB) in a database where users send and receive an average of 10 MB of mail per day, the amount of white space is approximately 1 GB (100 mailboxes × 10 MB per mailbox). The amount of white space can exceed this approximation if background database maintenance isn't able to complete a full pass.

Return to top

Mailbox Database Recoverable Items

Each database has a dumpster that stores soft-deleted items. By default, soft-deleted items are stored for 14 days and calendar items are stored for 120 days in Exchange 2010.

In addition, Exchange 2010 also includes the ability to prevent the purging of data before the deleted item retention window has passed. This functionality is known as single item recovery. When single item recovery is enabled (disabled by default), there is an additional 1.2 percent increase in the size of the mailbox for a 14-day deleted item retention window. For calendar version logging data (enabled by default), there is an additional 5.8 percent increase in the size of the mailbox.

The formula for determining the dumpster space requirements for 14 days of deleted item retention with single item recovery and calendar version logging enabled is:

Dumpster Size = (Daily Incoming/Outgoing Mail x Average Message Size x Deleted Item Retention Window) + (Mailbox Quota Size x 0.012) + (Mailbox Quota Size x 0.058)

For example, if the mailbox size is 2 GB, enabling single item recovery for 14 days of deleted item retention requires an additional 25 MB of space, and the calendar logging feature requires an additional 119 MB.

For more information, see the following topics:

Actual Mailbox Size

Over time, user mailboxes will reach the mailbox storage quota, so an amount of mail equivalent to the incoming mail will need to be deleted to remain under the mailbox storage quota. This requirement means that the dumpster will increase to a maximum size equivalent to the amount of e-mail sent and received each day multiplied by the number of days within the deleted item retention window. If the majority of users haven't reached the storage quota, only some of the incoming/outgoing mail is deleted. Therefore, the growth is split between the dumpster and the increase in mailbox size.

The following is a formula for database size using a 2-GB mailbox without using the personal archive feature:

Database Size = Mailbox Quota + White Space + Dumpster Size

For example, a 250-MB 100 messages per day mailbox that receives 37 MB of mail per five-day work week (with an average message size of 75 KB) would result in 120 MB in the dumpster with single item recovery enabled for 14 days, and 7.3 MB in white space, for a total mailbox size of 378 MB. Another example is a 2-GB 100 messages per day profile mailbox that receives 37 MB of mail per week, which results in 246 MB in the dumpster and 7.3 MB in white space, for a total mailbox size of 2.25 GB.

After you have determined the projected actual mailbox size, you can use that value to determine the maximum number of users per database. Divide projected mailbox size by the recommended database size. This value will also help you determine how many databases you will need to handle the projected user count, assuming fully populated databases. Be aware that due to non-transactional input/output (I/O) or because of hardware limitations, you may have to modify the number of users placed on a single server. Some administrators will prefer to use more databases to further reduce the database size. This approach can assist with backup and restore windows at the cost of more complexity in managing more databases per server.

Content Indexing

Content indexing creates an index, or catalog, that allows users to easily and quickly search through their mail items rather than manually search through the mailbox. Exchange 2010 creates an index that is about 10 percent of the total database size, which is placed on the same LUN as the database. Therefore, an additional 10 percent needs to be factored into the database LUN size for content indexing.

Return to top

Offline Database Maintenance

A database that needs to be compacted offline requires capacity equal to the size of the target database plus 10 percent. Whether you allocate enough space for a single database, or a backup set, additional space must be available to perform these operations.

Important

Offline maintenance procedures should only be implemented by request of Microsoft Customer Service and Support because offline maintenance procedures invalidate all database copies and require a full reseed of the database.

Recovery Database

If you plan to use a recovery database as part of your disaster recovery plans, sufficient capacity must be available to handle all the databases you want to be able to simultaneously restore on that server. For more information, see Recovery Databases.

Database Size

The database size ultimately determines how many mailboxes you deploy within each database and how many databases you deploy. The database size you deploy depends on several factors:

  • Backup/restore service level agreements (SLAs)   The database size ultimately dictates how fast you can backup and restore the data within a reasonable amount of time.
  • High availability architecture   If you plan to have multiple database copies, you can design your databases to be 2 TB in size because your copies become your first line of defense in terms of recovery operations.
  • Storage architecture   If you plan to deploy on JBOD storage (one disk houses both the database and its corresponding transaction logs), then the size of the disk you use dictates the maximum database size. For example, on a 1 TB disk (with a formatted capacity of about 917 GB), you also need to include space for transaction logs and the content index, and ensure you don't consume all available space.

Database Growth Overhead

After all factors have been considered and calculated, we recommend that you include an additional overhead factor of 20 percent for the database logical unit number (LUN). This value accounts for the other data that resides in the database that isn't necessarily seen when calculating mailbox sizes and white space.

Return to top

Log Capacity

The transaction log files are a record of every transaction performed by the database engine. All transactions are written to the log first, and then lazily written to the database. Unlike Exchange Server 2003, the transaction log files in Exchange 2010 have been reduced in size from 5 MB to 1 MB. This change was made to support the continuous replication features and to minimize the amount of data loss if primary storage fails.

You can use the following table to estimate the number of transaction logs that are generated on an Exchange 2010 Mailbox server where the average message size is 75 KB.

Number of transaction logs generated per mailbox profile

Message profile (75 KB average message size) Number of transaction logs generated per day

50

10

100

20

150

30

200

40

250

50

300

60

350

70

400

80

450

90

500

100

You can use the following guidelines to understand how message size affects the generation rate of transaction logs:

  • If the average message size doubles to 150 KB, the logs generated per mailbox increases by a factor of 1.9. This number represents the percentage of the database that contains the attachments and message tables (message bodies and attachments).
  • Thereafter, as message size doubles beyond 150 KB, the log generation rate per mailbox also doubles, increasing from 1.9 to 3.8.

For example, if you have a 100 messages per day and:

  • An average message size of 150 KB, the logs generated per mailbox are 20 × 1.9 = 38.
  • An average message size of 300 KB, the logs generated per mailbox are 20 × 3.8 = 76.

The following sections discuss factors that affect your log sizing capacity:

  • Backup and restore factors
  • Move mailbox operations
  • Log growth overhead
  • High availability factors
  • LUN capacity planning

Backup and Restore Factors

Log LUN sizing is partly dependent on your backup and restore design. For example, if your design allows you to go back two weeks and replay all the logs generated since then, you will need two weeks of log file space. If your backup design includes weekly full and daily differential backups, the log LUN needs to be larger than an entire week of log file space to allow both backup and replay during restore. Most enterprises that perform a nightly full backup allocate two to three times the required daily log generation capacity. This approach is taken to prevent a backup failure from causing the log drive to fill, which would dismount the database.

If you plan on using the mailbox resiliency and single item recovery features within Exchange 2010 as your backup infrastructure (and thus enabling circular logging), as a best practice, you should ensure that you allocated three times the required daily log generation capacity. This ensures that, when replication is suspended or not functioning under normal parameters, the databases don't dismount due to truncation failures.

Move Mailbox Operations

Moving mailboxes is a primary capacity factor for large mailbox deployments. Many large companies move a percentage of their user mailboxes on a nightly or weekly basis to different databases, servers, or sites. If your organization does this, you may find it necessary to provide extra capacity to the log LUN to accommodate mailbox moves.

Although the source server logs the record deletions, which are small, the target server must write all transferred data first to transaction logs. If you generate 10 GB of log files in one day, and keep a three-day buffer of 30 GB, moving 50 2-GB mailboxes (100 GB) would fill your target log LUN and cause downtime. In cases such as these, you may have to allocate additional capacity for the log LUNs to accommodate your move mailbox practices.

Log Growth Overhead Factor

For most deployments, we recommend that you add an overhead factor of 20 percent to the log size (after all other factors have been considered) when creating the log LUN to ensure necessary capacity exists in moments of unexpected log generation.

High Availability Factors

High availability influences log capacity requirements in three significant ways:

  • Database copy count   The log capacity of the entire system is increased based on the number of database copies chosen in the high availability deployment. If you have three database copies spread across three servers, you need to provision log capacity for each copy on each server.
  • Log truncation mechanism   High availability in Exchange 2010, with the ability to have up to 16 copies of each mailbox database, provides the foundation to use continuous replication circular logging as the log truncation/deletion mechanism as opposed to running Full/Incremental backups to truncate/delete the older logs. For more information, see the "Log Truncation without Backups" section in Understanding Backup, Restore and Disaster Recovery and High Availability and Site Resilience.
  • Database copy replay lag   High availability in Exchange 2010 provides the option to lag log replay on passive database copies (configured on a per copy basis). This feature is used to provide a delay for when logs get played in to lagged database copies. This delay can be useful to protect against events which would cause undesirable content to be replicated to all database copies. The content can be stopped from being played in to the lagged database copy by suspending replay before the logs with the undesired content are played in to the database.
    When replay lag is enabled for a database copy, the log capacity requirements change accordingly. If you have a 14-day lag configured, you need to provision for 17 days worth of logs. The additional log capacity is only required for the database copy that has the lag configured, other copies of that database, which don't have a lag, will have normal (non-lagged) log capacity requirements.

For more information, see Understanding High Availability Factors.

LUN Capacity Planning

The capacity requirements for the LUN will be based on the size of the data set (database, transaction logs, content index, and recovery space) and some additional free space. Most operations management programs have capacity thresholds that provide an alert when a LUN is more than 80 percent utilized.

You can use the following formula to determine the appropriate size of the LUN:

LUN Capacity = Data Size / (1 - Free Space Percentage Requirement)

For example, if you had a data size requirement of 3000 MB and a free space requirement of 20 percent, then the LUN that hosts this data must be 3750 MB in size.