Page Churn in Exchange
Recently I encountered a case where the customer's goal was to measure page churn in an Exchange database. They were struggling with the size of their snapshots and wanted to know how and why pages get churned in the database.
So what is page churn? In this instance we are discussing the pages that have been altered, or dirtied, over a period of time. If I have a database with 1000 pages that are each 8K in size, and 120 of those pages have been written to (dirtied) since the last time I checked, it could be said that I have 12% page churn.
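To put the arithmetic in one place, here is a minimal sketch using the hypothetical figures from the example above:

```python
# Hypothetical figures from the example above: a database of
# 1000 pages, each 8K, with 120 pages dirtied since the last check.
total_pages = 1000
page_size_kb = 8
dirtied_pages = 120

churn_percent = dirtied_pages / total_pages * 100
dirtied_kb = dirtied_pages * page_size_kb

print(f"Page churn: {churn_percent:.0f}%")  # Page churn: 12%
print(f"Changed data: {dirtied_kb} KB")     # Changed data: 960 KB
```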
The first question the customer had was how to measure this. We looked at several tools, but in the end the answer is that Exchange has no built-in tool for measuring page churn. You can look at the number of pages written in the Exchange performance counters, but there is no way to know whether all of those writes simply overwrote the same page many times or each write went to a different page.
So what causes churn? Any write to the database, whether it is to send an email, delete one, update an index, or carry out maintenance, churns pages. Some of these operations might dirty a single page; others might dirty thousands. In a busy database it would not be surprising to see nearly 100% of the pages churned in a period of only 24 hours. Online maintenance traverses all of the pages during each full pass and has the potential to change every one of them.
This was affecting the customer's snapshots because they had chosen a sector size in their backup solution that was considerably larger than the page size in the Exchange database (8K pages in the database versus 1MB sectors in the backup). This meant that if VSS saw any change within one of these large blocks, it would back up the whole thing. Data in Exchange 2007 and earlier is placed essentially at random throughout the database file, and each 1MB block covers 128 Exchange 2007 database pages. So, following its internal structure, an HTML-formatted email consisting of a header and the text "Hello World!" might be written to pages 500, 900, and 1150; that would dirty 3 Exchange pages and 3 of the 1MB sectors the backup was looking at. If we then needed to update the index, update table statistics, and handle an internal maintenance event, we would dirty even more pages, and they might be very widely dispersed in the database file.
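To make that concrete, here is a minimal sketch of how scattered page writes map onto backup sectors. The page numbers are the illustrative ones from the example above, and it assumes the backup carves the file into aligned 1MB chunks starting at offset zero:

```python
PAGE_SIZE = 8 * 1024                       # Exchange 2007 page size: 8K
BLOCK_SIZE = 1024 * 1024                   # backup sector size: 1MB
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE  # 128 pages per 1MB sector

# Illustrative page numbers from the example above.
dirty_pages = [500, 900, 1150]

# Each dirty 8K page taints the entire 1MB sector that contains it.
dirty_blocks = {page // PAGES_PER_BLOCK for page in dirty_pages}

print(sorted(dirty_blocks))  # [3, 7, 8] -- three 8K writes, three 1MB sectors
```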
As I am sure you can see, there can be a vast discrepancy between the page churn as seen by Exchange and the churn noted by the backup software with its 1MB sectors. If, in a period of time, Exchange has written to 10% of its pages, and those pages are randomly distributed throughout the database, it is entirely possible that the backup software could view 100% of the database as having been dirtied, and therefore requiring backup.
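The amplification can be roughed out with a little probability. If each 8K page is dirtied independently with probability p, a 1MB sector stays clean only when all 128 of its pages stay clean. The independence assumption is a simplification, but it shows the scale of the problem:

```python
p = 0.10               # 10% page churn, as in the example above
pages_per_block = 128  # 8K pages per 1MB sector

clean_block_prob = (1 - p) ** pages_per_block
dirty_block_fraction = 1 - clean_block_prob

print(f"{dirty_block_fraction:.6f}")  # 0.999999 -- effectively every sector
```

Under this simple model, 10% random page churn dirties virtually every 1MB sector, which lines up with the oversized snapshots the customer was seeing.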
The immediate solutions were to see if the customer could use a smaller sector size in the backup software, reducing the cases where a whole 1MB sector was backed up because just 1 of its 128 pages had been modified, and to see if they could reduce the backup frequency so they wouldn't accumulate so many large files (they were backing up hourly).
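Running the same rough model at several sector sizes shows why a smaller sector helps (the sizes here are illustrative; the actual options depend on the backup product):

```python
p = 0.10      # 10% page churn
page_kb = 8   # Exchange 2007 page size

for sector_kb in (8, 64, 256, 1024):  # illustrative sector sizes
    pages = sector_kb // page_kb
    dirty = 1 - (1 - p) ** pages
    print(f"{sector_kb:>4} KB sector: ~{dirty:.1%} of sectors dirtied")
```

At an 8K sector the backup tracks exactly the pages Exchange dirtied; at 1MB it ends up backing up nearly everything.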
Would this be different in Exchange 2010? A little. Exchange 2010 was redesigned to store data sequentially whenever possible, so the actual message, when written, is likely to affect only a single 1MB sector. That said, there are still statistics, secondary indexes, and online maintenance to consider. All of these have the potential to dirty more of the oversized sectors the backup software is tracking.
Thanks for reading! Until next time...
Update: May 8, 2012
Since I posted this, two further points have been raised:
- The blocks selected by the backup may align with the disk itself and not the database file
- The active and passive copies of an Exchange database in a CCR cluster or a DAG may not be completely identical
The question that raised these points was: why are the backups (block-style backups, as discussed throughout this post) of my active and passive copies reporting different sizes?
The first part of the answer is that the files are not guaranteed to start at the same location on their respective disks, and the fragmentation pattern on those disks is very likely to be different. If the backup software is using blocks at the hardware level, there is little chance of the backups being the same size (except by coincidence).

The second part of the answer is that the transactions in the log files are not guaranteed to execute in exactly the same order and receive exactly the same page allocations. There are numerous threads within store.exe, and those threads all run concurrently. The odds that the passive node will replay the logs and make all I/O requests in the same order that the active node carried them out are extremely remote. Therefore, even within the database, the byte location of a particular item might be quite different between the two copies, and an item that fell into a particular block on one copy could easily end up in a different block on the other.
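A toy model illustrates the alignment point in isolation. Take two byte-identical files that start at different offsets on their disks; carving each disk into fixed hardware blocks produces different block contents, so block-level tracking sees them differently. (The block size and offsets are shrunk here to keep the output readable.)

```python
BLOCK = 8  # toy hardware block size, in bytes

file_bytes = b"HelloWorldHelloWorld"

def hw_blocks(start_offset: int) -> list[bytes]:
    """Carve the disk into fixed blocks, given where the file starts."""
    disk = b"\x00" * start_offset + file_bytes
    return [disk[i:i + BLOCK] for i in range(0, len(disk), BLOCK)]

print(hw_blocks(0))  # the file on the "active" copy's disk
print(hw_blocks(3))  # identical bytes on the "passive" disk, shifted 3 bytes
# The block lists differ, so a block-level backup of each disk captures
# different blocks even though the file contents are identical.
```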
The bottom line is that there is no valid comparison between the backup size of the active node and the passive node.