SQL Q&A: Maintaining Logs and Indexes
Some of the key ways to keeping SQL Server running efficiently are preserving log backups and properly maintaining indexes.
Paul S. Randal
Don’t Break the Chain
Q: I’ve been defining a backup strategy for our databases. My plan involves transaction log backups so we can perform disaster recovery with little data loss. I’ve been investigating some of the problems I could encounter and I’ve read several times that I need to be careful not to break the log backup chain. Can you explain what this is and how it could be broken?
A: That’s a great question and it’s something that many people overlook. The log backup chain (sometimes simply called the log chain) refers to an unbroken series of transaction log backups that cover the time from the most recent data backup (full or differential) to the point at which you’d like to restore. An example restore sequence would be as follows:
- The most recent full database backup
- Then the most recent differential database backup
- Then all transaction log backups taken after that
Most people keep more transaction log backups around in case one of the backups becomes corrupt and you have to restore a less-recent data backup. You can get more information about backups and restores in the two TechNet Magazine articles I wrote last year, “Understanding SQL Server Backups” and “Recovering from Disasters Using Backups.”
If any of the necessary log backups are damaged or unavailable for the restore sequence you’ve chosen, the log backup chain will be broken and you won’t be able to restore past that point. If only one of the log backups is damaged, you may be able to force it to restore using the WITH CONTINUE_AFTER_ERROR option. That would force a restore of corrupt transaction log records, which would result in database corruption. I’d be very hesitant about forcing this type of restore.
One operation that could result in a necessary log backup being unavailable is an “out of band” log backup, done without ensuring that a log backup is kept. You might do this to give a copy to a developer, for example. That log backup is part of the log backup chain, as it’s the only one that will contain the log records generated since the previous log backup.
That is, unless you use the WITH COPY_ONLY option, which performs the log backup, but also lets the next log backup effectively back up the same set of log records. See my blog post, “BACKUP WITH COPY_ONLY,” to see more details on how to avoid breaking the backup chain.
A more-common example of an operation breaking the log backup chain is one that prevents you from performing a transaction log backup during regular operations. These types of operations include:
- Switching to the SIMPLE recovery model and then back to FULL or BULK_LOGGED
- Dumping the log in SQL Server 2005 and prior versions using the BACKUP LOG … WITH NO_LOG or TRUNCATE_ONLY options
- Reverting a database from a database snapshot
You’ll need to perform a data backup (full or differential) after any of these operations to allow log backups to continue. This is called restarting the log backup chain.
One last thing: Contrary to popular myth, performing a full or differential backup does not break the log backup chain, and, in fact, has no effect on log backups whatsoever.
Cluster Those Indexes
Q: Many of the tables in our SQL Server 2008 database don’t have a clustered index. I’ve heard that I could have performance problems with forwarded records causing extra IOs. Can you tell me how I can check this, and what I can do about it?
A: A heap is a table that doesn’t have a clustered index. It’s inherently unordered. For those readers who don’t know about forwarded records in heaps and how they’re used, see my blog post, “Forwarding and forwarded records, and the back-pointer size,” for more details. Forwarded records in heaps can lead to extra random IO operations during query processing, which in turn leads to poor performance.
The easiest way to check whether you have queries that are processing forwarded records is to look at the Forwarded Records/sec performance counter in the Access Methods performance object. Then use the sys.dm_db_index_physical_stats dynamic management function with the DETAILED mode against some of the tables in the database, and it will return the number of forwarded records for each table in the forwarded_record_count column of the output. See this topicin Books Online for more details.
The worst way to remove forwarded records is to create a clustered index and then drop it again. This causes all non-clustered indexes on the table to be automatically rebuilt twice, which is an enormous waste of resources. See my blog post for more details: “What happens to non-clustered indexes when the table structure is changed?”
The easiest way to permanently remove and prevent forwarded records in heaps is to create clustered indexes. I don’t want to get into a “clustered index vs. heap” debate here about why you should have clustered indexes in most cases instead of heaps. See my wife Kimberly Tripp’s “Clustering Key” blog post series on this for more details. I encourage you to evaluate using clustered indexes.
When table records increase in size, it can cause forwarded records when there’s insufficient space. Another way to prevent forwarded records, therefore, is to prevent the records from changing size. This could mean, for example, using default values for variable-length columns.
In SQL Server 2008, there’s a new ALTER TABLE … REBUILD statement that lets you rebuild heaps. This works in the same way that the ALTER INDEX … REBUILD statement lets you rebuild indexes. Microsoft added this statement to support the data-compression feature, but it will work for our purposes. See this topic in Books Online for more details.
Index Maintenance
Q: I’ve changed our index maintenance routines to use online index rebuilds, but I’m still seeing blocking problems sometimes when the maintenance routines run. Why is this? I thought online index operations don’t use locks, so I shouldn’t see any blocking. Is this expected behavior or am I doing something wrong?
A: You’re seeing expected behavior. There’s a shared table lock required at the start of the operation, while the operation initializes (a very fast process). This is immediately dropped. This lock must be queued like any other lock, and it will prevent any new queries from making modifications to the table until you can grant and release the lock again.
You can’t acquire this lock until you’ve completed all currently running modification queries. This might take quite a while, depending on your workload. This means blocking may occur at the start of an online index operation.
At the end of the operation, you have to take a schema-modification lock—think of this as an exclusive lock—to allow it to complete. This also happens extremely fast. Then you drop it immediately. This lock will prevent any type of new queries on the table (read or write) until you grant and release the lock.
Once again, you can’t acquire this lock until SQL has completed all currently running read or write queries. This again means there’s the possibility of blocking.
To summarize, although the name of the feature is online index operations, it does still require two short-term locks that can cause blocking problems. The gain over traditional offline index operations is that for the vast majority of the index operation, there are no locks held, and so overall concurrency is increased. The white paper “Online Indexing Operations in SQL Server 2005” has a lot more details on how these operations work.
Reducing Index Maintenance Time
Q: I’ve inherited some systems where regular index-maintenance jobs take a long time to run and generate lots of IO, but I don’t perform any index rebuilds because the indexes aren’t getting fragmented. I’d like to cut down on the work being done, as I’m not getting any performance gain. Can you recommend a strategy to help?
A: This is a fairly common problem. It stems from the way index-maintenance jobs determine which indexes to rebuild or reorganize.
Most people run the sys.dm_db_index_physical_stats dynamic management function (mentioned earlier) against all indexes in the database, then choose whether to rebuild, reorganize or do nothing. They base this decision on the avg_fragmentation_in_percent, the page_count and the avg_page_space_used_in_percent values using a WHERE clause on the output.
The problem is that index fragmentation is not stored in memory like other statistics. This function must read and process each index to determine the extent of its fragmentation. If most of the indexes in the database are static or change very slowly (in terms of fragmentation), then they won’t be rebuilt or reorganized. Checking their fragmentation every time you run an index-maintenance job is essentially a waste of time.
Most dynamic management views support “predicate push-down,” where the only data processed is that which matches the predicate in the WHERE clause. However, sys.dm_db_index_physical_stats is a function, not a view, so it can’t do this. This means you have to manually filter and only ask the function to process those indexes you know have the potential to be fragmented and may require rebuilding or reorganizing.
I recommend monitoring fragmentation over the course of a few weeks. That way you get an idea of which indexes are worth checking for fragmentation, rather than checking everything. Once you have this list of indexes, create a table with the table name, index name and fragmentation threshold for taking action. You may find that some indexes can have more fragmentation before affecting performance than others. This will be the “driver table” you then use to drive the index-maintenance job. It should loop over all indexes described by the table and only run the sys.dm_db_index_physical_stats function on them.
I’ve implemented this for several clients. In some cases, it has reduced the runtime of the index-maintenance job from hours down to 15 minutes or less. That’s purely from not running this function on static indexes. You could also go a step further and keep track of how often an index is rebuilt and potentially change the index’s FILLFACTOR setting automatically, hopefully leading to a further reduction in the work performed by the index-maintenance job.
For more information on the various methods of performing index maintenance, see my blog post, “Importance of index maintenance,” and for a detailed explanation of what goes on under the covers of the function, also see my blog post, “Inside sys.dm_db_index_physical_stats.”
Thanks to Kimberly L. Tripp of SQLskills.com for her technical review of this month’s column.
Paul S. Randal* is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at Twitter.com/PaulRandal.*