SQL Server 2008 Database Checking

2009-03-20

Here are some notes on "SQL Server 2008 Database Checking" I took while attending an advanced class on SQL Server taught by Paul Randal (https://sqlskills.com/AboutPaulSRandal.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Checking databases

Why check for corruption?
What to do when corruption happens?
Many people don't prepare for it and panic.
Monitor with agent alerts

IO Errors

823 - hard I/O error - sql never got the data, just an error code
824 - soft I/O error - sql got some data, but it came with error code
825 - read/retry error
msdb..suspect_pages has a list

Read-Retry

SQL has retried a few times, error 825 goes to the log
Your I/O subsystems has transient problems
Eventual failure results in an IO error
Since SQL Server 2005, this was extended to data pages
This is not logged as an error, but you should keep an eye on it

SQLIOSim

Not a performance analysis tool.
This is to stress to system, before you introduce SQL, to check for problems
Documented in blog posts

Page protection options

Set using ALTER PAGE SET PAGE_VERIFY NONE/CHECKSUM/TORN_PAGE_DETECTION
TORN_PAGE_DETECTION - First two bits of each 512 bytes in the page
Alternating bit pattern helps identify data corruption condition

CHECKSUM

Simple checksum of all the bits on the page in page header
Enough to detect data corruption condition on the page (new in SQL 2005)
Last thing done to page before write, first thing checked after read
Always recommended to have at least one of them.
CHECKSUM has a CPU overhead, but it is recommended.
Checked: when page is read, during CHECKDB, BACKUP with CHECKSUM
Available in tempdb starting with SQL Sever 2008

Automatic page repair

Mirroring is based on the log, so it does not mirror physical corruption
If you see a 823/824 error occurs and database is mirrored
Corrupt pages on the principal and mirror can be repaired
Principal and mirrored must be synchronized
Repairs are asynchronous, corrupt pages are unusable until repaired
"Database suspect data page" event class
Monitor last 100 corrupted pages with sys.dm_db_mirroring_auto_page_repair

DBCC CHECKDB

Checks the database. Only way to force reading all allocated pages in the DB.
May cause contention, but does not take any locks. Runs online by default.
3 ways: CHECKDB WITH PHYSICAL_ONLY / CHECKDB / CHECKDB WITH EXTENDED_LOGICAL_CHECKS

Evolution of CHECKDB

In SQL 2000, introduced many ways to minimize runtime and run online
In SQL 2005, progress report, data purity, indexed view, last known good...
In SQL 2008, long running checks moved under WITH EXTENDED_LOGICAL_CHECKS
That's why, by default, SQL 2008 runs DBCC CHECKDB faster than SQL 2005
CHECKDB will use a lot of resources, CPU, tempdb, etc. Does not use log.

Running CHECKDB

By default, only returns the first 200 errors
Could return lot of distracting informational messages
Use DBCC CHECKDB WITH ALL_ERRORMSG, NO_INFOMSGS
If it's taking too longer than usual, it could just be that it found some repairable corruption (don't assume the worst)
Additional messages in SQL 2008 to inform during run that it found corruption

CHECKDB checks

If it can't read the system tables, it cannot do much
Allocation checks/repairs - CHECKALLOC
Logical checks/repairs for system tables - Checks done by page in allocation order
Logical checks/repairs for all other tables
Service broker data validations
Metadata checks - CHECKCATALOG
Indexed views and XML index cheks (if WITH EXTENDED_LOGICAL_CHECKS)
WITH PHYSICAL ONLY - simpler check of system table, stops there
See https://sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Complete-description-of-all-CHECKDB-stages.aspx

How frequently to run CHECKDB

Depends on a number of things:
- stability of IO subsystem, backup strategy, downtime SLA, data loss SLA,
- can take additional CPU/IO, type of system (production/test/backup)
At least once a week
If you're running as part of maitanence, do it before the backup

How long does it take?

Depends on a number of things
- size of DB, current server load (CPU/IO), update activity
- number of CPUs, speed of tempdb disks, complexity of schema
- which options you use...

CHECKDB on VLDB

CHECKDB on a multi-TB database can take hours
Five options to reduce time
- Don't run it :-)
- Run WITH PHYSICAL_ONLY
- Break up the checks
- Use partitioning - run CHECKFILEGROUP
- Use another system - use a BACKUP
Don't give up!
See https://sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Consistency-Checking-Options-for-a-VLDB.aspx

What to do?

Do not panic!
Check the extent of the problem - run full CHECKDB on another system
Verify what backups are available while it's running full checks
Find out for how long has that being going
Might be able to fix online with a restore

Repairable?

Repairable errors
- Errors in non-clustered indexes - Rebuild indexes, run CHECKDB again
Unrepairable errors
- Errors can prevent CHECKDB from running or doing repairs
- When you run DBCC CHECKCATALOG tells to run DBCC CHECKCATALOG

Restore or Repair?

Did CHECKDB fail? Is it a repairable error? Cannot repair...
Do you have a backup? How old is it? Is the transaction log damaged?
What will give you the least data loss in the shortest amount of data
You can do single page restore from your backup (if you know what you're doing)

Repair

Repairs are part of CHECKDB: REPAIR_FAST, REPAIR_REBUILD, REPAIR_ALLOW_DATA_LOSS
Repairs ranked - First the most intrusive things to repair
Repairs are never be online - it's hard enough to do it offline :-)
Beware of REPAIR_ALLOW_DATA_LOSS - it does what it says
If log is damaged, you can try emergency mode - no consistency guarantee
Demo: Repairing a suspect databases

Lessons learned

Backup your data frequently
Have multiple backups, keep some of them off-site
Validate that your backups are good by doing trial restores to alternate server
Run CHECKDB regularly if possibly in your case, know how long it regularly takes

Books Online: DBCC CHECKDB
https://msdn.microsoft.com/en-us/library/ms176064.aspx

Comments

Anonymous
January 01, 2003
I took some advanced SQL Server 2008 classes recently and shared my notes in a series of blog posts.

Share via

SQL Server 2008 Database Checking

Comments

Additional resources