Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Here are some notes on "SQL Server 2008 Database Checking" I took while attending an advanced class on SQL Server taught by Paul Randal (https://sqlskills.com/AboutPaulSRandal.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Checking databases
- Why check for corruption?
- What to do when corruption happens?
- Many people don't prepare for it and panic.
- Monitor with agent alerts
IO Errors
- 823 - hard I/O error - sql never got the data, just an error code
- 824 - soft I/O error - sql got some data, but it came with error code
- 825 - read/retry error
- msdb..suspect_pages has a list
Read-Retry
- SQL has retried a few times, error 825 goes to the log
- Your I/O subsystems has transient problems
- Eventual failure results in an IO error
- Since SQL Server 2005, this was extended to data pages
- This is not logged as an error, but you should keep an eye on it
SQLIOSim
- Not a performance analysis tool.
- This is to stress to system, before you introduce SQL, to check for problems
- Documented in blog posts
Page protection options
- Set using ALTER PAGE SET PAGE_VERIFY NONE/CHECKSUM/TORN_PAGE_DETECTION
- TORN_PAGE_DETECTION - First two bits of each 512 bytes in the page
- Alternating bit pattern helps identify data corruption condition
CHECKSUM
- Simple checksum of all the bits on the page in page header
- Enough to detect data corruption condition on the page (new in SQL 2005)
- Last thing done to page before write, first thing checked after read
- Always recommended to have at least one of them.
- CHECKSUM has a CPU overhead, but it is recommended.
- Checked: when page is read, during CHECKDB, BACKUP with CHECKSUM
- Available in tempdb starting with SQL Sever 2008
Automatic page repair
- Mirroring is based on the log, so it does not mirror physical corruption
- If you see a 823/824 error occurs and database is mirrored
- Corrupt pages on the principal and mirror can be repaired
- Principal and mirrored must be synchronized
- Repairs are asynchronous, corrupt pages are unusable until repaired
- "Database suspect data page" event class
- Monitor last 100 corrupted pages with sys.dm_db_mirroring_auto_page_repair
DBCC CHECKDB
- Checks the database. Only way to force reading all allocated pages in the DB.
- May cause contention, but does not take any locks. Runs online by default.
- 3 ways: CHECKDB WITH PHYSICAL_ONLY / CHECKDB / CHECKDB WITH EXTENDED_LOGICAL_CHECKS
Evolution of CHECKDB
- In SQL 2000, introduced many ways to minimize runtime and run online
- In SQL 2005, progress report, data purity, indexed view, last known good...
- In SQL 2008, long running checks moved under WITH EXTENDED_LOGICAL_CHECKS
- That's why, by default, SQL 2008 runs DBCC CHECKDB faster than SQL 2005
- CHECKDB will use a lot of resources, CPU, tempdb, etc. Does not use log.
Running CHECKDB
- By default, only returns the first 200 errors
- Could return lot of distracting informational messages
- Use DBCC CHECKDB WITH ALL_ERRORMSG, NO_INFOMSGS
- If it's taking too longer than usual, it could just be that it found some repairable corruption (don't assume the worst)
- Additional messages in SQL 2008 to inform during run that it found corruption
CHECKDB checks
- If it can't read the system tables, it cannot do much
- Allocation checks/repairs - CHECKALLOC
- Logical checks/repairs for system tables - Checks done by page in allocation order
- Logical checks/repairs for all other tables
- Service broker data validations
- Metadata checks - CHECKCATALOG
- Indexed views and XML index cheks (if WITH EXTENDED_LOGICAL_CHECKS)
- WITH PHYSICAL ONLY - simpler check of system table, stops there
- See https://sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Complete-description-of-all-CHECKDB-stages.aspx
How frequently to run CHECKDB
- Depends on a number of things:
- - stability of IO subsystem, backup strategy, downtime SLA, data loss SLA,
- - can take additional CPU/IO, type of system (production/test/backup)
- At least once a week
- If you're running as part of maitanence, do it before the backup
How long does it take?
- Depends on a number of things
- - size of DB, current server load (CPU/IO), update activity
- - number of CPUs, speed of tempdb disks, complexity of schema
- - which options you use...
CHECKDB on VLDB
- CHECKDB on a multi-TB database can take hours
- Five options to reduce time
- - Don't run it :-)
- - Run WITH PHYSICAL_ONLY
- - Break up the checks
- - Use partitioning - run CHECKFILEGROUP
- - Use another system - use a BACKUP
- Don't give up!
- See https://sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Consistency-Checking-Options-for-a-VLDB.aspx
What to do?
- Do not panic!
- Check the extent of the problem - run full CHECKDB on another system
- Verify what backups are available while it's running full checks
- Find out for how long has that being going
- Might be able to fix online with a restore
Repairable?
- Repairable errors
- - Errors in non-clustered indexes - Rebuild indexes, run CHECKDB again
- Unrepairable errors
- - Errors can prevent CHECKDB from running or doing repairs
- - When you run DBCC CHECKCATALOG tells to run DBCC CHECKCATALOG
Restore or Repair?
- Did CHECKDB fail? Is it a repairable error? Cannot repair...
- Do you have a backup? How old is it? Is the transaction log damaged?
- What will give you the least data loss in the shortest amount of data
- You can do single page restore from your backup (if you know what you're doing)
Repair
- Repairs are part of CHECKDB: REPAIR_FAST, REPAIR_REBUILD, REPAIR_ALLOW_DATA_LOSS
- Repairs ranked - First the most intrusive things to repair
- Repairs are never be online - it's hard enough to do it offline :-)
- Beware of REPAIR_ALLOW_DATA_LOSS - it does what it says
- If log is damaged, you can try emergency mode - no consistency guarantee
- Demo: Repairing a suspect databases
Lessons learned
- Backup your data frequently
- Have multiple backups, keep some of them off-site
- Validate that your backups are good by doing trial restores to alternate server
- Run CHECKDB regularly if possibly in your case, know how long it regularly takes
Books Online: DBCC CHECKDB
https://msdn.microsoft.com/en-us/library/ms176064.aspx
Comments
- Anonymous
January 01, 2003
I took some advanced SQL Server 2008 classes recently and shared my notes in a series of blog posts.