Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Here are some notes on "SQL Server 2008 Fragmentation" I took while attending an advanced class on SQL Server taught by Paul Randal https://sqlskills.com/AboutPaulSRandal.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Types of searches
- Full scans
- Binary searches
- Linear interpolation - fixed record size, monotonely increasing key
Singleton Lookup
- Index Seek / Clustered Index Seek
- Find record using index pages
Allocation Order Scan
- Table Scan or unordered Clustered/Index Scan
- Pages splits during allocation order scans
- Table lock helps (TABLOCK, TABLOCKX) but not required
Advanced scanning
- Also known as "Merry-go-round" scan
- Multiple scans on the same table
- First scan starts, second scan joins in the middle, later scans the rest
- Enterprise edition only
- allocation order scans only
- See https://www.sqlmag.com/Article/ArticleID/49285/sql_server_49285.html
- See https://msdn2.microsoft.com/en-us/library/ms191475.aspx
Range Scan
- Index Scan / Clustered Index Scan
- Find first, then follw next page pointer
Readahead
- Pre-reading of pages in the leaf level during Range Scans
- 1, 8 or 32 pages (8KB, 64KB or 256KB) - only if pages are contiguous
- 128 pages (1024KB) only in Enterprise edition
- Logical fragmentation prevents optimal readahead
- See https://msdn.microsoft.com/en-us/library/ms191475.aspx
Logical Fragmentation
- Next page pointer does not point to next physical page
- DBCC SHOWCONTIG is deprecated now
- Look at avg_fragmentation_in_percent in sys.dm_db_index_physical_stats
- Not relevant if you have a small number of pages (less than 100?)
- Not relevant if all pages are already in the buffer pool (in cache)
- Logical IOs are not necessarily physical IOs
When to do something about it
- When to rebuild/reorganize? It always depends...
- Different for data warehouse or OLTP, for instance
- If you must have a number: 5-30% - ALTER INDEX REORGANIZE
- If you must have a number: >30% - ALTER INDEX REBUILD WITH (ONLINE = ON)*
- See https://msdn.microsoft.com/en-us/library/ms189858.aspx
- Many just rebuild everything every week (could be totally unnecessary)
Extent Fragmentation
- Extents are not contiguous
- Affects readahead, but not much (can read 8 but not 32 pages)
- In a heap, logical fragmentation is actually extent fragmentation
- With multiple files, round robin does not necessarily mean fragmentation
- Interesting KB: https://support.microsoft.com/kb/329526
Page Density
- How much space is actually in use in the page
- Affected by row size (5K fixed-length row leaves 3K unused in every page)
- Affected by page splits
- Affected by FILLFACTOR
- In the end, low page density leads to more IOs
- This is actually a kind of fragmentation
Page Splits
- Page is full and you need to add something (INSERT, UPDATE with larger data)
- Take half the rows to a new page, fix up all the pointers in both
- Could lead to page splits to non-leaf levels, all the way up to the root
- All fully logged, not matter what log level you are in
- Typically creates two low-density pages and are not contiguous
- Does not happen with a heap, only with a clustered index
- Does not happen with a key that is ever increasing (like identity)
Things that cause fragmentation
- GUID as high order key (NEWSEQUENTIALID can help)
- Variable lenght colums
- Misconfigured FILLFACTOR
- Wide rows (with 5K rows being the typical example)
- Clustered indexes that cause random insertion order (sales with customer,date)
FILLFACTOR
- Leave space on pages to avoid page splits in leaf level
- Applies only on rebuild, not when doing INSERT/UPDATE
- PAD_INDEX will do the same thing for non-leaf level
- For data warehouse, no need for FILLFACTOR
- For OLTP, it's hard to calculate the ideal factor, no quick answer
- Look at how many splits, how frequently you can rebuild
- Pick a value, watch how it goes. Then change factor or rebuild frequency
- If you must choose a number, try 70%
Symptoms of at Fragmentation
- Longer run times
- More disk activity (SET STATISTICS IO ON)
- More logging activity
Looking at Fragmentation
- Query sys.dm_db_index_physical_stats (replaces DBCC SHOWCONTIG)
- Look at avg_fragmentation_in_percent for fragmentation
- Look at avg_page_space_used_in_percent for page density
- You can summarize the results, join with sys.indexes to show index names
- See sample query at https://www.sqlskills.com/blogs/paul/post/Can-GUID-cluster-keys-cause-non-clustered-index-fragmentation.aspx
Querying sys.dm_db_index_physical_stats
- It could be expensive, since it bring lots of pages into the buffer pool
- DETAILED (reads everything)
- SAMPLED (reads 1% of the pages if less than 10,000 pages)
- LIMITED (parent level of b-tree, same as DBCC SHOWCONTIG WITH FAST)
- Limited can't tell you page density, since it does not read the page level
Fixing fragmentation
- Three main choices
- Rebuild - ALTER INDEX ... REBUILD (replaces DBCC REINDEX)
- Recreate - CREATE INDEX WITH DROP_EXISTING
- Reorganize - ALTER INDEX ... REORGANIZE (replaces DBCC INDEXDEFRAG)
- DROP, CREATE (separately) not good due to possible issue with constraints
- You can also decide not to do anything at all
- You don't necessarily have to rebuild everything every day...
Rebuild
- Can be done online (unless you have LOB columns)
- Can use multiple CPUs (control with MAXDOP)
- Works single partition or all partitions
- Rebuilds index statistics (equivalent of a full scan)
- Does not update column statistics
- Can be minimally logged
- Atomic. If you interrupt, it rolls back.
- Online - short-hold Shared lock, short-hold SCHema_Mod lock
- Offline clustered - eXclusive lock
- Offline non-clustered - Shared lock
- Creates new before dropping old. Reads from the old index.
- Always rebuilds everything. Faster for largely fragmented indexes.
Online Index Rebuild
- 1) Preparation - New index is created and set to write-only
- very short Shared lock
- 2) Build - Data is inserted from source.
- Two copies of the data being updated
- Scans know they shouldn't use the other - "anti-matter" record
- 3) Final - Index metadata is updated.
- short-hold SCHema_Mod lock
- Old data is dropped
- Details at https://msdn.microsoft.com/en-us/library/ms191261.aspx
Recreate
- Basically same as rebuild
- Can move to a new location
- Can change the schema
- Good way to do a shrink without using shrink
Reorganize
- Is always online (even if you have LOB columns)
- Always single threaded
- Works single partition or all partitions
- Table IX lock - locks only TABLOCKs and escalated locks
- Addresses only existing fragmentation
- Faster for lightly fragemented indexes
- Does not use much extra space
Reorganize Phase 1 - Page compaction
- Moves data from neighbor pages to get closer to FILLFACTOR
- Leaf level only. Works on a sliding window (a few pages at a time)
- Tries eXclusive lock on the pages involved. If can't, moves on.
- If it can empty pages, it will deallocated them.
- It will also remove ghost records.
Reorganize Phase 2 - Page defragment
- Make logical order the same as the allocation order
- Leaf level only. Shuffle pages arounds without using extra space.
- Uses one extra free page (page type 19 - unlinked reorg page)
- Locks enough pages to get the operation done without blocking others.
- Reminded me of the "towers of hanoi" problem :-)
SQL Server Best Practices Article - Predeployment I/O Best Practices
https://technet.microsoft.com/en-us/library/cc966412.aspx
Comments
- Anonymous
January 01, 2003
PingBack from http://servercoach.com/?p=1059