SQL 2016 - It Just Runs Faster: Indirect Checkpoint Default
There are two(2) distinct checkpoint paths provided starting with SQL Server 2014, referred to as Automatic and Indirect.
The vast majority of documentation today highlights the behavior of automatic (classic) checkpoint. This post outlines some historical aspects of checkpoint and provides the recommendation to leverage Indirect Checkpoint.
Before SQL Server 7.0
The database pages in SQL Server 6.5 and prior versions where tracked on various linked lists and as log records were generated a fixed value is assigned as a recovery target.
Formula: (Log records * fixed time to recover) > configured recovery interval = Enqueue automatic checkpoint.
A list of in-memory pages was traversed, creating a secondary reference list containing the dirty pages in page id, ascending order. The dirty list was then used to write the database pages to disk. One of the problems uncovered during SQL Server 6.5 and prior versions was hardware, elevator seek. The hardware would favor servicing requests near each other. The page id, sorted ordering of checkpoint could trigger the hardware condition, starving other I/O requests to the same media.
A change to leverage Scatter/Gather I/O capabilities in SQL Server 6.5 SP3 provided some relief for elevator seek.
SQL Server 7.0
SQL Server 7.0 changed how the database pages are tracked and held in SQL Server memory (the buffer pool.) The design moved to an array that could easily be traversed by ordinal position. SQL Server 7.0 no longer tracked or built a separate dirty, page list.
The formula used to trigger the checkpoint remained mostly unchanged - Formula: (Log records * fixed time to recover) > configured recovery interval = Enqueue automatic checkpoint.
The logic for checkpoint (FlushCache) changed. Instead of looking over all pages in memory and building a list the BUF array is swept from beginning to end. When a dirty page is found it is written to disk along with the block of dirty pages near it. The design takes full advantage of WriteFileGather, avoiding hardware elevator seek because the pages are randomly distributed in memory.
SQL Server 7.0 through SQL Server 2008 gathered 16 dirty pages, whenever possible, by page id and flushed them in a single write. SQL Server 2012 and newer versions gather 32 dirty pages, whenever possible. For SQL 2016 that value has been increased to 128 pages.
For example page 1:13 and 1:12 are dirty resulting in a single WriteFileGather operation.
Is BUF Dirty - No - Skip
Is BUF Dirty - Yes - Write with gather near
Loop until we have 16/32 pages or no more dirty as a contiguous block to gather
Is Page 1:14 in memory (hashed) and dirty NO - End forward search for near pages
Is Page 1:12 in memory (hashed) and dirty YES - Include in write request
Is Page 1:11 in memory (hashed) and dirty NO - End of backward search for near pages
Write pages 1:12 and 1:13 in a single (WriteFileGather) block
Is BUF Dirty - No Skip
Is BUF Dirty - Yes - Already in I/O - Skip
This method is highly effective and removed the sorting by page order that closely aligns to physical storage, defeating hardware, elevator seek problems.
As SQL Server 2005 to SQL Server 2014 evolved so did various checkpoint pacing targets. Default will pace by I/O responsiveness based on heuristics for spinning media (DAS) response times. You can issue CHECKPOINT with additional targets.
Indirect was introduced in SQL Server 2012. Indirect combines designs from previous checkpoint implementations.
Indirect checkpoint is the recommended configuration, especially on systems with large memory footprints and default for databases created in SQL Server 2016.
There has always been a need to track which pages are dirty. The determination of dirty pages has been extended over the years and SQL Server releases to include a 'PrepareToDirty' routine inside SQL Server. This facilitates capabilities such as snapshot databases and page latch enforcement. For example, before any code in the SQL Server is allowed to make changes it must invoke 'PrepareToDirty.' If a database snapshot is present for the database and the page has not been copied to the snapshot the image of the page is secured before allowing the change.
Indirect checkpoint leverages 'PrepareToDirty' logic. For indirect checkpoint, the SQL Server creates a dirty page manager for the database. Within the dirty page manager exists a set of lists to track dirty pages. As pages become dirty they are added to the highly efficient, partitioned lists which are sorted by LSN and tend to just be append list operations. Adding the pages to partitioned, sorted lists when the page is first dirtied is very low overhead.
As log records are added to the database the databases (TARGET_RECOVERY_TIME) is used to evaluate the need to checkpoint the database. A similar formula is used to look at the number of dirty pages, log records produced time to recovery is used to trigger the checkpoint.
The indirect checkpoint swaps the current set of dirty partition lists, sorts the lists in page id ascending order and leverages the WriteFileGather behavior to bundle a dirtied block or pages as discussed in the SQL Server 7.0 example above.
By swapping out the lists other queries can continue to dirty pages without list maintenance collisions. The newer media types (SSD, flash, etc..) and disk drives don't exhibit the hardware elevator seek behavior so sorting by page order and using gather activities improves the I/O throughput capabilities.
Before indirect checkpoint the database, dirty page manager tracks a list of dirty pages in random order.
When indirect checkpoint is triggered the dirty, page list is pulled and sorted locally (output list) by page number and the dirty page manager starts with an empty list for the next checkpoint generation.
Indirect Checkpoint Recommended
Indirect checkpoint leverages the best of all previous designs. As the RAM memory footprint grows the automatic (classic) checkpoint scalability tends to languish. For example, creation of a new, default sized database on a 4TB system can take minutes (automatic) vs seconds (indirect.)
During creation of a new database a couple of checkpoints are required. In automatic checkpoint mode (classic) the entire set of buffers has to be evaluated for possible dirty pages belonging to the database where as indirect deals with only those buffers dirtied.
4TB System = ~500 million SQL Server BUF structures
Simple, default, new database creation dirties ~ 250 BUF structures
For example purposes let's assume it takes 5 CPU cycles to check a buffer for DIRTY status. At 500 million BUFs this results in 2.5 trillion CPU cycles. Of those 2.5 trillion cycles only 250 are positive, dirty buffers. Take this times 2x for the 2 checkpoint activities and you extend this to 5 trillion cycles. The performance savings of indirect checkpoint is plain to see and why indirect checkpoint is recommended on systems with larger memory footprints and becomes the default for databases created in SQL Server 2016.
'It Just Runs Faster' - New databases in SQL Server 2016 use indirect checkpoint, improving performance of checkpoint activities.
Ryan Stonecipher - Principle SQL Server Software Engineer
Peter Byrne - Principle SQL Server Software Engineer
Bob Dorr - Principal SQL Server Software Engineer