SQL Server checkpoint problems

On high end performance servers with a huge amount of memory you might notice some spikes on I/O activity due checkpoint executions as examplified below:

checkpoint

On SQL Server error log after enabling trace flags 3504 and 3502 to enable checkpoint verbose logging you might notice:

(…)

2008-11-13 16:10:19.57 spid13s Ckpt dbid 5 started (0)

2008-11-13 16:10:19.57 spid13s About to log Checkpoint begin.

2008-11-13 16:10:20.41 spid13s Ckpt dbid 5 phase 1 ended (0)

2008-11-13 16:10:30.17 spid13s FlushCache: cleaned up 192890 bufs with 32627 writes in 9760 ms (avoided 4882 new dirty bufs)

2008-11-13 16:10:30.17 spid13s average throughput: 154.40 MB/sec, I/O saturation: 5449

2008-11-13 16:10:30.17 spid13s last target outstanding: 2586

2008-11-13 16:10:30.17 spid13s About to log Checkpoint end.

2008-11-13 16:10:30.18 spid13s Ckpt dbid 5 complete

2008-11-13 16:10:30.24 Backup Log was backed up. Database: PRD, creation date(time): 2008/04/15(19:53:05), first LSN: 8931:5148940:1, last LSN: 8931:8029018:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'F:\PRD\BACKUP\LS_SOURCE\PRD_20081113151016.trn'}). This is an informational message only. No user action is required.

2008-11-13 16:11:30.17 spid13s Ckpt dbid 5 started (0)

2008-11-13 16:11:30.17 spid13s About to log Checkpoint begin.

2008-11-13 16:11:31.27 spid13s Ckpt dbid 5 phase 1 ended (0)

2008-11-13 16:11:37.01 spid13s FlushCache: cleaned up 185883 bufs with 31357 writes in 5740 ms (avoided 575 new dirty bufs)

2008-11-13 16:11:37.01 spid13s average throughput: 253.00 MB/sec, I/O saturation: 3076

2008-11-13 16:11:37.01 spid13s last target outstanding: 8800

2008-11-13 16:11:37.01 spid13s About to log Checkpoint end.

2008-11-13 16:11:37.01 spid13s Ckpt dbid 5 complete

(…)

To avoid these spikes you can apply -k startup parameter as described in https://support.microsoft.com/kb/929240 - FIX: I/O requests that are generated by the checkpoint process may cause I/O bottlenecks if the I/O subsystem is not fast enough to sustain the IO requests in SQL Server 2005

In case you notice that backups/ transaction log backups are being blocked for too long by checkpoint check disk write latency because checkpoint algorithm is dynamic and automatic throttling will be done.

On SQL Server checkpoint has been enhanced to do automatic throttling. Automatic throttling will be done for all background and implicit checkpoints, as well as manual checkpoints that do not specify a duration.

With automatic throttling, checkpoint will measure the impact that it has on I/O latency, and will automatically adjust the amount of outstanding checkpoint I/O in order to keep the overall latency from being unduly affected. This will minimize the impact that checkpoint has on foreground applications. If checkpoint detects I/O latency higher than 20 ms checkpoint will automatic throttling I/O requests submitted by checkpoint.