How can I leverage the sys.dm_io_virtual_file_stats to determine is there any IO perf problem?
Well, since you see that message in the errorlog, you already know that you have an IO perf problem....
But apart from there, here is a stored procedure that I use for this purpose:
CREATE OR ALTER PROCEDURE VFSMonitor AS
SET NOCOUNT ON
IF object_id('VFSData') IS NULL
BEGIN
CREATE TABLE VFSData (
sample_time datetime NOT NULL,
database_id int NOT NULL,
file_id int NOT NULL,
sample_length_sec int NULL,
num_of_reads bigint NULL,
num_of_bytes_read bigint NULL,
io_stall_read_ms bigint NULL,
num_of_writes bigint NULL,
num_of_bytes_written bigint NULL,
io_stall_write_ms bigint NULL,
io_stall bigint NULL,
delta_size_on_disk bigint NULL,
tot_num_of_reads bigint NOT NULL,
tot_num_of_bytes_read bigint NOT NULL,
tot_io_stall_read_ms bigint NOT NULL,
tot_num_of_writes bigint NOT NULL,
tot_num_of_bytes_written bigint NOT NULL,
tot_io_stall_write_ms bigint NOT NULL,
tot_io_stall bigint NOT NULL,
size_on_disk_bytes bigint NOT NULL,
CONSTRAINT pk_VFSData PRIMARY KEY (sample_time, database_id, file_id)
)
CREATE INDEX database_ix ON VFSData(database_id, file_id)
END
DECLARE @latest datetime,
@now datetime
SELECT @latest = MAX(sample_time) FROM VFSData
SELECT @now = getdate()
INSERT INTO dbo.VFSData
(sample_time, database_id, file_id, sample_length_sec,
num_of_reads, num_of_bytes_read,
io_stall_read_ms, num_of_writes, num_of_bytes_written,
io_stall_write_ms, io_stall, delta_size_on_disk,
tot_num_of_reads, tot_num_of_bytes_read,
tot_io_stall_read_ms, tot_num_of_writes, tot_num_of_bytes_written,
tot_io_stall_write_ms, tot_io_stall, size_on_disk_bytes)
SELECT @now, fs.database_id, fs.file_id, datediff(ss, v.sample_time, @now),
fs.num_of_reads - v.tot_num_of_reads,
fs.num_of_bytes_read - v.tot_num_of_bytes_read,
fs.io_stall_read_ms - v.tot_io_stall_read_ms,
fs.num_of_writes - v.tot_num_of_writes,
fs.num_of_bytes_written - v.tot_num_of_bytes_written,
fs.io_stall_write_ms - v.tot_io_stall_write_ms,
fs.io_stall - v.tot_io_stall,
fs.size_on_disk_bytes - v.size_on_disk_bytes,
fs.num_of_reads,
fs.num_of_bytes_read,
fs.io_stall_read_ms,
fs.num_of_writes,
fs.num_of_bytes_written,
fs.io_stall_write_ms,
fs.io_stall,
fs.size_on_disk_bytes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) fs
LEFT JOIN VFSData v ON fs.database_id = v.database_id
AND fs.file_id = v.file_id
AND v.sample_time = @latest
Then I schedule to run every 10 or 15 minutes.
Then I query it something like this:
WITH VFStimes AS (
SELECT database_id, file_id, sample_time, num_of_reads, num_of_bytes_read, io_stall_read_ms,
convert(decimal(20,1), 1E0*io_stall_read_ms / nullif(num_of_reads, 0)) AS read_time,
num_of_writes, num_of_bytes_written, io_stall_write_ms,
convert(decimal(20,1), 1E0*io_stall_write_ms / nullif(num_of_writes, 0)) AS write_time
FROM VFSData
)
SELECT *
FROM VFStimes
WHERE read_time > 10 OR write_time > 10
The idea with the filter is that access times below 10 ms are OK. Although, that limit is set with a spinning disk in mind; for an SSD, you may expect sub-millisecond times. Anyway, you can adjust the filter as you like.
The idea is that the query will show you periods where you experienced slow disk access, and on which database and files.