Disk in Depth - PFE Performance Guide

1/17/2024

Key disk performance metrics are as follows:

Avg Disk Sec/Read - A latency value, usually read in milliseconds (Perfmon reports it in seconds, so 0.015 is 15 ms). A 1 second sample in Perfmon will yield the average duration of the reads during that 1 second interval, giving the averaged read latency.

Avg Disk Sec/Write - Again a latency value read in milliseconds. Same as above, but for writes rather than reads.

Generally speaking, these two values should always be below 15 ms in a production IT environment. These thresholds, however, assume a transfer size of 64 KB or smaller. If you are moving larger transfers, you need to adjust your expectations. This is where considerations like Disk Bytes/sec and Disk Transfers/sec come into play: dividing the first by the second gives the average transfer size.
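As a rough illustration of that adjustment, here is a minimal Python sketch that derives the average transfer size from Disk Bytes/sec and Disk Transfers/sec and only applies the 15 ms rule when transfers are 64 KB or smaller. The function names, the verdict strings and the sample counter values are made up for this example; they are not part of Perfmon or PAL.

```python
SMALL_IO_LIMIT_BYTES = 64 * 1024   # the 64 KB assumption behind the threshold
LATENCY_THRESHOLD_MS = 15.0        # the rule-of-thumb latency ceiling


def avg_transfer_size_bytes(disk_bytes_per_sec: float,
                            disk_transfers_per_sec: float) -> float:
    """Average I/O size = Disk Bytes/sec divided by Disk Transfers/sec."""
    if disk_transfers_per_sec <= 0:
        return 0.0  # idle sample: no transfers, so no meaningful size
    return disk_bytes_per_sec / disk_transfers_per_sec


def latency_verdict(latency_ms: float,
                    disk_bytes_per_sec: float,
                    disk_transfers_per_sec: float) -> str:
    """Apply the 15 ms rule only when the average transfer is 64 KB or less."""
    size = avg_transfer_size_bytes(disk_bytes_per_sec, disk_transfers_per_sec)
    if size > SMALL_IO_LIMIT_BYTES:
        return "large transfers: judge latency with a grain of salt"
    return "ok" if latency_ms < LATENCY_THRESHOLD_MS else "slow"


# Hypothetical samples: 100 transfers/sec at 64 KB and at 256 KB each.
print(latency_verdict(20.0, 100 * 64 * 1024, 100))
print(latency_verdict(20.0, 100 * 256 * 1024, 100))
```

The same 20 ms reading is a problem in the first sample and merely something to look at more closely in the second, which is the whole point of checking transfer size first.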

PAL (the Performance Analysis of Logs tool) can be used to apply the 'grain of salt' rule to these IO profiles automatically.

Now back to Jeff Stokes' tirade on disk stuff that doesn't matter as much, but is still worth reading if you have time...

No discussion of disk performance is complete without a discussion of physical performance characteristics. Most of the important or relevant metrics are listed by the manufacturer for product comparison: Avg Seek Time, Avg Write Time and Avg Read Time, along with the amount of cache on the internal controller, rotational speed, technologies supported such as Native Command Queuing, and physical characteristics such as interface, size of the platters, etc.

For example, let's look at the characteristics Seagate lists for its commercial desktop drive, the ST320005N1A1AS-RK.

Spin Speed (RPM)	7200 RPM
Sustained data transfer rate	138MB/s
Average latency	4.16ms
Random read seek time	8.5ms
Random write seek time	9.5ms
I/O data transfer rate	600MB/s

This disk supports a 6 Gb/sec SATA interface, has 64 MB of on-board cache and has a 2 TB capacity. The rotational speed is 7200 RPM. While the advertised interface speed is 6 Gb/sec (the 600 MB/s I/O data transfer rate above), the sustained data transfer rate is 'only' 138 MB/sec. This is the number you really want to be paying attention to in terms of the load the disk can handle.

Note the latencies: Average Latency is 4.16 ms, and the random (typically worst case) read and write seek times are both below 10 ms.

So let's talk about what these mean in terms of Windows host performance.

Average latency is the time the platter needs, on average, to rotate the desired sector under the head AFTER the seek is complete. On average that is half a revolution.
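The average latency figure follows directly from the spin speed, since on average the sector you want is half a revolution away. A minimal sketch of that arithmetic (the function name is just for illustration):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm   # 60,000 ms in a minute
    return ms_per_revolution / 2.0


for rpm in (7200, 15000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

For 7200 RPM this comes out at roughly 4.17 ms, which the vendor lists as 4.16 ms.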

Read and write seek times are the amount of time it takes the actuator to move the head to the correct cylinder for a read or write operation.

So with this being said, or, well, written, how long does a disk transfer take, on average? Add the seek time to the average latency: 13.66 ms (9.5 + 4.16) for a write and 12.66 ms (8.5 + 4.16) for a read.

Clearly this is not server class hardware we are talking about. A single random transfer already eats most of the budget, and anything above 15 ms latency for Avg Disk Sec/Read and Avg Disk Sec/Write is headed for trouble.

So with that, let's look at an enterprise level disk breakdown, the Seagate ST3600057SS:
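The arithmetic behind those figures is just seek time plus average rotational latency. The sketch below reproduces it for the desktop drive and shows how little headroom is left under the 15 ms rule of thumb. It deliberately ignores the data transfer itself, controller overhead and queuing, so treat it as a floor rather than a prediction; the names are illustrative.

```python
THRESHOLD_MS = 15.0          # rule-of-thumb ceiling for Avg Disk Sec/Read and /Write

AVG_LATENCY_MS = 4.16        # desktop drive figures from the spec table
READ_SEEK_MS = 8.5
WRITE_SEEK_MS = 9.5


def avg_access_ms(seek_ms: float, rotational_latency_ms: float) -> float:
    """Average random access time: move the head, then wait for the sector."""
    return seek_ms + rotational_latency_ms


read_ms = avg_access_ms(READ_SEEK_MS, AVG_LATENCY_MS)
write_ms = avg_access_ms(WRITE_SEEK_MS, AVG_LATENCY_MS)

print(f"read  {read_ms:.2f} ms, headroom {THRESHOLD_MS - read_ms:.2f} ms")
print(f"write {write_ms:.2f} ms, headroom {THRESHOLD_MS - write_ms:.2f} ms")
```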

Spin Speed (RPM)	15,000 RPM
Average latency	2.0ms
Random read seek time	3.4ms
Random write seek time	3.9ms
I/O data transfer rate	600MB/s

It should be clear that most of the performance gains are due to the increased rotational rate of the drive. Note that the I/O data transfer rate for this enterprise class disk is identical to that of the consumer desktop grade disk. But because you are more than halving the latency of a disk transfer, you are clearly in a much better place in terms of disk performance for enterprise class applications and back office products.

These disks are both 3.5 inch disks. Let's now see what the rates look like when we compare them to the near identical (but smaller capacity) Seagate ST9146852SS, which is a 2.5 inch disk.

Spin Speed (RPM)	15,000 RPM
Average latency	2.0ms
Random read seek time	2.9ms
Random write seek time	3.3ms
I/O data transfer rate	600MB/s

What else is different besides the slightly lower seek times and the smaller platter and capacity? Less heat is created from friction of the platters spinning against air molecules. The seeks are a little quicker because, with smaller platters, the disk actuator has less distance to move the head.

So what does this mean in terms of Windows (or any OS for that matter) performance? Even with the best in class disk listed above, an average random access is going to take 5.3 ms for a write (3.3 + 2.0) and 4.9 ms for a read (2.9 + 2.0).
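To put the three drives side by side, here is a small sketch that computes the average random read and write times for each from the spec tables above and shows the speed-up relative to the desktop drive. The DriveSpec class and the short drive labels are invented for this example; only the numbers come from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveSpec:
    name: str
    avg_latency_ms: float
    read_seek_ms: float
    write_seek_ms: float

    def read_ms(self) -> float:
        """Average random read time: read seek plus rotational latency."""
        return self.read_seek_ms + self.avg_latency_ms

    def write_ms(self) -> float:
        """Average random write time: write seek plus rotational latency."""
        return self.write_seek_ms + self.avg_latency_ms


DRIVES = [
    DriveSpec("7200 RPM 3.5 inch desktop", 4.16, 8.5, 9.5),
    DriveSpec("15,000 RPM 3.5 inch enterprise", 2.0, 3.4, 3.9),
    DriveSpec("15,000 RPM 2.5 inch enterprise", 2.0, 2.9, 3.3),
]

baseline = DRIVES[0]
for drive in DRIVES:
    read_gain = baseline.read_ms() / drive.read_ms()
    write_gain = baseline.write_ms() / drive.write_ms()
    print(f"{drive.name:32s} read {drive.read_ms():5.2f} ms ({read_gain:.1f}x)  "
          f"write {drive.write_ms():5.2f} ms ({write_gain:.1f}x)")
```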

Physical Layout

Now let's think in terms of the physical layout of the disk. Inside a drive, we have multiple platters set onto a spindle. The spindle spins the platters around at a certain RPM. The actuator arm and actuator motor control the location of the heads that interact with the data on the platters.

So logically, transfers are quicker in the outer ring of the disk than they are in the inner tracks. Why? Because we can stuff more sectors into the outer ring of the disk, where the diameter is greatest, while the inner ring of the platter holds the fewest sectors. Most modern disks are now broken into sections. So imagine splitting the platters into stadium seating at a concert. There are more seats in the nosebleed section than in the front row. We might have three sections on each platter. The nosebleeds are actually where you want your data to sit. One revolution of the spindle will pass far more sectors under the head in this area than in the front row, small diameter section of the platter. And in the middle is, well, a middle ground.

Zoned-Bit Recording

So think about this for a moment. As the disk gets utilized and stores data, it's laying that data down in the outer section first, then the middle, then finally the inner section. So if you have, say, a 300 GB disk, roughly 150 GB will be stored in the outer section, 100 GB in the middle section and finally 50 GB in the inner section. With this architecture, sequential (and to a lesser extent, random) reads and writes in the outer section will occur at a higher rate than those in the middle, and even more so than those in the inner section. Why? Again, because the head passes over more physical area of the disk on the outer tracks than on the inner tracks, so it can potentially do more reads and writes per revolution in the outer section than in the inner section.
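To make the per-revolution argument concrete, here is a sketch that estimates the sequential rate of each zone from its sectors per track. The sectors-per-track counts are hypothetical, picked only to mirror the 150/100/50 split in the example above; real drives have many more zones and vendor-specific layouts, and this ignores head and track switch time.

```python
SECTOR_BYTES = 512

# Hypothetical sectors per track for a three-zone platter (3:2:1 ratio).
ZONES = {"outer": 1800, "middle": 1200, "inner": 600}


def sequential_mb_per_sec(sectors_per_track: int, rpm: float) -> float:
    """Bytes passing under the head per second when reading whole tracks."""
    revolutions_per_sec = rpm / 60.0
    return sectors_per_track * SECTOR_BYTES * revolutions_per_sec / 1_000_000


for zone, sectors in ZONES.items():
    print(f"{zone:6s} {sequential_mb_per_sec(sectors, 15000):6.1f} MB/s")
```

With the spin speed fixed, the rate scales directly with sectors per track, so in this made-up layout the outer zone moves three times the data per revolution that the inner zone does.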

There is a further inefficiency besides less data transferred per revolution in the inner track. The penalty of the time it takes the actuator to move the head to the inner tracks to service the (likely) smaller request is piled on top of the fact that while the head is down there it can't be servicing more efficient requests in the outer section.

Some SAN vendors will burn the inner tracks of the disk, the inner section, and not allow it to be written to. While you don't gain the full use of your storage with this method, you enhance the performance experienced by the data that does reside on the platters by doing this.

Physical Characteristics of Solid State Drives

Solid State Drives (SSDs) present a unique performance gain. They also bring with them some trade-offs in terms of reliability, in my opinion. The common SSD is constructed of one or more memory controllers interfacing multi-level cell (MLC) flash memory to a SATA interface. In some cases they are internally RAID 0 devices, with multiple controllers to speed access. These MLC drives are quick to read and slower to write, as conventional drives are as well.

The problem with these devices is their lifespan. MLC cells can only sustain approximately 10,000 writes before they degrade to the point that they can no longer reliably hold data. So when a write operation occurs a verify takes place, and when a cell fails the verify it is marked bad and back-filled from a bank kept in reserve by the vendor. Most drives will also pull from unformatted space, so you can short stroke the drive and extend the lifespan of your volume by partitioning, say, a 120 GB drive with a 100 GB volume and allowing the drive to back-fill bad blocks for a longer duration.
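For a feel of the numbers, here is a back-of-the-envelope sketch of the spare area gained by short stroking and of the total write budget implied by roughly 10,000 writes per cell. It assumes perfectly even wear across all cells, which no real drive achieves, so the result is an optimistic upper bound. The daily write volume and the write_amplification parameter are hypothetical inputs for the example, not vendor figures.

```python
WRITES_PER_CELL = 10_000  # approximate MLC endurance quoted above


def spare_capacity_gb(drive_gb: float, volume_gb: float) -> float:
    """Unpartitioned space the drive can use to back-fill bad blocks."""
    return drive_gb - volume_gb


def total_write_budget_tb(drive_gb: float,
                          writes_per_cell: int = WRITES_PER_CELL) -> float:
    """Total data that can be written if wear is spread perfectly evenly."""
    return drive_gb * writes_per_cell / 1000.0


def estimated_years(drive_gb: float,
                    host_writes_gb_per_day: float,
                    write_amplification: float = 1.0) -> float:
    """Idealized lifespan; write_amplification > 1 models extra internal writes."""
    flash_writes_per_day = host_writes_gb_per_day * write_amplification
    return drive_gb * WRITES_PER_CELL / flash_writes_per_day / 365.0


print(f"spare area:   {spare_capacity_gb(120, 100):.0f} GB")
print(f"write budget: {total_write_budget_tb(120):.0f} TB")
print(f"ideal life at 50 GB/day: {estimated_years(120, 50):.1f} years")
```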

With the physical attributes of the disk explained, let's move on to technologies.

The first would be TCQ, or Tagged Command Queuing. TCQ (and NCQ, Native Command Queuing, for that matter) is a technology that optimizes disk traffic based on location. So if I slam 10 IO requests in a row into the disk, it queues these requests, checks whether any of them are close to each other in terms of physical locale, and handles those together. Sort of like batch processing for disk access. Without this technology, the IO operations are handled in a first-in/first-out manner regardless of location.
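The benefit is easiest to see by counting how far the head has to travel. The sketch below compares first-in/first-out servicing with a simple nearest-request-first reordering for 10 queued requests. This is only an illustration of the idea: real TCQ/NCQ firmware is vendor-specific and also accounts for rotational position, and the track numbers here are invented.

```python
def head_travel(start: int, requests: list[int]) -> int:
    """Total distance the head moves when servicing requests in the given order."""
    travel, position = 0, start
    for track in requests:
        travel += abs(track - position)
        position = track
    return travel


def nearest_first(start: int, requests: list[int]) -> list[int]:
    """Reorder the queue so the closest pending request is always serviced next."""
    pending, order, position = list(requests), [], start
    while pending:
        nxt = min(pending, key=lambda track: abs(track - position))
        pending.remove(nxt)
        order.append(nxt)
        position = nxt
    return order


# Ten hypothetical requests that alternate between two distant regions.
queue = [900, 120, 880, 100, 910, 130, 870, 110, 890, 140]
start = 500

print("FIFO travel:     ", head_travel(start, queue))
print("reordered queue: ", nearest_first(start, queue))
print("reordered travel:", head_travel(start, nearest_first(start, queue)))
```

In FIFO order the head ping-pongs across the platter for every request; after reordering it clears one region, crosses once, and clears the other.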

StorPort Tracing
If there is a need to get data about the round trip response times of I/O request packets (IRPs) between the lowest portion of Microsoft code and the driver of a Host Bus Adapter (HBA), then consider Storport tracing. For more information on this topic, see How to look at Storport ETW Trace logs?
