Linux direct I/O best practices for Azure NetApp Files
This article helps you understand direct I/O best practices for Azure NetApp Files.
Direct I/O
The most common parameter used in storage performance benchmarking is direct I/O. It's supported by FIO and Vdbench. DISKSPD offers support for the similar construct of memory-mapped I/O. With direct I/O, the filesystem cache is bypassed, direct memory access (DMA) copy operations are avoided, and storage tests are fast and simple.
Using the direct I/O parameter makes storage testing easy. No data is read from the filesystem cache on the client. As such, the test is truly stressing the storage protocol and service itself, rather than memory access speeds. Without the DMA memory copies, read and write operations are efficient from a processing perspective.
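As an illustration, here's a minimal fio invocation that bypasses the client filesystem cache with `--direct=1`. The target path, block size, file size, and job count are placeholder assumptions for the sketch, not recommendations from this article:

```bash
# Random-read benchmark that bypasses the client filesystem cache (direct I/O).
# The mount path, file size, block size, and job count are illustrative only.
fio --name=direct-read-test \
    --filename=/mnt/anf-volume/testfile \
    --rw=randread \
    --bs=8k \
    --size=4G \
    --numjobs=4 \
    --time_based --runtime=60 \
    --direct=1 \
    --group_reporting
```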
Take the Linux `dd` command as an example workload. Without the optional direct flags (`iflag=direct` for reads, `oflag=direct` for writes), all I/O generated by `dd` is served from the Linux buffer cache. Reads whose blocks are already in memory aren't retrieved from storage at all. Reads resulting in a buffer cache miss are read from storage using NFS read-ahead, with results that vary depending on factors such as the mount `rsize` and the client read-ahead tunables. When writes are sent through the buffer cache, they use a write-behind mechanism, which is untuned and uses a significant amount of parallelism to send the data to the storage device. You might attempt to run two independent streams of I/O, one `dd` for reads and one `dd` for writes. But in fact, the untuned operating system favors writes over reads and applies more parallelism to them.
Aside from databases, few applications use direct I/O. Instead, they leverage the advantages of a large memory cache for repeated reads and a write-behind cache for asynchronous writes. In short, using direct I/O turns the test into a micro-benchmark if the application being synthesized uses the filesystem cache.
The following are some databases that support direct I/O:
- Oracle
- SAP HANA
- MySQL (InnoDB storage engine)
- RocksDB
- PostgreSQL
- Teradata
Best practices
Testing with `directio` is an excellent way to understand the limits of the storage service and the client. To better understand how the application behaves (if the application doesn't use `directio`), you should also run tests through the filesystem cache.
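One way to do this, reusing the placeholder values from the earlier fio sketch, is to repeat the same job with `--direct=0` so repeated reads can be served from the client's filesystem cache, and then compare the two result sets:

```bash
# Buffered counterpart of the earlier direct I/O run: --direct=0 allows the
# filesystem cache to serve repeated reads, approximating how a cache-friendly
# application behaves. Path, size, and runtime remain placeholder values.
fio --name=buffered-read-test \
    --filename=/mnt/anf-volume/testfile \
    --rw=randread \
    --bs=8k \
    --size=4G \
    --numjobs=4 \
    --time_based --runtime=60 \
    --direct=0 \
    --group_reporting
```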