Linux direct I/O best practices for Azure NetApp Files

This article helps you understand direct I/O best practices for Azure NetApp Files.

Direct I/O

Direct I/O is the most common parameter used in storage performance benchmarking. Both FIO and Vdbench support it, and DISKSPD offers support for the similar construct of memory-mapped I/O. With direct I/O, the filesystem cache is bypassed, direct memory access (DMA) copy operations are avoided, and storage tests become fast and simple.

Using the direct I/O parameter makes storage testing easy. Because no data is read from the filesystem cache on the client, the test truly stresses the storage protocol and the service itself rather than memory access speeds. Also, without the DMA memory copies, read and write operations are efficient from a processing perspective.

Take the Linux dd command as an example workload. Without the optional direct flag (iflag=direct for reads, oflag=direct for writes), all I/O generated by dd goes through the Linux buffer cache. Reads of blocks that are already in memory are not retrieved from storage. Reads that result in a buffer cache miss are read from storage using NFS read-ahead, with results that vary depending on factors such as the mount rsize and client read-ahead tunables. Writes sent through the buffer cache use a write-behind mechanism, which is untuned and uses a significant amount of parallelism to send the data to the storage device. You might attempt to run two independent streams of I/O, one dd for reads and one dd for writes. In practice, however, the untuned operating system favors writes over reads and uses more parallelism for them.
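To make the difference between the two paths concrete, the following is a minimal Python sketch of what a dd-style write loop does with and without direct I/O. It is an illustration only, not part of any Azure NetApp Files tooling. The names write_file and BLOCK_SIZE are hypothetical, and the 4096-byte alignment is an assumption, because the real O_DIRECT alignment requirement depends on the filesystem. Where the filesystem rejects O_DIRECT, the sketch falls back to a buffered write.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment; the real requirement depends on the filesystem


def _write_zero_blocks(path, block_count, extra_flags):
    """Write block_count zero-filled blocks to path using the given extra open flags."""
    # An anonymous mmap is page-aligned, which satisfies the buffer-alignment
    # rule that O_DIRECT imposes on most Linux filesystems.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o600)
        try:
            for _ in range(block_count):
                if os.write(fd, buf) != BLOCK_SIZE:
                    raise OSError("short write")
        finally:
            os.close(fd)
    finally:
        buf.close()


def write_file(path, block_count, direct=True):
    """Write a file, bypassing the client cache when possible.

    Returns True if O_DIRECT was used, False if the write was buffered.
    """
    o_direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT
    if direct and o_direct:
        try:
            _write_zero_blocks(path, block_count, o_direct)
            return True
        except OSError:
            # Some filesystems reject O_DIRECT; fall back to buffered I/O.
            pass
    _write_zero_blocks(path, block_count, 0)
    return False
```

Calling write_file(path, 1024, direct=True) is roughly analogous to dd with bs=4096, count=1024, and oflag=direct. With direct=False, the same data goes through the buffer cache and is sent to storage later by write-behind.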

Aside from databases, few applications use direct I/O. Instead, they take advantage of a large memory cache for repeated reads and a write-behind cache for asynchronous writes. In short, if the application being synthesized uses the filesystem cache, using direct I/O turns the test into a microbenchmark.

The following are some databases that support direct I/O:

  • Oracle
  • SAP HANA
  • MySQL (InnoDB storage engine)
  • RocksDB
  • PostgreSQL
  • Teradata

Best practices

Testing with direct I/O is an excellent way to understand the limits of the storage service and client. To better understand how the application itself will behave (if the application doesn't use direct I/O), you should also run tests through the filesystem cache.
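One way to follow this practice is to run the same workload twice, changing only the direct I/O setting. The following Python sketch builds a matched pair of FIO command lines for that purpose. The function names, the default workload values, and the /mnt/anf path in the usage note are assumptions chosen for illustration, not recommended settings. The FIO options shown (--direct, --rw, --bs, --size, --numjobs, --ioengine, --time_based, --runtime, --group_reporting) are standard FIO parameters.

```python
def fio_command(filename, direct, rw="randread", block_size="8k",
                size="1g", runtime_s=60, numjobs=4):
    """Return an fio argument list; only --name and --direct depend on `direct`."""
    return [
        "fio",
        f"--name={'direct' if direct else 'buffered'}",
        f"--filename={filename}",
        f"--rw={rw}",
        f"--bs={block_size}",
        f"--size={size}",
        f"--numjobs={numjobs}",
        "--ioengine=psync",
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
        f"--direct={1 if direct else 0}",
    ]


def paired_commands(filename, **workload):
    """Return (direct, buffered) command lines for an otherwise identical workload."""
    return (
        fio_command(filename, True, **workload),
        fio_command(filename, False, **workload),
    )
```

For example, paired_commands("/mnt/anf/fio.dat", rw="read") returns two argument lists that can each be passed to subprocess.run on a client where FIO is installed. Comparing the two results shows how much of the application-visible performance comes from the client cache rather than from the storage service.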

Next steps