Linux NFS mount options best practices for Azure NetApp Files
This article helps you understand mount options and the best practices for using them with Azure NetApp Files.
The `nconnect` mount option allows you to specify the number of connections (network flows) that should be established between the NFS client and the NFS endpoint, up to a limit of 16. Traditionally, an NFS client uses a single connection between itself and the endpoint. By increasing the number of network flows, the upper limits of I/O and throughput are increased significantly. Testing has found `nconnect=8` to be the most performant.
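As a simple illustration (not a prescriptive configuration), an NFSv3 mount using eight connections might look like the following sketch, which assumes a hypothetical endpoint IP and volume name:

```bash
# Mount an NFSv3 volume with eight network flows to the endpoint
sudo mount -t nfs -o rw,hard,tcp,vers=3,nconnect=8 10.10.10.10:/volume1 /mnt/volume1
```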
When preparing a multi-node SAS GRID environment for production, you might notice a repeatable 30% reduction in run time going from 8 hours to 5.5 hours:
| Mount option | Job run times |
|---|---|
| No `nconnect` | 8 hours |
| `nconnect=8` | 5.5 hours |
Both sets of tests used the same E32-8_v4 virtual machine and RHEL8.3, with read-ahead set to 15 MiB.
When you use `nconnect`, keep the following rules in mind:
`nconnect` is supported by Azure NetApp Files on all major Linux distributions but only on newer releases:
| Linux release | NFSv3 (minimum release) | NFSv4.1 (minimum release) |
|---|---|---|
| Redhat Enterprise Linux | RHEL8.3 | RHEL8.3 |
| SUSE | SLES12SP4 or SLES15SP1 | SLES15SP2 |
| Ubuntu | Ubuntu18.04 | |
SLES15SP2 is the minimum SUSE release in which `nconnect` is supported by Azure NetApp Files for NFSv4.1. All other releases as specified are the first releases that introduced the feature.
All mounts from a single endpoint will inherit the `nconnect` setting of the first export mounted, as shown in the following scenarios:
Scenario 1: `nconnect` is used by the first mount. Therefore, all mounts against the same endpoint use `nconnect=8`:
mount 10.10.10.10:/volume1 /mnt/volume1 -o nconnect=8
mount 10.10.10.10:/volume2 /mnt/volume2
mount 10.10.10.10:/volume3 /mnt/volume3
Scenario 2: `nconnect` is not used by the first mount. Therefore, no mounts against the same endpoint use `nconnect`, even though `nconnect` may be specified thereon:
mount 10.10.10.10:/volume1 /mnt/volume1
mount 10.10.10.10:/volume2 /mnt/volume2 -o nconnect=8
mount 10.10.10.10:/volume3 /mnt/volume3 -o nconnect=8
Scenario 3: `nconnect` settings are not propagated across separate storage endpoints. `nconnect` is used by the mount coming from 10.10.10.10 but not by the mount coming from 10.12.12.12:
mount 10.10.10.10:/volume1 /mnt/volume1 -o nconnect=8
mount 10.12.12.12:/volume2 /mnt/volume2
`nconnect` may be used to increase storage concurrency from any given client. For details, see Linux concurrency best practices for Azure NetApp Files.
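As an optional check, you can count the TCP connections the client has established to a storage endpoint to confirm that `nconnect` is in effect. This sketch assumes the endpoint 10.10.10.10 from the scenarios above and the default NFS port 2049:

```bash
# Count established TCP connections to the NFS endpoint;
# with nconnect=8 in effect, expect eight connections
ss -tn state established dst 10.10.10.10 | grep -c ':2049'
```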
It is not recommended to use the `nconnect` and `sec=krb5*` mount options together. Performance degradation has been observed when using the two options in combination.
The Generic Security Service Application Program Interface (GSS-API) provides a way for applications to protect data sent to peer applications. This data might be sent from a client on one machine to a server on another machine.
When `nconnect` is used in Linux, the GSS security context is shared between all the `nconnect` connections to a particular server. TCP is a reliable transport that supports out-of-order packet delivery. To deal with out-of-order packets in a GSS stream, a sliding window of sequence numbers is used. When packets outside of the sequence window are received, the security context is discarded, and a new security context is negotiated. All messages sent within the now-discarded context are no longer valid, thus requiring the messages to be sent again. A larger number of packets in an `nconnect` setup causes frequent out-of-window packets, triggering the described behavior. No specific degradation percentages can be stated with this behavior.
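If Kerberos is a requirement, the simplest way to avoid this interaction is to omit `nconnect` from Kerberized mounts. A minimal sketch, assuming a hypothetical endpoint and volume name:

```bash
# Kerberos-secured NFSv4.1 mount; nconnect intentionally omitted
sudo mount -t nfs -o rw,hard,tcp,vers=4.1,sec=krb5 10.10.10.10:/volume1 /mnt/volume1
```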
Examples in this section provide information about how to approach performance tuning. You might need to make adjustments to suit your specific application needs.
The `rsize` and `wsize` flags set the maximum transfer size of an NFS operation. If `rsize` and `wsize` are not specified on mount, the client and server negotiate the largest size supported by the two. Currently, both Azure NetApp Files and modern Linux distributions support read and write sizes as large as 1,048,576 bytes (1 MiB). However, for best overall throughput and latency, Azure NetApp Files recommends setting both `rsize` and `wsize` no larger than 262,144 bytes (256 KiB). You might observe both increased latency and decreased throughput when using `rsize` and `wsize` larger than 256 KiB.
For example, Deploy a SAP HANA scale-out system with standby node on Azure VMs by using Azure NetApp Files on SUSE Linux Enterprise Server shows the 256-KiB `rsize` and `wsize` maximum as follows:
```
sudo vi /etc/fstab
# Add the following entries
10.23.1.5:/HN1-data-mnt00001 /hana/data/HN1/mnt00001 nfs rw,vers=4,minorversion=1,hard,timeo=600,rsize=262144,wsize=262144,noatime,lock,_netdev,sec=sys 0 0
10.23.1.6:/HN1-data-mnt00002 /hana/data/HN1/mnt00002 nfs rw,vers=4,minorversion=1,hard,timeo=600,rsize=262144,wsize=262144,noatime,lock,_netdev,sec=sys 0 0
10.23.1.4:/HN1-log-mnt00001 /hana/log/HN1/mnt00001 nfs rw,vers=4,minorversion=1,hard,timeo=600,rsize=262144,wsize=262144,noatime,lock,_netdev,sec=sys 0 0
10.23.1.6:/HN1-log-mnt00002 /hana/log/HN1/mnt00002 nfs rw,vers=4,minorversion=1,hard,timeo=600,rsize=262144,wsize=262144,noatime,lock,_netdev,sec=sys 0 0
10.23.1.4:/HN1-shared/shared /hana/shared nfs rw,vers=4,minorversion=1,hard,timeo=600,rsize=262144,wsize=262144,noatime,lock,_netdev,sec=sys 0 0
```
For example, SAS Viya recommends 256-KiB read and write sizes, and SAS GRID limits the `r/wsize` to 64 KiB while augmenting read performance with increased read-ahead for the NFS mounts. See NFS read-ahead best practices for Azure NetApp Files for details.
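To confirm which read and write sizes were actually negotiated on an existing mount, you can inspect the client's view of the mount options, for example:

```bash
# Show the effective mount options for each NFS mount,
# including the negotiated rsize and wsize values
nfsstat -m
```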
The following considerations apply to the use of `rsize` and `wsize`:
- Random I/O operation sizes are often smaller than the `rsize` and `wsize` mount options. As such, in effect, they will not be constrained thereby.
- When using the filesystem cache, sequential I/O will occur at the size predicated by the `rsize` and `wsize` mount options, unless the file size is smaller than `rsize` and `wsize`.
- Operations bypassing the filesystem cache, although still constrained by the `rsize` and `wsize` mount options, will not necessarily issue as large as the maximum specified by `rsize` or `wsize`. This consideration is important when you use workload generators that have the `directio` option.
As a best practice with Azure NetApp Files, for best overall throughput and latency, set `rsize` and `wsize` no larger than 262,144 bytes (256 KiB).
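A minimal sketch of such a mount, assuming a hypothetical endpoint and volume name, would be:

```bash
# NFSv3 mount with rsize and wsize capped at 256 KiB
sudo mount -t nfs -o rw,hard,tcp,vers=3,rsize=262144,wsize=262144 10.10.10.10:/volume1 /mnt/volume1
```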
Close-to-open consistency and cache attribute timers
NFS uses a loose consistency model. The consistency is loose because the application does not have to go to shared storage and fetch data every time it is used, a scenario that would have a tremendous impact on application performance. There are two mechanisms that manage this process: cache attribute timers and close-to-open consistency.
If the client has complete ownership of data, that is, it is not shared between multiple nodes or systems, there is guaranteed consistency. In that case, you can reduce the `getattr` access operations to storage and speed up the application by turning off close-to-open (`cto`) consistency (`nocto` as a mount option) and by turning up the timeouts for the attribute cache management (`actimeo=600` as a mount option changes the timer to 10 minutes versus the defaults `acregmin=3,acregmax=30,acdirmin=30,acdirmax=60`). In some testing, `nocto` reduces approximately 65-70% of the `getattr` access calls, and adjusting `actimeo` reduces these calls another 20-25%.
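As a sketch of how those options fit together on a mount where the client fully owns the data (hypothetical endpoint and volume name):

```bash
# Turn off close-to-open consistency and cache attributes for 10 minutes
sudo mount -t nfs -o rw,hard,tcp,vers=3,nocto,actimeo=600,rsize=262144,wsize=262144 10.10.10.10:/volume1 /mnt/volume1
```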
How attribute cache timers work
The attributes `acregmin`, `acregmax`, `acdirmin`, and `acdirmax` control the coherency of the cache. The former two attributes control how long the attributes of files are trusted. The latter two attributes control how long the attributes of the directory file itself are trusted (directory size, directory ownership, directory permissions). The `min` and `max` attributes define the minimum and maximum duration over which attributes of a directory, attributes of a file, and cache content of a file are deemed trustworthy, respectively. Between `min` and `max`, an algorithm is used to define the amount of time over which a cached entry is trusted.
For example, consider the default `acregmin` and `acregmax` values, 3 and 30 seconds, respectively. For instance, the attributes are repeatedly evaluated for the files in a directory. After 3 seconds, the NFS service is queried for freshness. If the attributes are deemed valid, the client doubles the trusted time to 6 seconds, 12 seconds, 24 seconds, and then, because the maximum is set to 30, 30 seconds. From that point on, until the cached attributes are deemed out of date (at which point the cycle starts over), trustworthiness is defined as 30 seconds, the value specified by `acregmax`.
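Rather than setting `actimeo`, the four timers can also be specified individually. The following sketch simply spells out the default values shown above on a hypothetical mount; raise them only when the consistency trade-off is acceptable:

```bash
# Explicit attribute cache timers (these values match the defaults)
sudo mount -t nfs -o rw,hard,tcp,vers=3,acregmin=3,acregmax=30,acdirmin=30,acdirmax=60 10.10.10.10:/volume1 /mnt/volume1
```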
There are other cases that can benefit from a similar set of mount options, even when there's no complete ownership by the clients, for example, if the clients use the data as read-only and data updates are managed through another path. For applications that use grids of clients like EDA, web hosting, and movie rendering and have relatively static data sets (EDA tools or libraries, web content, texture data), the typical behavior is that the data set is largely cached on the clients. There are few reads and no writes. There will be many `getattr`/access calls coming back to storage. These data sets are typically updated through another client mounting the file systems and periodically pushing content updates.
In these cases, there's a known lag in picking up new content, and the application still works with potentially out-of-date data. In these cases, `nocto` and `actimeo` can be used to control the period where out-of-date data can be managed. For example, in EDA tools and libraries, `actimeo=600` works well because this data is typically updated infrequently. For small web hosting where clients need to see their data updates in a timely manner as they're editing their sites, `actimeo=10` might be acceptable. For large-scale web sites where there's content pushed to multiple file systems, `actimeo=60` might be acceptable.
Using these mount options significantly reduces the workload to storage in these cases. (For example, a recent EDA experience reduced IOPS to the tool volume from >150 K to ~6 K.) Applications can run significantly faster because they can trust the data in memory. (Memory access time is nanoseconds vs. hundreds of microseconds for `getattr`/access on a fast network.)
Close-to-open consistency (the `cto` mount option) ensures that no matter the state of the cache, on open the most recent data for a file is always presented to the application.
- When a directory is crawled (`ls -l` for example), a certain set of RPC calls is issued. The NFS server shares its view of the filesystem. As long as `cto` is used by all NFS clients accessing a given NFS export, all clients will see the same list of files and directories therein. The freshness of the attributes of the files in the directory is controlled by the attribute cache timers. In other words, as long as `cto` is used, files appear to remote clients as soon as the file is created and the file lands on the storage.
- When a file is opened, the content of the file is guaranteed fresh from the perspective of the NFS server. If there's a race condition where the content has not finished flushing from Machine 1 when a file is opened on Machine 2, Machine 2 will only receive the data present on the server at the time of the open. In this case, Machine 2 will not retrieve more data from the file until the `acreg` timer is reached and Machine 2 checks its cache coherency from the server again. This scenario can be observed using `tail -f` from Machine 2 while the file is still being written to from Machine 1, as sketched after this list.
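A simple way to observe this behavior, assuming both machines mount the same hypothetical volume at /mnt/volume1:

```bash
# Machine 1: continuously append to a file on the shared volume
while true; do date >> /mnt/volume1/app.log; sleep 1; done

# Machine 2: follow the same file; with cto, new data appears only after
# the acreg timer expires and the client revalidates against the server
tail -f /mnt/volume1/app.log
```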
No close-to-open consistency
When no close-to-open consistency (`nocto`) is used, the client will trust the freshness of its current view of the file and directory until the cache attribute timers have been breached.
- When a directory is crawled (`ls -l` for example), a certain set of RPC calls is issued. The client will only issue a call to the server for a current listing of files when the `acdir` cache timer value has been breached. In this case, recently created files and directories will not appear, and recently removed files and directories will still appear (see the sketch after this list).
- When a file is opened, as long as the file is still in the cache, its cached content (if any) is returned without validating consistency with the NFS server.
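The directory-listing behavior described above can be observed with a simple test between two clients, assuming the second client mounts the hypothetical volume with `nocto`:

```bash
# Machine 1: create a new file on the shared volume
touch /mnt/volume1/newfile.txt

# Machine 2 (mounted with nocto): the new file may not be listed until the
# acdirmin/acdirmax window on the cached directory attributes has expired
ls -l /mnt/volume1
```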