Troubleshoot Azure Files performance issues

Note

CentOS referenced in this article is a Linux distribution and will reach End Of Life (EOL). Consider your use and plan accordingly. For more information, see CentOS End Of Life guidance.

This article lists common problems related to Azure file share performance, and provides potential causes and workarounds. To get the most value from this troubleshooting guide, we recommend first reading Understand Azure Files performance.

Applies to

File share type SMB NFS
Standard file shares (GPv2), LRS/ZRS
Standard file shares (GPv2), GRS/GZRS
Premium file shares (FileStorage), LRS/ZRS

General performance troubleshooting

First, rule out some common reasons why you might be having performance problems.

You're running an old operating system

If your client virtual machine (VM) is running Windows 8.1 or Windows Server 2012 R2, or an older Linux distro or kernel, you might experience performance issues when accessing Azure file shares. Either upgrade your client OS or apply the fixes below.

Considerations for Windows 8.1 and Windows Server 2012 R2

Clients that are running Windows 8.1 or Windows Server 2012 R2 might see higher than expected latency when accessing Azure file shares for I/O-intensive workloads. Make sure that the KB3114025 hotfix is installed. This hotfix improves the performance of create and close handles.

You can run the following script to check whether the hotfix has been installed:

reg query HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\Policies

If the hotfix is installed, the following output is displayed:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\Policies
    {96c345ef-3cac-477b-8fcd-bea1a564241c}    REG_DWORD    0x1
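
If you need to run this check across many clients, you can evaluate the captured `reg query` output in a script. The following is a minimal Python sketch, not an official tool. It assumes you've already captured the command output as text. The function name `hotfix_installed` is hypothetical, and the GUID is the value name shown in the output above.

```python
import re

# Value name shown in the expected output when KB3114025 is installed.
HOTFIX_VALUE = "{96c345ef-3cac-477b-8fcd-bea1a564241c}"


def hotfix_installed(reg_output: str) -> bool:
    """Return True if the output lists the hotfix value as REG_DWORD 0x1."""
    pattern = re.escape(HOTFIX_VALUE) + r"\s+REG_DWORD\s+0x1\b"
    return re.search(pattern, reg_output, flags=re.IGNORECASE) is not None


# Sample text copied from the expected output; replace it with real output.
sample = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\Policies"
    "\n    {96c345ef-3cac-477b-8fcd-bea1a564241c}    REG_DWORD    0x1\n"
)
print(hotfix_installed(sample))
```

The pattern tolerates any amount of whitespace between the fields, so it works whether the value is printed on the same line as the key or on the next line.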

Note

Windows Server 2012 R2 images in Azure Marketplace have hotfix KB3114025 installed by default, starting in December 2015.

Your workload is being throttled

Requests are throttled when the I/O operations per second (IOPS), ingress, or egress limits for a file share are reached. For example, if the client exceeds baseline IOPS, it will get throttled by the Azure Files service. Throttling can result in the client experiencing poor performance.

To understand the limits for standard and premium file shares, see File share and file scale targets. Depending on your workload, throttling can often be avoided by moving from standard to premium Azure file shares.

To learn more about how throttling at the share level or storage account level can cause high latency, low throughput, and general performance issues, see Share or storage account is being throttled.

High latency, low throughput, or low IOPS

Cause 1: Share or storage account is being throttled

To confirm whether your share or storage account is being throttled, you can access and use Azure metrics in the portal. You can also create alerts that will notify you if a share is being throttled or is about to be throttled. See Troubleshoot Azure Files by creating alerts.

Important

For standard storage accounts, throttling occurs at the storage account level. For premium file shares, throttling occurs at the share level.

  1. In the Azure portal, go to your storage account.

  2. On the left pane, under Monitoring, select Metrics.

  3. Select File as the metric namespace for your storage account scope.

  4. Select Transactions as the metric.

  5. Add a filter for Response type, and then check to see whether any requests have been throttled.

    For standard file shares, the following response types are logged if a request is throttled at the client account level:

    • ClientAccountRequestThrottlingError
    • ClientAccountBandwidthThrottlingError

    For premium file shares, the following response types are logged if a request is throttled at the share level:

    • SuccessWithShareEgressThrottling
    • SuccessWithShareIngressThrottling
    • SuccessWithShareIopsThrottling
    • ClientShareEgressThrottlingError
    • ClientShareIngressThrottlingError
    • ClientShareIopsThrottlingError

    If a throttled request was authenticated with Kerberos, you might see a prefix indicating the authentication protocol, such as:

    • KerberosSuccessWithShareEgressThrottling
    • KerberosSuccessWithShareIngressThrottling

    To learn more about each response type, see Metric dimensions.

    Screenshot that shows the 'Response type' property filter.
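
If you export the Transactions metric split by response type, you can compute the share of throttled requests instead of reading the chart by eye. The following Python sketch assumes you already have a mapping of response type to request count. The function names and the sample counts are hypothetical, and the set contains only the response types listed above.

```python
from typing import Mapping

# Throttling response types listed in this article.
THROTTLED_RESPONSE_TYPES = {
    "ClientAccountRequestThrottlingError",
    "ClientAccountBandwidthThrottlingError",
    "SuccessWithShareEgressThrottling",
    "SuccessWithShareIngressThrottling",
    "SuccessWithShareIopsThrottling",
    "ClientShareEgressThrottlingError",
    "ClientShareIngressThrottlingError",
    "ClientShareIopsThrottlingError",
}


def is_throttled(response_type: str) -> bool:
    """Check a response type, ignoring an optional 'Kerberos' prefix."""
    name = response_type
    if name.startswith("Kerberos"):
        name = name[len("Kerberos"):]
    return name in THROTTLED_RESPONSE_TYPES


def throttled_fraction(counts: Mapping[str, int]) -> float:
    """Return the fraction of requests whose response type indicates throttling."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    throttled = sum(n for rt, n in counts.items() if is_throttled(rt))
    return throttled / total


# Hypothetical counts for one time window.
counts = {
    "Success": 90,
    "SuccessWithShareIopsThrottling": 6,
    "KerberosSuccessWithShareEgressThrottling": 4,
}
print(throttled_fraction(counts))
```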

Solution

If you're using a premium file share, increase the provisioned file share size to increase the IOPS limit. To learn more, see Understanding provisioning for premium file shares.

Cause 2: Metadata or namespace heavy workload

If the majority of your requests are metadata-centric (such as createfile, openfile, closefile, queryinfo, or querydirectory), the latency will be worse than that of read/write operations.

To determine whether most of your requests are metadata-centric, start by following steps 1-4 as previously outlined in Cause 1. For step 5, instead of adding a filter for Response type, add a property filter for API name.

Screenshot that shows the 'API name' property filter.
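
Once you have transaction counts split by API name, a quick calculation shows how metadata-heavy the workload is. The following Python sketch is illustrative only. It uses the operation names mentioned above and compares them without regard to case. The API names that appear in your metrics might be spelled differently, so treat the set and the sample counts as assumptions to adjust.

```python
from typing import Mapping

# Metadata-centric operations named in this article (lowercase for comparison).
METADATA_OPERATIONS = {
    "createfile",
    "openfile",
    "closefile",
    "queryinfo",
    "querydirectory",
}


def metadata_fraction(calls_by_api: Mapping[str, int]) -> float:
    """Return the fraction of calls that are metadata operations."""
    total = sum(calls_by_api.values())
    if total == 0:
        return 0.0
    metadata = sum(
        n for api, n in calls_by_api.items() if api.lower() in METADATA_OPERATIONS
    )
    return metadata / total


# Hypothetical counts for one time window.
calls = {
    "CreateFile": 300,
    "CloseFile": 300,
    "QueryDirectory": 150,
    "Read": 200,
    "Write": 50,
}
print(metadata_fraction(calls))
```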

Workarounds

  • Check to see whether the application can be modified to reduce the number of metadata operations.

  • If you're using premium SMB Azure file shares, use metadata caching.

  • Separate the file share into multiple file shares within the same storage account.

  • Add a virtual hard disk (VHD) on the file share and mount the VHD from the client to perform file operations against the data. This approach works for single writer/reader scenarios or scenarios with multiple readers and no writers. Because the file system is owned by the client rather than Azure Files, metadata operations are local. The setup offers performance similar to that of local directly attached storage. However, because the data is in a VHD, it can't be accessed by any means other than the SMB mount, such as the REST API or the Azure portal.

    1. From the machine which needs to access the Azure file share, mount the file share using the storage account key and map it to an available network drive (for example, Z:).
    2. Go to Disk Management and select Action > Create VHD.
    3. Set Location to the network drive that the Azure file share is mapped to, set Virtual hard disk size as needed, and select Fixed size.
    4. Select OK. Once the VHD creation is complete, it will automatically mount, and a new unallocated disk will appear.
    5. Right-click the new unknown disk and select Initialize Disk.
    6. Right-click the unallocated area and create a New Simple Volume.
    7. You should see a new drive letter appear in Disk Management representing this VHD with read/write access (for example, E:). In File Explorer, you should see the new VHD on the mapped Azure file share's network drive (Z: in this example). To be clear, there should be two drive letters present: the standard Azure file share network mapping on Z:, and the VHD mapping on the E: drive.
    8. There should be much better performance on heavy metadata operations against files on the VHD mapped drive (E:) versus the Azure file share mapped drive (Z:). If desired, it should be possible to disconnect the mapped network drive (Z:) and still access the mounted VHD drive (E:).
    • To mount a VHD on a Windows client, you can also use the Mount-DiskImage PowerShell cmdlet.
    • To mount a VHD on Linux, consult the documentation for your Linux distribution. Here's an example.

Cause 3: Single-threaded application

If the application that you're using is single-threaded, this setup can result in significantly lower IOPS throughput than the maximum possible throughput, depending on your provisioned share size.

Solution

  • Increase application parallelism by increasing the number of threads.
  • Switch to applications where parallelism is possible. For example, for copy operations, you could use AzCopy or RoboCopy from Windows clients or the parallel command from Linux clients.
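
If you control the application code, one way to add parallelism is to issue file operations from a thread pool. The following Python sketch copies a directory tree with several worker threads. It isn't a replacement for AzCopy or RoboCopy. The function name `copy_tree_parallel` and the default of eight workers are assumptions for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src_dir, dst_dir, workers=8):
    """Copy every file under src_dir to dst_dir by using a pool of threads."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        target = dst_dir / src.relative_to(src_dir)
        # exist_ok=True keeps concurrent directory creation safe.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        return target

    # Each worker keeps its own request in flight, which raises the queue depth.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

For example, `copy_tree_parallel("/data/source", "/mnt/share/target")` would copy the tree onto a mounted share. Both paths are placeholders.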

Cause 4: Number of SMB channels exceeds four

If you're using SMB Multichannel and the number of channels exceeds four, performance will be poor. To determine whether your connection count exceeds four, use the PowerShell cmdlet Get-SmbClientConfiguration to view the current connection count settings.

Solution

Set the Windows per NIC setting for SMB so that the total channels don't exceed four. For example, if you have two NICs, you can set the maximum per NIC to two using the following PowerShell cmdlet: Set-SmbClientConfiguration -ConnectionCountPerRssNetworkInterface 2.
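
The per-NIC value follows from dividing the four-channel limit by the number of NICs. The following Python sketch shows that arithmetic only. It doesn't change any SMB settings, and the function name `connections_per_nic` is hypothetical.

```python
MAX_TOTAL_CHANNELS = 4  # Limit described in this article.


def connections_per_nic(nic_count: int) -> int:
    """Return a per-NIC connection count that keeps the total at four or fewer.

    With more than four NICs, the result is 1 and the total still exceeds four,
    so fewer NICs would have to be used for SMB in that case.
    """
    if nic_count < 1:
        raise ValueError("nic_count must be at least 1")
    return max(1, MAX_TOTAL_CHANNELS // nic_count)


print(connections_per_nic(2))
```

You could then pass the result to the -ConnectionCountPerRssNetworkInterface parameter shown above.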

Cause 5: Read-ahead size is too small (NFS only)

Beginning with Linux kernel version 5.4, the Linux NFS client uses a default read_ahead_kb value of 128 kibibytes (KiB). This small value might reduce the amount of read throughput for large files.

Solution

We recommend that you increase the read_ahead_kb kernel parameter value to 15 mebibytes (MiB). To change this value, set the read-ahead size persistently by adding a rule in udev, a Linux kernel device manager. Follow these steps:

  1. In a text editor, create the /etc/udev/rules.d/99-nfs.rules file by entering and saving the following text:

    SUBSYSTEM=="bdi" \
    , ACTION=="add" \
    , PROGRAM="/usr/bin/awk -v bdi=$kernel 'BEGIN{ret=1} {if ($4 == bdi) {ret=0}} END{exit ret}' /proc/fs/nfsfs/volumes" \
    , ATTR{read_ahead_kb}="15360"
    
  2. In a console, apply the udev rule by running the udevadm command as a superuser and reloading the rules files and other databases. To make udev aware of the new file, you only need to run this command once.

    sudo udevadm control --reload
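
To confirm that the setting took effect after you remount the share, you can read the value back from sysfs. The following Python sketch assumes that the backing device info directory for the NFS mount is under /sys/class/bdi/ and is named after the device ID that the udev rule matches against /proc/fs/nfsfs/volumes. The function names are hypothetical, and the directory is passed in rather than discovered.

```python
from pathlib import Path

# 15 MiB expressed in KiB, matching the value in the udev rule (15 * 1024 = 15360).
RECOMMENDED_KIB = 15 * 1024


def read_ahead_kib(bdi_dir) -> int:
    """Read the current read_ahead_kb value from a bdi directory."""
    return int((Path(bdi_dir) / "read_ahead_kb").read_text().strip())


def needs_tuning(bdi_dir) -> bool:
    """Return True if read-ahead is below the recommended 15 MiB."""
    return read_ahead_kib(bdi_dir) < RECOMMENDED_KIB
```

For example, `needs_tuning("/sys/class/bdi/0:52")` would check one device, where `0:52` is a placeholder for the device ID on your system.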
    

Very high latency for requests

Cause

The client VM could be located in a different region than the file share. High latency can also be caused by the client or the network.

Solution

  • Run the application from a VM that's located in the same region as the file share.
  • For your storage account, review the transaction metrics SuccessE2ELatency and SuccessServerLatency via Azure Monitor in the Azure portal. A large difference between the SuccessE2ELatency and SuccessServerLatency values indicates latency that is likely caused by the network or the client. See Transaction metrics in Azure Files Monitoring data reference.
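
The comparison between the two metrics is a simple subtraction. The following Python sketch makes it explicit. The function names are hypothetical, and the 10 ms threshold is an arbitrary assumption for illustration, not a documented limit.

```python
def non_server_latency_ms(success_e2e_ms: float, success_server_ms: float) -> float:
    """Return the part of end-to-end latency that isn't spent in the service."""
    return success_e2e_ms - success_server_ms


def likely_client_or_network(
    success_e2e_ms: float, success_server_ms: float, threshold_ms: float = 10.0
) -> bool:
    """Flag a gap larger than threshold_ms (an assumed cutoff; tune it yourself)."""
    return non_server_latency_ms(success_e2e_ms, success_server_ms) > threshold_ms


# Hypothetical averages read from Azure Monitor, in milliseconds.
print(non_server_latency_ms(45.0, 5.0))
print(likely_client_or_network(45.0, 5.0))
```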

Client unable to achieve maximum throughput supported by the network

Cause

One potential cause is a lack of SMB multi-channel support for standard file shares. Currently, Azure Files supports only single channel for standard file shares, so there's only one connection from the client VM to the server. This single connection is pegged to a single core on the client VM, so the maximum throughput achievable from a VM is bound by a single core.

Workaround

Slow performance on an Azure file share mounted on a Linux VM

Cause 1: Caching

One possible cause of slow performance is disabled caching. Caching can be useful if you are accessing a file repeatedly. Otherwise, it can be an overhead. Check if you're using the cache before disabling it.

Solution for cause 1

To check whether caching is disabled, look for the cache= entry.

cache=none indicates that caching is disabled. Remount the share by using the default mount command or by explicitly adding the cache=strict option to the mount command to ensure that default caching or "strict" caching mode is enabled.

In some scenarios, the serverino mount option can cause the ls command to run stat against every directory entry. This behavior results in performance degradation when you're listing a large directory. You can check the mount options in your /etc/fstab entry:

//azureuser.file.core.windows.net/cifs /cifs cifs vers=2.1,serverino,username=xxx,password=xxx,dir_mode=0777,file_mode=0777

You can also check whether the correct options are being used by running the sudo mount | grep cifs command and checking its output. The following is an example output:

//azureuser.file.core.windows.net/cifs on /cifs type cifs (rw,relatime,vers=2.1,sec=ntlmssp,cache=strict,username=xxx,domain=X,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.10.1,file_mode=0777,dir_mode=0777,persistenthandles,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,actimeo=1)

If the cache=strict or serverino option isn't present, unmount and mount Azure Files again by running the mount command from the documentation. Then, recheck that the /etc/fstab entry has the correct options.
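
To check these options across several machines, you can parse each line of the `mount` output. The following Python sketch extracts the options inside the parentheses and reports the cache mode and whether serverino is set. The function names are hypothetical, and the sample line is shortened from the example output above.

```python
def parse_mount_options(mount_line: str) -> dict:
    """Parse the comma-separated options inside the trailing parentheses."""
    start = mount_line.rfind("(")
    end = mount_line.rfind(")")
    if start == -1 or end < start:
        return {}
    options = {}
    for item in mount_line[start + 1:end].split(","):
        item = item.strip()  # Tolerate stray spaces between options.
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value  # Flags without a value map to "".
    return options


def cache_and_serverino(mount_line: str):
    """Return (cache mode or None, True if serverino is present)."""
    options = parse_mount_options(mount_line)
    return options.get("cache"), "serverino" in options


sample = (
    "//azureuser.file.core.windows.net/cifs on /cifs type cifs "
    "(rw,relatime,vers=2.1,sec=ntlmssp,cache=strict,serverino,actimeo=1)"
)
print(cache_and_serverino(sample))
```

A cache mode of `none` or a result of `None` (no cache= entry) suggests that the share should be remounted as described above.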

Cause 2: Throttling

It's possible you're experiencing throttling, and your requests are being sent to a queue. You can verify this by leveraging Azure Storage metrics in Azure Monitor. You can also create alerts that notify you if a share is being throttled or is about to be throttled. See Troubleshoot Azure Files by creating alerts.

Solution for cause 2

Ensure your app is within the Azure Files scale targets. If you're using standard Azure file shares, consider switching to premium.

Throughput on Linux clients is lower than that of Windows clients

Cause

This is a known issue with the implementation of the SMB client on Linux.

Workaround

  • Spread the load across multiple VMs.
  • On the same VM, use multiple mount points with a nosharesock option, and spread the load across these mount points.
  • On Linux, try mounting with a nostrictsync option to avoid forcing an SMB flush on every fsync call. For Azure Files, this option doesn't interfere with data consistency, but it might result in stale file metadata on directory listings (ls -l command). Directly querying file metadata by using the stat command will return the most up-to-date file metadata.

High latencies for metadata-heavy workloads involving extensive open/close operations

Cause

Lack of support for directory leases.

Workaround

  • If possible, avoid excessively opening and closing handles on the same directory within a short period of time.
  • For Linux VMs, increase the directory entry cache timeout by specifying actimeo=<sec> as a mount option. By default, the timeout is 1 second, so a larger value, such as 30 seconds, might help.
  • For CentOS Linux or Red Hat Enterprise Linux (RHEL) VMs, upgrade the system to CentOS Linux 8.2 or RHEL 8.2. For other Linux distros, upgrade the kernel to 5.0 or later.

Slow enumeration of files and folders

Cause

This problem can occur if there isn't enough cache on the client machine for large directories.

Solution

To resolve this problem, adjust the DirectoryCacheEntrySizeMax registry value to allow caching of larger directory listings in the client machine:

  • Location: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters
  • Value name: DirectoryCacheEntrySizeMax
  • Value type: DWORD

For example, you can set it to 0x100000 and see if performance improves.

Slow file copying to and from Azure file shares

You might see slow performance when you try to transfer files to the Azure Files service. If you don't have a specific minimum I/O size requirement, we recommend that you use 1 MiB as the I/O size for optimal performance.

Slow file copying to and from Azure Files in Windows

  • If you know the final size of a file that you're extending with writes, and your software doesn't have compatibility problems when the unwritten tail on the file contains zeros, then set the file size in advance instead of making every write an extending write.

  • Use the right copy method:

    • Use AzCopy for any transfer between two file shares.
    • Use Robocopy between file shares on an on-premises computer.
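
If you write files from your own code, the two recommendations above (a 1 MiB I/O size and setting the file size in advance) can be combined. The following Python sketch shows one way to do that. The function name `copy_presized` is hypothetical, and the approach assumes that your software tolerates a zero-filled tail while the copy is in progress.

```python
import os

IO_SIZE = 1024 * 1024  # 1 MiB, the I/O size recommended in this article.


def copy_presized(src, dst, io_size=IO_SIZE) -> int:
    """Copy src to dst after setting dst to its final size up front."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Extend the file once so later writes aren't extending writes.
        fout.truncate(size)
        while True:
            chunk = fin.read(io_size)
            if not chunk:
                break
            fout.write(chunk)
    return size
```

Truncating to the final size doesn't move the write position, so the loop still writes from the start of the file.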

Excessive DirectoryOpen/DirectoryClose calls

Cause

If the number of DirectoryOpen/DirectoryClose calls is among the top API calls and you don't expect the client to make that many calls, the issue might be caused by the antivirus software that's installed on the Azure client VM.

Workaround

A fix for this issue is available in the April Platform Update for Windows.

SMB Multichannel isn't being triggered

Cause

Recent changes to SMB Multichannel config settings without a remount.

Solution

  • After any changes to Windows SMB client or account SMB multichannel configuration settings, you have to unmount the share, wait for 60 seconds, and remount the share to trigger the multichannel.
  • For Windows client OS, generate an I/O load with a high queue depth (for example, QD=8) to trigger SMB Multichannel, such as by copying a file. For server OS, SMB Multichannel is triggered with QD=1, which means that it's triggered as soon as you start any I/O to the share.

Slow performance when unzipping files

Depending on the exact compression method and unzip operation used, decompression operations may perform more slowly on an Azure file share than on your local disk. This is often because unzipping tools perform a number of metadata operations in the process of performing the decompression of a compressed archive. For the best performance, we recommend copying the compressed archive from the Azure file share to your local disk, unzipping there, and then using a copy tool such as Robocopy (or AzCopy) to copy back to the Azure file share. Using a copy tool like Robocopy can compensate for the decreased performance of metadata operations in Azure Files relative to your local disk by using multiple threads to copy data in parallel.
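
The recommended sequence (copy the archive to local disk, unzip there, and copy the result back) can also be scripted. The following Python sketch handles ZIP archives only and copies the extracted files back serially, so it doesn't get the parallelism that Robocopy or AzCopy provide. The function name `unzip_via_local_disk` is hypothetical, and the local scratch location is whatever temporary directory the operating system provides.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unzip_via_local_disk(archive_on_share, dest_on_share):
    """Extract a ZIP archive on local disk, then copy the result to the share."""
    with tempfile.TemporaryDirectory() as scratch:
        # 1. Copy the compressed archive from the share to local disk.
        local_archive = shutil.copy2(archive_on_share, scratch)
        # 2. Extract locally, so metadata operations don't hit the share.
        extracted = Path(scratch) / "extracted"
        extracted.mkdir()
        with zipfile.ZipFile(local_archive) as zf:
            zf.extractall(extracted)
        # 3. Copy the extracted tree back to the share (Python 3.8 or later).
        shutil.copytree(extracted, dest_on_share, dirs_exist_ok=True)
```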

High latency on web sites hosted on file shares

Cause

A high number of file change notifications on file shares can result in high latencies. This typically occurs with web sites hosted on file shares with a deeply nested directory structure. A typical scenario is an IIS-hosted web application where file change notification is set up for each directory in the default configuration. Each change (ReadDirectoryChangesW) on the share that the client is registered for pushes a change notification from the file service to the client, which takes system resources, and the issue worsens with the number of changes. This can cause share throttling and thus result in higher client-side latency.

To confirm, you can use Azure Metrics in the portal.

  1. In the Azure portal, go to your storage account.
  2. In the left menu, under Monitoring, select Metrics.
  3. Select File as the metric namespace for your storage account scope.
  4. Select Transactions as the metric.
  5. Add a filter for ResponseType and check to see if any requests have a response code of SuccessWithThrottling (for SMB or NFS) or ClientThrottlingError (for REST).

Solution

  • If file change notification isn't used, disable file change notification (preferred).

  • Increase the file change notification polling interval (that is, poll less frequently) to reduce volume.

    Update the W3WP worker process polling interval to a higher value (for example, 10 or 30 minutes) based on your requirement. Set HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters\ConfigPollMilliSeconds in your registry and restart the W3WP process.

  • If your web site's mapped physical directory has a nested directory structure, you can try to limit the scope of file change notifications to reduce the notification volume. By default, IIS uses configuration from Web.config files in the physical directory to which the virtual directory is mapped, as well as in any child directories in that physical directory. If you don't want to use Web.config files in child directories, specify false for the allowSubDirConfig attribute on the virtual directory. More details can be found here.

    Set the IIS virtual directory allowSubDirConfig setting in Web.Config to false to exclude mapped physical child directories from the scope.

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.