Azure Files failing transactions

Hauck, Michael 1 Reputation point
2022-05-25T17:18:11.403+00:00

We have recently configured Azure Files and successfully migrated one of our many department directories.
We are experiencing no significant issues and no user complaints to this point.

As part of the configuration, we enabled Diagnostic Settings (File).
In Azure Monitor > Storage accounts, with our Azure Files storage account selected,
we are seeing 1.7K "ClientOtherError/Errors" over the past 4 hours:

205583-image.png

Drilling into the log error details, we see entries similar to this:
205572-image.png

The errors indicate problems for multiple operation types and for many of our clients who access this share.
Basically, my question is how do we look into exactly what is causing these errors?


2 answers

  1. Sumarigo-MSFT 44,806 Reputation points Microsoft Employee
    2022-05-25T17:58:37.043+00:00

    @Hauck, Michael Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    ClientOtherError covers all client-side errors other than the specifically described ones. For more information, refer to this article: https://learn.microsoft.com/en-us/azure/storage/common/storage-insights-overview

    The status of the requested operation. For a complete list of status messages, see Storage Analytics Logged Operations and Status Messages topic. In version 2017-04-17 and later, the status message ClientOtherError isn't used. Instead, this field contains an error code. For example: SASSuccess

    For details on which errors can be shown in the report, see Response Type schema and look for response types such as ServerOtherError, ClientOtherError, ClientThrottlingError. Depending on the storage accounts selected, if there are more than three types of errors reported, all other errors are represented under the category of Other.

    This type of error occurs when a client sends too many requests to the same partition server. When that happens and the partition server becomes overloaded, it performs internal load-balancing operations as part of the normal Azure Storage healing process.

    When the partition being accessed undergoes a load-balancing operation (reassigning partitions to less loaded servers), the storage service returns 500 or 503 errors.

    The limits I previously mentioned (the 800 reads for 5 minutes) are indeed for management operations and not for data ones. In your case, the GetBlob ones are data operations and are not covered by these hard limits. After analyzing the ingress/egress limit and also the transactions per second of your storage account, I verified that you also seem to be far away from hitting the threshold.

    Just for the record and improved searchability: in Metrics, these errors showed up as ClientOtherError and ClientThrottlingError.

    Additional information: metrics show low PercentSuccess, or analytics log entries have operations with a transaction status of ClientOtherErrors.

    The PercentSuccess metric captures the percent of operations that were successful based on their HTTP Status Code. Operations with status codes of 2XX count as successful, whereas operations with status codes in 3XX, 4XX and 5XX ranges are counted as unsuccessful and lower the PercentSuccess metric value. In the server-side storage log files, these operations are recorded with a transaction status of ClientOtherErrors.

    It is important to note that these operations have completed successfully and therefore do not affect other metrics such as availability. Some examples of operations that execute successfully but that can result in unsuccessful HTTP status codes include:

    • ResourceNotFound (Not Found 404), for example from a GET request to a blob that does not exist.
    • ResourceAlreadyExists (Conflict 409), for example from a CreateIfNotExist operation where the resource already exists.
    • ConditionNotMet (Not Modified 304), for example from a conditional operation such as when a client sends an ETag value and an HTTP If-None-Match header to request an image only if it has been updated since the last operation.
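
A minimal Python sketch of the counting rule described above: status codes in the 2XX range count as successful, and 3XX, 4XX and 5XX codes lower the percentage. The function names and the sample list of status codes are made up for illustration; this is not how the service computes the metric internally.

```python
def is_successful(status_code: int) -> bool:
    """Return True for 2XX codes; 3XX, 4XX and 5XX count as unsuccessful."""
    return 200 <= status_code <= 299


def percent_success(status_codes):
    """Percentage of operations with a 2XX status code, or None if there are none."""
    if not status_codes:
        return None
    successful = sum(1 for code in status_codes if is_successful(code))
    return 100.0 * successful / len(status_codes)


# Hypothetical sample: five 2XX responses plus a 304, a 404 and a 409.
sample = [200, 201, 206, 304, 404, 409, 200, 200]
print(percent_success(sample))  # 5 of 8 operations are 2XX, so this prints 62.5
```

The 304, 404 and 409 responses in the sample correspond to the examples listed above: the operations completed as designed, yet each one still reduces the computed percentage.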

    Please let us know if you have any further queries. I’m happy to assist you further.



  2. Sudipta Chakraborty - MSFT 1,101 Reputation points Microsoft Employee
    2022-05-25T18:10:01.907+00:00

    @Hauck, Michael :

    To see everything that is logged, open the logs in the Log Analytics workspace, as shown in the screenshot below, and run the KQL query.

    205557-image.png

    KQL Query:

    // SMB requests from the last 7 days whose StatusCode contains "-"
    StorageFileLogs
    | where Protocol == "SMB" and TimeGenerated >= ago(7d) and StatusCode contains "-"
    | sort by StatusCode
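
Once a query like the one above returns rows, grouping them by operation, status and client can help narrow down what is causing the errors. The Python sketch below assumes the results were exported as a list of dictionaries with `OperationName`, `StatusText` and `CallerIpAddress` keys; check those column names against the monitoring reference linked below. The sample rows and the `top_error_groups` helper are hypothetical.

```python
from collections import Counter

# Made-up rows standing in for an export of the query results.
rows = [
    {"OperationName": "Create", "StatusText": "ClientOtherError", "CallerIpAddress": "10.0.0.4"},
    {"OperationName": "Create", "StatusText": "ClientOtherError", "CallerIpAddress": "10.0.0.5"},
    {"OperationName": "QueryInfo", "StatusText": "ClientOtherError", "CallerIpAddress": "10.0.0.4"},
]


def top_error_groups(rows, keys=("OperationName", "StatusText"), limit=5):
    """Count rows per combination of the given keys, most frequent first."""
    counts = Counter(tuple(row.get(key, "") for key in keys) for row in rows)
    return counts.most_common(limit)


# Which operation/status pairs account for most of the logged errors?
for group, count in top_error_groups(rows):
    print(count, group)

# Which clients generate the most errors?
for group, count in top_error_groups(rows, keys=("CallerIpAddress",)):
    print(count, group)
```

Grouping on different keys (operation, status, caller) is a quick way to see whether the errors cluster around one client or one operation type, or are spread evenly across the share.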
    

    Reference:
    https://learn.microsoft.com/en-us/azure/storage/files/storage-files-monitoring?tabs=azure-portal#sample-kusto-queries
    https://learn.microsoft.com/en-us/azure/storage/files/storage-files-monitoring-reference#fields-that-describe-the-service