AzCopy logs - Does MD5 mismatch come under DOWNLOADFAILED status?

Zeno GH 40 Reputation points
2024-05-21T10:10:37.93+00:00

Hi,

I'm using AzCopy (10.17.0) to upload and download files from a storage account. I use --put-md5 and --check-md5 to check the data integrity of each file. If there is a mismatch in the hash, I want to retry the download.

I plan to read the azcopy.log file and fetch all the failed downloads plus the ones with an MD5 mismatch.

https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-configure#log-and-plan-files

The log file stores the statuses (UPLOADFAILED, COPYFAILED, and DOWNLOADFAILED), so I want to know whether an MD5 mismatch falls under the DOWNLOADFAILED category and what the entry will look like.

For example, if a file is downloaded to the file share but a few rows are missing, will that still appear under the DOWNLOADFAILED category in the logs, so that I can re-run it?

Regards

Azure Blob Storage

Accepted answer
  1. Nehruji R 3,651 Reputation points Microsoft Vendor
    2024-05-22T05:56:11.33+00:00

    Hello Zeno GH,

    Greetings! Welcome to Microsoft Q&A Platform.

    AzCopyV10 supports MD5 hashes to validate the integrity of file contents. To opt in to this mechanism, include --put-md5 on the command line when uploading to Azure. NOTE that the actual check does not happen until the uploaded blob is used (i.e. downloaded) by AzCopy or another MD5-aware tool.

    The overall process looks like this:

    1. At upload time, the hash of the original disk file is computed and recorded against the blob.
    2. At download time, when the file is written to disk, a new hash is computed. This new "download time" hash is compared to the original hash from the time of upload. If they match, that proves that the downloaded file, as written to disk, exactly matches the original file as read at the time of upload. By default, AzCopy will signal a failure if they don't match. This behavior can be configured with the --check-md5 flag. The default is to check hashes for all blobs that have them (i.e. all blobs that were uploaded with AzCopy's --put-md5 or with another tool that uploads MD5s). For detailed information, refer to https://github.com/Azure/azure-storage-azcopy/wiki/Data-integrity-and-validation.
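The comparison in the two steps above can be sketched in Python. This is a minimal illustration, not AzCopy's own implementation: the function names `md5_base64` and `matches_stored_hash` are hypothetical, and the sketch assumes the stored hash is available as the base64-encoded MD5 digest (the usual Content-MD5 encoding).

```python
import base64
import hashlib


def md5_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_stored_hash(path, stored_md5_base64):
    """Compare a freshly computed hash with the one recorded at upload time.

    `stored_md5_base64` is assumed to be the base64 MD5 value kept with the blob.
    """
    return md5_base64(path) == stored_md5_base64
```

A file whose bytes differ in any way from the uploaded original produces a different digest, so `matches_stored_hash` returns `False` for it.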

    In the context of AzCopy, when an MD5 mismatch occurs during file download, it is indeed logged in the azcopy.log file. The specific log entries for MD5 mismatches will indicate that the calculated MD5 hash does not match the expected value. However, the exact format of these log entries can vary based on the version of AzCopy being used.

    If a download fails due to an MD5 mismatch, the log will indicate the specific file or blob that encountered the issue.

    AzCopy creates log and plan files for every job. You can use the logs to investigate and troubleshoot any potential problems. The logs contain the failure status (UPLOADFAILED, COPYFAILED, or DOWNLOADFAILED), the full path, and the reason for the failure. By default, the log and plan files are located in the %USERPROFILE%\.azcopy directory on Windows or the $HOME/.azcopy directory on Mac and Linux.
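To collect the failed transfers for a retry, one option is to scan the log for lines that contain a failure status. The sketch below is a minimal example under assumptions: it only does a substring match and does not depend on the exact line layout, which can vary between AzCopy versions. The helper names `default_log_dir`, `failed_lines`, and `scan_log` are hypothetical.

```python
from pathlib import Path

FAILURE_STATUSES = ("UPLOADFAILED", "COPYFAILED", "DOWNLOADFAILED")


def default_log_dir():
    """Default AzCopy log directory: the .azcopy folder in the user's home."""
    return Path.home() / ".azcopy"


def failed_lines(lines, statuses=("DOWNLOADFAILED",)):
    """Return the log lines that mention any of the given failure statuses."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(status in line for status in statuses)
    ]


def scan_log(log_path, statuses=("DOWNLOADFAILED",)):
    """Read one log file and return its failure lines."""
    # errors="replace" avoids crashing on unexpected bytes in the log.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return failed_lines(handle, statuses)
```

For example, `scan_log(default_log_dir() / "<job-id>.log")` would return the DOWNLOADFAILED lines for one job, where `<job-id>` stands for the actual job ID. Each returned line can then be inspected for the path and the failure reason.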

    Regarding your second question, if a file is downloaded to the file share and its rows were already missing from the source file at upload time, it would still be considered a successful download (not a DOWNLOADFAILED). The MD5 check only compares the downloaded bytes against the hash recorded at upload, so it cannot tell whether the original file was complete. Content that is lost or altered during transfer, by contrast, changes the hash and is caught by the check described above. Missing rows or other issues that originate in the source data need to be handled separately from the MD5 validation.

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

