azcopy --put-md5: where is the MD5 calculated?

mainuser 6 Reputation points
2022-06-04T22:57:25.12+00:00

Hello,

When using the --put-md5 switch in azcopy, is the md5 checksum value calculated at the source (e.g. my computer) or the destination (Azure)? I want to be able to use the checksum to verify the file on Azure, but if the checksum is calculated on the source file (as opposed to on Azure, after the file is finished uploading), the checksum that appears in the Content-MD5 field will not be useful.

Azure Blob Storage

3 answers

  1. Eric Levinson 5 Reputation points
    2023-06-15T22:07:41.5733333+00:00

    There's no Microsoft solution to this. MD5 calculations on FTP servers have been around for 35 years; I'm not sure why Microsoft decided it's not necessary. This is so unproductive. The server needs to calculate the MD5 hash independently of the client. Calculating the MD5 on the client and then populating the MD5 on the server file doesn't help anyone: if a bit or byte gets incorrectly copied, the MD5 will still match the client's, and you don't know the file is corrupted until you try to use it. Why not allow us to ensure the file is uploaded correctly?

    I've been spending months explaining all of this to Microsoft support and they don't get it. I don't understand: why not add two lines of code and an additional nanosecond of CPU power to calculate the MD5s of all blob files in the file system? (Oh wait, the OS already does this automatically.) Why not copy it to the properties?

    1 person found this answer helpful.
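
The corruption scenario described above can be illustrated with a small sketch. It assumes the value stored in Content-MD5 was computed from the source bytes before upload, and the byte strings are made-up sample data; `content_md5` is a hypothetical helper, not part of AzCopy. One flipped bit is enough to change the digest, so a source-side hash does expose corruption, but only once the stored bytes are read back and rehashed.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return an MD5 digest base64-encoded, the form used by Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# Hypothetical sample data standing in for the uploaded file.
source = b"quarterly-report contents\n" * 1000

# Hash computed at the source before upload (assumed to be what gets stored).
stored_md5 = content_md5(source)

# Simulate a single bit being corrupted between the source and storage.
corrupted = bytearray(source)
corrupted[500] ^= 0x01
stored_bytes = bytes(corrupted)

# The stored hash still describes the source, so nothing flags the problem
# until someone downloads the stored bytes and rehashes them.
print("intact:   ", content_md5(source) == stored_md5)
print("corrupted:", content_md5(stored_bytes) == stored_md5)
```

Run as-is, this prints `True` for the intact copy and `False` for the corrupted one.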

  2. Sreeju Nair 11,621 Reputation points
    2022-06-05T06:03:32.223+00:00

    The azcopy command creates an MD5 hash of the file content and stores it in the Content-MD5 property.

    Description of the --put-md5 option: Create an MD5 hash of each file, and save the hash as the Content-MD5 property of the destination blob or file. (By default the hash is NOT created.) Only available when uploading.

    Refer: https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy-copy

    Since the MD5 is calculated from the file content, it does not really matter where it is calculated, as you will always get the same output. Based on my understanding, the --put-md5 option calculates the MD5 on the client machine. That's why, when you copy files from Amazon S3 to Azure, the --put-md5 option is not available: in that case the file is not downloaded to your computer.

    Refer: https://learn.microsoft.com/en-us/answers/questions/732303/azcopy-checksum-verification-when-both-source-and.html

    If you are having problems verifying the checksum, you could refer to the following post.

    https://galdin.dev/blog/md5-has-checks-on-azure-blob-storage-files/

    Hope this helps
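
To compare a blob with your local copy yourself, you can compute the same kind of value locally. This is a minimal sketch, assuming the Content-MD5 property holds the base64-encoded 16-byte MD5 digest of the whole file. `file_content_md5` and the 4 MiB default chunk size are illustrative choices, not anything AzCopy exposes. Because MD5 depends only on the bytes, hashing them in chunks gives the same result as hashing them in one piece, which is why the location of the calculation does not change the output.

```python
import base64
import hashlib
import os
import tempfile


def file_content_md5(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Stream a file through MD5 and return the digest base64-encoded,
    the form in which a blob's Content-MD5 property is reported."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")


if __name__ == "__main__":
    # Write a throwaway sample file (random hypothetical data) to hash.
    data = os.urandom(10 * 1024 * 1024 + 123)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        one_shot = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
        # Different chunk sizes, same bytes, same digest.
        assert file_content_md5(path) == one_shot
        assert file_content_md5(path, chunk_size=1000) == one_shot
        print(one_shot)
    finally:
        os.remove(path)
```

The printed string can then be compared with the Content-MD5 value shown in the blob's properties.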


  3. Sumarigo-MSFT 43,806 Reputation points Microsoft Employee
    2022-06-07T06:20:20.203+00:00

    @mainuser Adding more information to the above response!

    --put-md5 calculates the MD5 hash at the source, not the destination. It puts the value in the Content-MD5 HTTP header. When you download, AzCopy can recalculate the MD5 and verify that it is equal to the one you uploaded.

    Data integrity and validation: https://github.com/Azure/azure-storage-azcopy/wiki/Data-integrity-and-validation

    With azcopy --put-md5, the MD5 hash is calculated and stored automatically.

    There is no way to test this in the case of a service-to-service (S2S) transfer.

    Please let us know if you have any further queries. I’m happy to assist you further.
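
AzCopy's own download-time check is governed by its --check-md5 option (see the azcopy copy reference linked in the previous answer). The sketch below is a hypothetical re-implementation of that idea in Python, not AzCopy's code. `verify_download` and `Md5MismatchError` are made-up names, the policy strings borrow the option's value names only for readability, and the stored value is assumed to be the base64 Content-MD5 string, or `None` when the blob has none.

```python
import base64
import hashlib
from typing import Iterable, Optional

# Policy names borrowed from AzCopy's --check-md5 values for readability only.
POLICIES = {"NoCheck", "LogOnly", "FailIfDifferent", "FailIfDifferentOrMissing"}


class Md5MismatchError(Exception):
    """Raised when downloaded bytes fail the MD5 check under a strict policy."""


def verify_download(chunks: Iterable[bytes], stored_md5: Optional[str],
                    policy: str = "FailIfDifferent") -> bool:
    """Rehash downloaded bytes and compare them with a stored Content-MD5.

    Returns True only when the hashes were compared and matched; returns False
    when the check was skipped or a mismatch was only logged; raises
    Md5MismatchError when the policy demands a failure.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "NoCheck":
        return False
    if stored_md5 is None:
        # No hash was stored at upload time (for example, --put-md5 was not used).
        if policy == "FailIfDifferentOrMissing":
            raise Md5MismatchError("blob has no Content-MD5")
        return False
    md5 = hashlib.md5()
    for chunk in chunks:  # hash incrementally, as the bytes arrive
        md5.update(chunk)
    actual = base64.b64encode(md5.digest()).decode("ascii")
    if actual == stored_md5:
        return True
    if policy == "LogOnly":
        print(f"MD5 mismatch: stored {stored_md5}, computed {actual}")
        return False
    raise Md5MismatchError(f"MD5 mismatch: stored {stored_md5}, computed {actual}")


if __name__ == "__main__":
    payload = b"hello world"
    stored = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
    print(verify_download([b"hello ", b"world"], stored))  # True
```

With the Python SDK (azure-storage-blob), the chunks could come from `BlobClient.download_blob().chunks()`; the stored hash is exposed as raw bytes in `get_blob_properties().content_settings.content_md5`, so it would need base64-encoding before being passed in here.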
