Compare md5 metadata property to md5.txt

Anonymous
2020-12-21T15:04:33.82+00:00

My task is to copy files from AWS S3 to Azure Blob Storage. The copying works fine, but I also need to verify the MD5 checksums after the transfer finishes. I have all the checksums with their corresponding file names in a .txt file, and every transferred file has a metadata field containing its checksum. I need to compare the two and check that they match.

What's an easy way to accomplish this? Thanks!

Azure Data Factory

1 answer

  1. MartinJaffer-MSFT 26,156 Reputation points
    2020-12-21T22:41:34.04+00:00

    Hello @Anonymous and welcome to Microsoft Q&A.

    Depending upon how you are copying your data, there are data consistency options.

    If you are using Binary copy (as opposed to tabular / delimited text), then the Copy activity can perform the checksum for you when its data consistency option is enabled. This is a much, much easier option than trying to check the MD5 afterward. A dataset of type Binary can copy any type of file, including text.

    [Image: 50171-image.png]

    If you want to verify the checksum after copy, using the text file you mentioned, the process is more involved.
    First, a Lookup Activity reads the contents of your .txt file via a delimited text dataset. Then a ForEach Activity iterates over each row. Inside the ForEach loop, a Get Metadata Activity uses a parameterized dataset to get the MD5. The filename from the current row is passed into the Get Metadata Activity, and through it to the parameterized dataset, where it stands in for the filename.
    After the Get Metadata Activity, an If Activity compares the MD5 from the Get Metadata output to the one in the current row/item.
    Inside the If Activity, in the branch taken when the two values do not match, an Append Variable activity adds the filename to an array type variable. This will be a list of all filenames that fail the MD5 comparison.

    Outside the loop, after the whole ForEach activity completes, you will need to push the list of failed filenames to a destination of your choice. That could mean writing it to a blob, sending it to a Logic App for email, passing it to a Function App for further processing, or something else.
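
The comparison logic above can also be sketched in plain Python, which may help for checking the logic or for a one-off verification outside Data Factory. This is only a minimal sketch, not the pipeline itself. It assumes each line of the .txt file has the form `<md5> <filename>` (the layout `md5sum` writes), and that the metadata checksums have already been fetched into a dictionary keyed by filename. The names `parse_md5_file`, `find_mismatches`, `build_failure_report`, and the JSON field names are all hypothetical.

```python
import json


def parse_md5_file(text):
    """Parse lines of the form '<md5> <filename>' into {filename: md5}.

    Assumed layout (md5sum style); adjust the split if your file differs.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        checksum, filename = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' on the filename
        expected[filename.lstrip("*")] = checksum.lower()
    return expected


def find_mismatches(expected, blob_md5):
    """Return filenames whose metadata MD5 is missing or differs.

    Plays the role of the ForEach, If, and Append Variable steps.
    """
    failed = []
    for filename, checksum in expected.items():
        actual = blob_md5.get(filename)
        if actual is None or actual.strip().lower() != checksum:
            failed.append(filename)
    return failed


def build_failure_report(failed):
    """Serialize the failed list to JSON (hypothetical field names)."""
    return json.dumps({"failedCount": len(failed), "failedFiles": failed}, indent=2)


if __name__ == "__main__":
    md5_txt = (
        "9e107d9d372bb6826bd81d3542a419d6  report.csv\n"
        "e4d909c290d0fb1ca068ffaddf22cbd0  image.png\n"
    )
    # Hypothetical metadata values, as if read from each transferred blob
    blob_md5 = {
        "report.csv": "9E107D9D372BB6826BD81D3542A419D6",
        "image.png": "00000000000000000000000000000000",
    }
    failed = find_mismatches(parse_md5_file(md5_txt), blob_md5)
    print(build_failure_report(failed))
```

The comparison is case-insensitive because hex digests are often written in either case. If the metadata field stores the checksum in a different encoding than the .txt file (for example base64 rather than hex), the values would need to be converted to a common form before comparing. The upload or HTTP call that delivers the report is left out on purpose, since the destination is your choice.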

