Got confused about HashAlgorithm.TransformBlock

杨岑 121 Reputation points
2024-04-01T16:06:35.1466667+00:00

According to the official doc https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.transformblock?view=net-8.0, if I provide outputBuffer, it has to be the same array as inputBuffer:

Calling the TransformBlock method with different input and output arrays results in an IOException.

However, the source code here (https://github.com/dotnet/runtime/blob/main/src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/HashAlgorithm.cs) is quite the opposite:

public int TransformBlock(byte[] inputBuffer, int inputOffset, int inputCount, byte[]? outputBuffer, int outputOffset)
{
    ValidateTransformBlock(inputBuffer, inputOffset, inputCount);

    // Change the State value
    State = 1;

    HashCore(inputBuffer, inputOffset, inputCount);
    if ((outputBuffer != null) && ((inputBuffer != outputBuffer) || (inputOffset != outputOffset)))
    {
        // We let BlockCopy do the destination array validation
        Buffer.BlockCopy(inputBuffer, inputOffset, outputBuffer, outputOffset, inputCount);
    }
    return inputCount;
}

Apparently, the source code explicitly supports inputBuffer != outputBuffer.
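
A minimal sketch of how that branch can be observed from the outside, assuming .NET 6 or later with top-level statements. The sample bytes and variable names are made up for illustration. After TransformBlock returns, the requested range of the input array should also be present in the output array at the given offset:

```csharp
using System;
using System.Security.Cryptography;

using var md5 = MD5.Create();

byte[] input = { 1, 2, 3, 4, 5 };   // hypothetical sample data
byte[] output = new byte[8];        // a different array than input

// Hash 3 bytes starting at input[1] and request a copy starting at output[2]
int copied = md5.TransformBlock(input, 1, 3, output, 2);

// Finish the hash with an empty final block
md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);

Console.WriteLine(copied);                    // number of bytes processed
Console.WriteLine(string.Join(",", output));  // contents of the output array after the call
```

The return value is the number of bytes processed, and the hash itself does not depend on whether an output array was supplied.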

My test shows the docs are wrong:

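using System;
using System.IO;
using System.Security.Cryptography;
using static System.Console;

// args[0] is the first command-line argument (the program name is not included)
var file = args.Length > 0 ? args[0] : @"c:\windows\explorer.exe";

using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
using (var md5 = MD5.Create())
{
    var buf = new byte[100];
    var buf2 = new byte[100];
    var n = fs.Read(buf, 0, buf.Length);
    WriteLine(n);
    File.WriteAllBytes(@"c:\temp\test.bin", buf);

    // Different input and output arrays: no exception is thrown
    md5.TransformBlock(buf, 0, n, buf2, 0);
    var hash = md5.TransformFinalBlock(buf, 0, 0);
    // Convert.ToHexString requires .NET 5 or later
    WriteLine(Convert.ToHexString(md5.Hash!));
}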

And another puzzle:

You must call the TransformBlock method before calling the TransformFinalBlock method. You must call both methods before you retrieve the final hash value.

The docs are wrong again. The following code yields the same hash as the code above:

//md5.TransformBlock(buf, 0, n, buf2, 0);
var hash = md5.TransformFinalBlock(buf, 0, n);
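
That result is consistent with both methods simply feeding bytes into the same running hash. A minimal sketch, assuming .NET 6 or later and an arbitrary sample string (all variable names here are illustrative), that compares a one-shot hash, TransformFinalBlock on its own, and several TransformBlock calls followed by an empty final block:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] data = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog");

// 1. One-shot hash over the whole array
using var oneShotAlg = MD5.Create();
byte[] oneShot = oneShotAlg.ComputeHash(data);

// 2. TransformFinalBlock only, with no preceding TransformBlock call
using var finalOnlyAlg = MD5.Create();
finalOnlyAlg.TransformFinalBlock(data, 0, data.Length);
byte[] finalOnly = finalOnlyAlg.Hash!;

// 3. 16-byte chunks through TransformBlock (null output), then an empty final block
using var chunkedAlg = MD5.Create();
const int chunkSize = 16;
for (int offset = 0; offset < data.Length; offset += chunkSize)
{
    int count = Math.Min(chunkSize, data.Length - offset);
    chunkedAlg.TransformBlock(data, offset, count, null, 0);
}
chunkedAlg.TransformFinalBlock(data, 0, 0);
byte[] chunked = chunkedAlg.Hash!;

Console.WriteLine(Convert.ToHexString(oneShot));
Console.WriteLine(oneShot.SequenceEqual(finalOnly));  // compares approach 1 with approach 2
Console.WriteLine(oneShot.SequenceEqual(chunked));    // compares approach 1 with approach 3
```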

What confuses me even more: what is the purpose of BlockCopy?


1 answer

  1. Bruce (SqlWork.com) 56,286 Reputation points
    2024-04-01T21:41:04.3333333+00:00

    The use case for TransformBlock is to copy data and calculate its hash in the same pass, for example copying a file via buffers rather than streams. Say you want to copy a large file with a 512-byte buffer: each call to TransformBlock() updates the running hash. TransformFinalBlock() calculates the final hash and resets the running hash value. It also returns the final byte array for output (if there is any partial buffer left).

    The source code allows the input and output buffers to be different. Using the same buffer also works: when the buffer and the offset are identical the copy is skipped, and otherwise Buffer.BlockCopy (which does the actual copy of input to output) supports overlapping ranges. Not sure why the sample did not pass null for output, unless accepting null is a new feature.
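
    The copy-and-hash use case described above could look roughly like the following. This is only a sketch, assuming .NET 6 or later; the two MemoryStream objects are hypothetical stand-ins for the source and destination files, and the payload is generated sample data.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;

    // Hypothetical stand-ins for a large source file and its copy
    byte[] payload = Enumerable.Range(0, 2000).Select(i => (byte)(i % 256)).ToArray();
    using var source = new MemoryStream(payload);
    using var destination = new MemoryStream();

    using var md5 = MD5.Create();
    var inBuf = new byte[512];
    var outBuf = new byte[512];

    int n;
    while ((n = source.Read(inBuf, 0, inBuf.Length)) > 0)
    {
        // Updates the running hash and copies the first n bytes of inBuf into outBuf
        md5.TransformBlock(inBuf, 0, n, outBuf, 0);
        destination.Write(outBuf, 0, n);
    }

    // An empty final block completes the hash
    md5.TransformFinalBlock(inBuf, 0, 0);
    byte[] copyHash = md5.Hash!;

    Console.WriteLine(Convert.ToHexString(copyHash));
    Console.WriteLine(destination.ToArray().SequenceEqual(payload));  // was the copy byte-for-byte identical?
    ```

    If only the hash is needed and no copy, null can be passed as the output buffer instead, since the signature quoted in the question declares it as byte[]?.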
