Question
Friday, May 18, 2012 12:47 PM
Hello, I am new to this forum, and I was wondering if someone could explain the difference between using a stream and reading bytes into an array. I ran into the following problem at work, and although two co-workers were able to argue the choice between them, I was lost.
We were generating a hash for a checksum on Amazon cloud and the code involved this:
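using (Stream s = response.ResponseStream)
{
    // The buffer is sized to the whole response, so the entire object sits in memory.
    byte[] bytes = new byte[response.ContentLength];
    int bytesRead = 0;
    while (bytesRead < bytes.Length)
    {
        // Ask for at most 3072 bytes per call, and never more than the space left in the buffer.
        int n = s.Read(bytes, bytesRead, Math.Min(3072, bytes.Length - bytesRead));
        if (n == 0) break; // the stream ended before ContentLength bytes arrived
        bytesRead += n;
    }
    hash.ComputeHash(bytes);
    return IO.File.GenerateHash(hash);
}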
As I understand it, this reads the response a piece at a time and then calculates the checksum from the assembled bytes, but it did not work because it used a lot of memory.
The alternative solution became:
using (GetObjectResponse response = client.GetObject(request))
{
    using (Stream s = response.ResponseStream)
    {
        hash.ComputeHash(s);
        return IO.File.GenerateHash(hash);
    }
}
This obviously seems much simpler, but I don't really understand what streams are: what they contain, how they are structured, and what they are used for. I would greatly appreciate any information you all could give me in defining these two areas, as well as bonus pretzels for anyone who can explain how "using" works in these cases. All I have been told is that it 'helps the system dispose of it', which I can understand, but my colleagues seem to think that leaving it out is not even an option. Thank you!
All replies (2)
Friday, May 18, 2012 5:11 PM ✅ Answered
A stream is really just an abstraction over a sequence of bytes. The different stream types offer methods and properties suited to where those bytes come from or go to. For example, a FileStream knows that its bytes are the contents of a file, so it lets you read from or write to specific positions in that file. In a nutshell, a stream gives you a way to work with a sequence of bytes without having to loop through arrays manually, and it provides some base functionality for known kinds of byte sources.
The only difference between your two versions is in how they call the ComputeHash method, which accepts either a byte array or a stream. The first version copies the entire response from Amazon into a byte array before hashing it, so the whole object has to fit in memory at once. This may be why you ran into memory issues.
Bonus answer: The using statement frees you from having to remember to dispose of objects. Instead of writing a try/finally block yourself and calling Dispose in the finally, you let the compiler generate that for you, so Dispose runs when the block exits, even if an exception is thrown. For the using statement to work, the object must implement the IDisposable interface. It is good practice because resources such as network connections and file handles are released promptly instead of waiting for the garbage collector.
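To make the using point concrete, here is a minimal sketch of the two equivalent forms. The Resource class and the UsingDemo helper are hypothetical names invented for this illustration; the try/finally version is only roughly what the compiler produces, not its exact output.

```csharp
using System;

// Hypothetical resource, used only to observe when Dispose runs.
sealed class Resource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

static class UsingDemo
{
    // Form 1: the using statement. Dispose is called when the block exits.
    public static Resource WithUsing()
    {
        Resource r = new Resource();
        using (r)
        {
            // ... work with r here ...
        }
        return r;
    }

    // Form 2: roughly what the using statement expands to.
    public static Resource WithTryFinally()
    {
        Resource r = new Resource();
        try
        {
            // ... work with r here ...
        }
        finally
        {
            // Runs whether the try block completed or threw.
            if (r != null) r.Dispose();
        }
        return r;
    }
}
```

In both cases the returned object reports Disposed as true, which is the whole point: the cleanup happens at a known place, even if the body throws.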
Saturday, May 19, 2012 7:39 PM
Your first piece of code reads all the bytes from Amazon into a buffer in memory called bytes. If the number of bytes coming from Amazon is very large then you will use a very large piece of memory to hold them all. If it is bigger than your process is allowed to have then you will get an error (as you have seen).
The second piece of code is probably reading a few bytes at a time, computing what it needs, reading some more into the same buffer, doing some more computing, reading some more into the same buffer, doing some more computing, reading some more into the same buffer, ... until all bytes have been read and the final checksum has been calculated.
You could verify this by digging into the code for ComputeHash(Stream) but it seems pretty obvious that the second method is being more economical with its use of memory than your 'manual' method is. And, frankly, this is the basic idea behind a stream. Sure you can manipulate them in other ways but the basic approach should be to read a small amount of information, do what is required, rinse, repeat. This style means that you can handle enormous streams of information (in fact a never ending stream if that is what you want).
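As a sketch of that read-a-little, process, repeat pattern, the hash can be fed one small buffer at a time with HashAlgorithm.TransformBlock and TransformFinalBlock. This is not the actual source of ComputeHash(Stream), only an illustration of the same idea; the ChunkedHash class, the 4096-byte buffer size, and the choice of hash algorithm by the caller are all assumptions.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class ChunkedHash
{
    // Hashes a stream using one small, reused buffer (hypothetical helper).
    public static byte[] Compute(Stream s, HashAlgorithm hash, int bufferSize = 4096)
    {
        byte[] buffer = new byte[bufferSize];
        int n;
        // Read returns 0 once the stream has no more data.
        while ((n = s.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Feed only the bytes actually read; no output copy is needed.
            hash.TransformBlock(buffer, 0, n, null, 0);
        }
        // Finish the hash with an empty final block.
        hash.TransformFinalBlock(buffer, 0, 0);
        return hash.Hash;
    }
}
```

A quick way to try it is to wrap some test data in a MemoryStream (standing in for the Amazon response stream) and compare the result with ComputeHash on the same byte array, using a separate hash instance for each call. Memory use stays at roughly the buffer size no matter how long the stream is.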