The Solution To The Simple Puzzle
The first time I ran my histogram visualizer I asked for a Cauchy distribution with a minimum of -10 and a maximum of 10, and of course I got a graph that looks much like the one from my article of last week:
Looks perfectly reasonable; I guess my program is correct right out of the gate, because I am that awesome!
Then I went to make a graph of a uniform distribution with a minimum of zero and a maximum of one, but I forgot to update the actual query; it still gave me a Cauchy distribution. Here's that same Cauchy distribution this time graphed only from 0 to 1. Oh, the pain:
Which is obviously neither uniform nor Cauchy. Equally obvious: I am not sufficiently awesome to write a twenty-line program without a trivial floating point bug the first time.
The bug, which is very subtle in the first graph, was now obvious: the calculation to determine what the count is for the leftmost bucket is wrong. Why? Because converting a double to an integer simply discards the fractional part, effectively truncating towards zero, and "towards zero" is not downwards if any datum is negative. That means that the leftmost bucket got everything that was supposed to be in it, and everything that was supposed to be in the bucket to its left as well! The solution is either to take the floor of the number before turning it into an int, or to check to see if the double is in the right range before truncating it, not after.
Comments
Anonymous
February 26, 2012
Equally obvious: comments like these are what make this blog awesome.Anonymous
February 27, 2012
If you aren't sufficiently awesome, then what hope do the rest of us have? I guess that means we will have to test our software (only when working with graphs or floating point numbers, of course).Anonymous
February 27, 2012
Now it's suddenly becomes obvious why remainder is not a useful operation and why division should round downwards (re blogs.msdn.com/.../what-s-the-difference-remainder-vs-modulus.aspx)Anonymous
February 27, 2012
I assure you that my sample programs have even worse stupid errors. Your genius is that it occurred to you to make a blog entry out of them.Anonymous
February 27, 2012
This bug is exactly the reason that the behavior you describe in blogs.msdn.com/.../what-s-the-difference-remainder-vs-modulus.aspx is a design flaw: The misdesigned semantics encourage these kinds of mistakes.Anonymous
February 28, 2012
@Eamon I agree. I can't think of a single case where I'd want the modulus to be negative, but there are lots of situations where the converse (always positive) is an useful attribute.Anonymous
February 28, 2012
One more puzzle involving random numbers: var r=new Random(); const int n=1000000; Console.WriteLine(Enumerable.Range(0,n).Count(_=>r.Next(1431655765)%2==0)/(double)n); What does this output when you run it on .net 4?Anonymous
February 28, 2012
A common truncation error. This is on par with overloading the '==' operator in C# and forgetting to check for null left hand and right hand operands. MSDN help topics have serveral examples lacking null checks.Anonymous
October 07, 2012
This logic works fine for me : - private static int[] CreateHistogram(IEnumerable<double> data, int buckets, double min, double max) { int[] results = new int[buckets]; double divisor = (max - min) / buckets; foreach (double datum in data) { int index = (int)((datum - min) / divisor); if (0 <= index && index < buckets) results[index] += 1; } return results; }