Performance Quiz #12 -- The Cost of a Good Hash
This quiz is a little different from some of my others. Here is System.String.GetHashCode:
public override unsafe int GetHashCode()
{
    fixed (char* text1 = ((char*) this))
    {
        char* chPtr1 = text1;
        int num1 = 0x15051505;
        int num2 = num1;
        int* numPtr1 = (int*) chPtr1;
        for (int num3 = this.Length; num3 > 0; num3 -= 4)
        {
            num1 = (((num1 << 5) + num1) + (num1 >> 0x1b)) ^ numPtr1[0];
            if (num3 <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr1[1];
            numPtr1 += 2;
        }
        return (num1 + (num2 * 0x5d588b65));
    }
}
It's a fine little hash function that does pretty much the sorts of things you'd expect a hash function to do.
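If you would like to see what those pointer tricks compute without the unsafe code, here is a safe paraphrase. This is my own sketch, not framework source, and remember that GetHashCode results are not stable across runtimes or bitness, so never persist them.

```csharp
// Safe C# paraphrase of the unsafe hash above (a sketch, not BCL source).
// The unsafe version reads the string as ints, i.e. two UTF-16 chars at a
// time; PackPair rebuilds that little-endian int, padding a trailing odd
// char with the zero terminator the runtime guarantees.
static class StringHashSketch
{
    public static int Compute(string s)
    {
        unchecked
        {
            int num1 = 0x15051505;
            int num2 = num1;
            for (int i = 0; i < s.Length; i += 4)
            {
                num1 = ((num1 << 5) + num1 + (num1 >> 27)) ^ PackPair(s, i);
                if (i + 2 >= s.Length)
                {
                    break;  // mirrors the "num3 <= 2" early exit above
                }
                num2 = ((num2 << 5) + num2 + (num2 >> 27)) ^ PackPair(s, i + 2);
            }
            return num1 + num2 * 0x5d588b65;
        }
    }

    private static int PackPair(string s, int i)
    {
        int lo = s[i];
        int hi = (i + 1 < s.Length) ? s[i + 1] : 0; // '\0' padding
        return lo | (hi << 16);
    }
}
```

Note the two accumulators: the loop retires two independent mixes per iteration, which gives the processor a little instruction-level parallelism for free.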
So now the quiz. It's a bit of trivia but I'm going somewhere with it.
Can you tell me 5 implementations of GetHashCode in the framework that do things that you wouldn't expect a hashing function to do?
Comments
Anonymous
January 22, 2007
According to http://wesnerm.blogs.com/net_undocumented/2004/09/enums_and_perfo.html these will use reflection in GetHashCode, giving terrible performance.
Anonymous
January 22, 2007
Expanding on Peter Ritchie's point above, even those that do override GetHashCode don't always get it right after factoring in mutability and Equals overrides: http://weblogs.asp.net/bleroy/archive/2004/12/15/316601.aspx See for instance System.Web.UI.WebControls.ListItem. In terms of other unexpected behavior, I was surprised by System.Uri's GetHashCode method, which goes to a lot of trouble to establish the completeness of the URI infoset and to distinguish remote URLs for non-absolute URIs.
Anonymous
January 22, 2007
Rico, somewhat unrelated: I see a lot of classes that implement GetHashCode from constituent members in a pretty clumsy way. Why don't you guys provide a CombineHash function in order to create a pit of success here? That way somebody implementing GetHashCode for, say, struct Point, would simply do: return CombineHash(X.GetHashCode(), Y.GetHashCode()); Thanks, Dejan
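The helper Dejan describes is easy to sketch. The name CombineHash and the constants below are my own choices, not a framework API; the framework shipped nothing like it at the time (much later, .NET added System.HashCode.Combine for exactly this purpose).

```csharp
// Hypothetical CombineHash helper of the kind Dejan asks for; the name
// and constants are invented for illustration, not a framework API.
static class HashHelper
{
    public static int CombineHash(params int[] hashes)
    {
        unchecked
        {
            int hash = (int)2166136261;  // FNV-1a offset basis
            for (int i = 0; i < hashes.Length; i++)
            {
                hash = (hash ^ hashes[i]) * 16777619;  // FNV-1a prime
            }
            return hash;
        }
    }
}

// Dejan's Point example would then read:
// public override int GetHashCode()
// {
//     return HashHelper.CombineHash(X.GetHashCode(), Y.GetHashCode());
// }
```

A production version would add fixed-arity overloads (two ints, three ints, and so on) so the params array is not allocated on every GetHashCode call.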
Anonymous
January 23, 2007
@Shuggy You can imagine a helper class to hide the complexity of object / value type combinations:

public class HashCodeBuilder
{
    public void AddObject(object o) { /* ... */ }
    public void AddHash(int hash) { /* ... */ }
    public override int GetHashCode() { /* ... */ }
}

public class Foo
{
    object _bar1;
    object _bar2;
    uint _bar3;

    public override int GetHashCode()
    {
        HashCodeBuilder hashCodeBuilder = new HashCodeBuilder();
        hashCodeBuilder.AddObject(_bar1);
        hashCodeBuilder.AddObject(_bar2);
        hashCodeBuilder.AddHash(_bar3.GetHashCode());
        return hashCodeBuilder.GetHashCode();
    }
}

Anonymous
January 23, 2007
Olivier, that approach has some performance problems:
- You create a new object each time! (Even if you cache it, you still pay the space cost of caching it somewhere.)
- Lots of function calls: if you have structs then the calls will not be inlined. That means you must also add the invocation of GetHashCode() yourself, which hardly helps in the usability stakes.
- It doesn't help with enums at all; they still behave poorly.
If you were willing to sacrifice thread safety and accept the risk of not 'resetting' the accumulator (which is fine in lots of cases), then a series of static functions in C++/CLI could do what both myself and Frank wanted. You just can't in C# unless you explicitly provide a method for every single enumeration you care about (and thus an explicit EqualityComparer for every enum you care about). C# has unsafe blocks and I would like to be able to use them fully if I wish. The restrictions placed on generic parameter types in them are (to me) annoying. Perhaps the design team felt that discouraging unsafe casts on generic types was worth it.
Plus Frank could drop into asm if he wanted to target the architecture, since that clearly makes a big difference if you are amalgamating a lot of data which is not enregistered and you care about access alignment.
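One way to get Olivier's builder without the per-call allocation Shuggy objects to is to make the accumulator a mutable struct that lives on the caller's stack. This is a sketch under that assumption; the type name and constants are invented.

```csharp
// Hypothetical allocation-free variant of Olivier's HashCodeBuilder:
// a mutable struct, so no heap object is created per GetHashCode call.
struct HashAccumulator
{
    private int _hash;

    public HashAccumulator(int seed)
    {
        _hash = seed;
    }

    public void AddHash(int hash)
    {
        unchecked
        {
            _hash = _hash * 31 + hash;  // classic multiply-by-31 mix
        }
    }

    public void AddObject(object o)
    {
        AddHash(o == null ? 0 : o.GetHashCode());
    }

    public int ToHashCode()
    {
        return _hash;
    }
}

// Usage inside a GetHashCode override:
//     HashAccumulator acc = new HashAccumulator(17);
//     acc.AddObject(_bar1);
//     acc.AddHash(_bar3.GetHashCode());
//     return acc.ToHashCode();
```

Two caveats: AddObject still boxes value types, so enums remain slow, which is exactly Shuggy's point; and mutable structs must be used as locals, since calling a mutating method on one stored in a readonly field mutates a defensive copy.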
Anonymous
January 24, 2007
I continue to be astounded by what my readers can come up with. As usual I had a purpose for posing my
Anonymous
January 25, 2007
>> Of course since this is Rico we are talking about, this would be a case of measure, measure, measure. GRIN
Anonymous
January 25, 2007
OK, got it. And Dictionary<K,V> probably does the caching exactly for this reason. But look what happens in the System.Collections.Hashtable implementation...
Anonymous
January 26, 2007
Well once again there have been many thoughtful replies