Hash Functions in .NET – Right Tool for the Right Job
Hi, Chetan Bhat here. I’m a developer with the Security Tools Team.
In this post I will talk about common mistakes developers make when when using hash functions. Any hash function is required to meet the following two requirements.
- It must be easy to calculate for any possible message.
- It must return a fixed-size bit string irrespective of the length of the message.
Broadly speaking there are two types of hash functions: non-cryptographic hash functions and cryptographic hash functions. A cryptographic hash function is expected to also meet the two additional requirements below:
- It must be practically impossible to find the message given the hash (pre-image).
- It must be practically impossible to find two different messages with the same hash (hash collisions).
Developers often confuse between these two kinds of hash functions which leads to following problems.
- Using non-cryptographic hashes in place of cryptographic hashes has serious security consequences.
- Using cryptographic hashes in place of non-cryptographic hashes has severe performance problems (either memory or speed).
The .Net BCL provides both cryptographic and non-cryptographic hash functions. Cryptographic hash functions such as MD5, SHA-1, SHA512 etc. that are available in the System.Security.Cryptography namespace are primary designed for security purposes (like digital signatures). On the other hand, non-cryptographic functions like Object.GetHashCode() is not meant for security considerations but can be used for performance reasons (like fast string comparisons and hash-tables).
The rest of the blog post will demonstrate the fact that Object.GetHashCode is not a cryptographic hash and how it (and other low bit count hashes) can be defeated.
Birthday attacks are used to defeat the “hash-collision” requirement for crypto-hashes with small bit lengths. It is based on the theory of the Birthday Paradox, commonly seen in Probability Theory textbooks. The “paradox’ comes from the fact that in a group of 41 randomly selected people, there is more than a 90% chance that at least two people share the same birthday. Also, In a group of 57 people, there is more than a 99% chance that at least two people share the same birthday.
The fact can be easily extended to hash functions too: For a hash algorithm with bit-size N with uniform distribution, a set of 2^(N/2) random hashes (hashes of random data) has a very good probability that at least two hashes in the set are the same.
In our case, the GetHashCode returns an Int32 which makes N = 32. So, effectively it takes just 2^(32/2) = 65536 hashes of random data to find a hash collision (which is BAD and NOT SECURE). However it still takes 2^32 = 4294967296 iterations or random trials for a successful pre-image attack.
Here’s a C# program which find collisions on the String.GetHashCode() function. Here’s how it works:
- The program randomly generates 100,000 strings of fixed length (in the order of 65,536 but improves our chances)
- Hash of each of these strings is calculated by calling String.GetHashCode() and is stored along with the string in a list.
- The list is then searched for duplicate hashes (collisions) and strings with colliding hashes are reported via the command line (String –> Hash format).
- If duplicates are not found, we try again (we are bound to find one or more collisions very soon).
using System;
using System.Collections.Generic;
using System.Text;
namespace BirthdayAttack
{
class Birthday
{
static Random r = new Random();
public static void Main()
{
while (true)
{
IntStrPair[] collisions = Attack(16);
if (collisions.Length >= 2)
{
foreach (IntStrPair s in collisions)
{
Console.WriteLine(s);
}
Console.ReadKey(true);
return;
}
Console.WriteLine("Trying Again..");
}
}
public static string RandomString(int size)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < size; i++)
{
sb.Append(Convert.ToChar(Convert.ToInt32(Math.Floor(65 + 26 * r.NextDouble()))));
}
return sb.ToString();
}
// Try to find collisions by generating random strings of Size = [size]
public static IntStrPair[] Attack(int size)
{
List<IntStrPair> lst = new List<IntStrPair>();
Console.WriteLine("Generating List..");
for (int i = 0; i < 100000; i++)
{
string s = RandomString(size);
lst.Add(new IntStrPair(s));
}
lst.Sort();
Console.WriteLine("Done!");
List<IntStrPair> collision = new List<IntStrPair>();
bool flag = false;
for (int i = 1; i < lst.Count; i++)
{
if (lst[i].Hash == lst[i - 1].Hash && lst[i].Str!=lst[i-1].Str)
{
if (!flag)
collision.Add(lst[i - 1]);
collision.Add(lst[i]);
flag = true;
}
else
flag = false;
}
Console.WriteLine("Found {0} Collisions!", collision.Count);
return collision.ToArray();
}
// Class used to pair a String and its corresponding Hash
public class IntStrPair : IComparable
{
public int Hash;
public string Str;
public IntStrPair(string s)
{
Str = s;
Hash = s.GetHashCode();
}
public override string ToString()
{
return Str + " -> " + Str.GetHashCode().ToString();
}
public override int GetHashCode()
{
return Hash;
}
public int CompareTo(object obj)
{
if (obj is IntStrPair)
{
IntStrPair is1 = obj as IntStrPair;
if (is1.Hash > this.Hash)
return 1;
if (is1.Hash < this.Hash)
return -1;
return this.Str.CompareTo(is1.Str);
}
else
{
throw new ArgumentException();
}
}
}
}
}
Here’s a snapshot of a sample run (results would vary between runs as the program uses random numbers).
As a conclusion, I’d like to emphasize the fact that when dealing with real-world problems, developers need to carefully assess the requirements of security and performance and appropriately use crypto and non-crypto hashing functions. After all, security (or performance) of an application is only as strong (or as fast) as its weakest (or slowest) link.