Building a Distributed Hash Table with the DRT: Part 1

This is the first article in a multi-part series describing a design for a distributed hash table (DHT) built with the Distributed Routing Table (DRT), our new Windows 7 peer-to-peer platform piece. If you’re interested in building a DHT of your own using the Distributed Routing Table, or you have questions about these articles, please email me (tylbart@microsoft.com). I can answer questions, point you towards the right documentation and help you get the tools you need to get started.

Distributed Hash Tables have received lots of well deserved attention from academia and industry. They have been used to build some powerful (and sometimes disruptive) software, and have many properties and architectures worthy of deep analysis.

Wikipedia has brief, but informative introduction to DHTs which is worth reading even if you’re an experienced peer-to-peer developer (https://en.wikipedia.org/wiki/Distributed_hash_table).

Here’s a DHT definition lifted from the Wikipedia article:

Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

DHTs form an infrastructure that can be used to build more complex services, such as distributed file systems, peer-to-peer file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the eDonkey network, YaCy, and the Coral Content Distribution Network.

The Wikipedia article goes on to describe the software components comprised by a typical DHT. There are three basic pieces:

1.  A mechanism for finding the machine responsible for a key, value pair when storing or retrieving a piece of content.

2. A protocol for transmitting or downloading a key, value pair.

3. A protocol and a set of algorithms that preserve the key, value pairs while allowing for the arrival and departure of new machines in the distributed system.

The DRT solves problem number (1) and provides tools that make it easier to solve (2) and (3).

The structured overlay (component 1) is perhaps the most keenly studied component in today’s distributed hash tables because it’s a hard piece of software to get right. A lot of Microsoft blood, sweat and tears went into the DRT, but we’ve come emerged with a high quality protocol that can greatly simplify the development of a DHT. The DRT is based on the PNRP protocol, and has benefitted from our years of study of the PNRP cloud on the real Internet. The DRT has been extensively tested for security, reliability, performance and scale. We have world class simulation infrastructure that allows us to test DRT clouds with millions of virtual nodes under adverse network conditions. Finally, we have rebuilt the PNRP component in Windows 7 using the new DRT platform, and we have a new experience using PNRP in remote assistance.

With all this said, the DRT won’t be complete until we get some feedback from you. Please let me know if you would like to test the DRT or build an application of your own (DHT or otherwise) before we ship Windows 7. You can have a big impact on our product!

The rest of the articles in this series will describe how to program the DRT to be an effective overlay for a DHT. We’ll also talk about some protocols and algorithms for content download/upload and replication and look at some alternative DHTs and overlays in production today.

Thanks and have fun!

-Tyler