Notes on ‘Chord: A Scalable Peer-to-peer Lookup Service’
2017-08-05
Link: https://pdos.csail.mit.edu/6.824/papers/stoica-chord.pdf
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications was written by Ion Stoica, Robert Morris (of Morris worm, YC and xv6 fame), et al. at MIT PDOS in 2001. That group also runs MIT's distributed systems class.
These are brief notes taken from free recall, so they are subject to errors.
The paper
Chord is a protocol for looking up the node responsible for a given key in a decentralized peer-to-peer system. That node can then hold some arbitrary value for the key, but storage itself is outside the scope of the protocol. Chord provides a single operation, lookup(k), which returns the location of the node responsible for key k. It can thus be a key building block of a distributed hash table.
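As a hypothetical sketch of how a storage layer could sit on top of that single primitive: the lookup callable and the store/fetch methods below are names I made up for illustration, not part of the Chord API.

```python
import hashlib

def chord_id(data: bytes, bits: int = 160) -> int:
    """Hash arbitrary data onto the identifier circle (the paper uses SHA-1)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** bits)

class DHT:
    def __init__(self, lookup):
        # lookup(key_id) is assumed to return the node responsible for key_id.
        self.lookup = lookup

    def put(self, key: bytes, value: bytes) -> None:
        node = self.lookup(chord_id(key))   # Chord finds the responsible node...
        node.store(chord_id(key), value)    # ...the storage layer keeps the value there.

    def get(self, key: bytes) -> bytes:
        node = self.lookup(chord_id(key))
        return node.fetch(chord_id(key))
```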
It does this without using any special nodes, unlike earlier lookup systems such as Napster or DNS. DNS, for example, comes with a set of special root servers pre-programmed. This makes the Chord protocol more decentralized and resilient to failure.
Chord uses consistent hashing, which keeps keys evenly distributed across nodes even as nodes join and leave the system. One key innovation is that consistent hashing in a distributed hash table doesn't require a node to know about all other nodes. Instead, each node only knows about a select set of other nodes. This way of implementing routing means it only takes on the order of O(log N) hops to find the node holding the value you are looking for, despite nodes joining and leaving.
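A rough sketch of the consistent hashing idea, using a tiny 8-bit identifier space to keep the numbers readable (the paper hashes with SHA-1 into a 160-bit space); the helper names here are mine. Each key belongs to its successor, the first node clockwise from the key's identifier, so a new node only takes over the keys between itself and its predecessor.

```python
import hashlib
from bisect import bisect_left

def chord_id(name: str, bits: int = 8) -> int:
    # Toy 8-bit identifiers for readability; the paper uses 160-bit SHA-1 identifiers.
    return hashlib.sha1(name.encode()).digest()[0] % (2 ** bits)

def successor(key_id: int, node_ids) -> int:
    """Consistent hashing: a key belongs to the first node clockwise from its identifier."""
    ring = sorted(set(node_ids))
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]            # wrap around the identifier circle

nodes = [chord_id(n) for n in ("n1", "n2", "n3", "n4")]
keys = [chord_id(f"key-{i}") for i in range(20)]
before = {k: successor(k, nodes) for k in keys}

# A node joining only takes over the keys between its predecessor and itself;
# every other key keeps its current owner.
nodes.append(chord_id("n5"))
moved = sum(1 for k in before if successor(k, nodes) != before[k])
print(f"{moved} of {len(keys)} keys changed owner")
```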
The network topology can be thought of as a ring of nodes. If we look up a key at a node that isn't responsible for it, that node points us in the right direction using its finger table, which holds the addresses of a small set of other nodes at increasing distances around the ring. Some nifty modular arithmetic ensures that every hop gets us closer to the node responsible for the key.
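Below is a rough Python rendering of that lookup walk, loosely following the paper's pseudocode but heavily simplified (a static ring, no joins, failures, or RPC); names like between and build_ring are my own. Each hop jumps to the finger that most closely precedes the target identifier, which roughly halves the remaining distance and is where the O(log N) hop count comes from.

```python
from bisect import bisect_left

M = 8  # bits in the identifier space; the paper uses m = 160 (SHA-1)

def between(x: int, a: int, b: int) -> bool:
    """True if identifier x lies in the half-open interval (a, b] on the circle."""
    if a < b:
        return a < x <= b
    return x > a or x <= b               # the interval wraps past 2^M - 1

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self            # set properly by build_ring below
        # finger[i] points at the first node at least 2^i past this node's id.
        self.finger = [self] * M

    def closest_preceding_finger(self, key_id: int) -> "Node":
        # Walk the finger table from the farthest entry down, looking for the
        # finger that most closely precedes key_id on the circle.
        for f in reversed(self.finger):
            if between(f.id, self.id, key_id) and f.id != key_id:
                return f
        return self

    def find_successor(self, key_id: int) -> "Node":
        # Hop finger-to-finger until key_id falls between a node and its
        # successor; each hop roughly halves the remaining distance.
        node = self
        while not between(key_id, node.id, node.successor.id):
            node = node.closest_preceding_finger(key_id)
        return node.successor

def build_ring(ids):
    """Build a static ring with fully populated finger tables (no joins or failures)."""
    nodes = sorted((Node(i) for i in ids), key=lambda n: n.id)
    ring = [n.id for n in nodes]

    def succ(k: int) -> Node:
        # First node whose identifier is >= k, wrapping around the circle.
        return nodes[bisect_left(ring, k % 2 ** M) % len(nodes)]

    for n in nodes:
        n.successor = succ(n.id + 1)
        n.finger = [succ(n.id + 2 ** i) for i in range(M)]
    return nodes

ring = build_ring([1, 12, 40, 77, 130, 200, 251])
print(ring[0].find_successor(150).id)    # -> 200, the node responsible for key 150
```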
Questions
How can you convince yourself that "inconsistent" hashing doesn't distribute values evenly as the size of the hash table changes? (A small experiment is sketched after these questions.)
How do we guarantee that values don’t get lost?
As nodes silently leave the network, how do other nodes update their routing tables?
What guarantees can we make when nodes don’t cooperate/are evil?
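On the first question, one way to read it is: how many keys get reassigned when the table changes size? Here is a quick experiment of my own (not from the paper) comparing plain mod-N bucketing against what consistent hashing would do.

```python
import hashlib

def h(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

keys = [f"key-{i}" for i in range(10_000)]

for n in (4, 8, 16):
    before = {k: h(k) % n for k in keys}         # bucket = hash mod N...
    after = {k: h(k) % (n + 1) for k in keys}    # ...then grow the table by one bucket
    moved = sum(1 for k in keys if before[k] != after[k])
    print(f"{n} -> {n + 1} buckets: {moved / len(keys):.0%} of keys change bucket")

# Prints roughly 80%, 89%, 94%: almost every key moves (about n/(n+1) of them).
# Under consistent hashing, adding one node moves only about 1/(n+1) of the keys.
```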
Further reading
Consistent hashing vs inconsistent illustration
Napster, Gnutella and DNS system overview
Practical uses of Chord etc, e.g. Kademlia / BitTorrent DHT proposal
Sybil attack paper, relevance of
Chord illustration of node information being updated
More distributed systems papers and lecture notes from MIT PDOS
Other DHTs: CAN, Tapestry, Pastry