Rattibha

Fernando 🇮🇹🇨🇭

10 Tweets 1 reads Dec 04, 2022

Consistent hashing is a common technique used in distributed systems.
When it's functional and how it works: {1/9}

Distributed systems use hashing in many scenarios.
Two common ones are:
- mapping web client requests to cache servers
- mapping data to sharding servers where the data is partitioned
{2/9}

In general, the problem is to map keys (data or requests) to a set of servers so that:
- adding and removing servers is not computationally expensive
- keys are balanced in number between servers
Consistent hashing meets the above requirements.
{3/9}

The key idea is to consider the space of possible hash values circular instead of linear.
Both servers and keys are hashed and placed in a circle.
Each key is then assigned to the first server in a clockwise direction.
This already satisfies the 1st requirement.
{4/9}

Only a fraction of the keys is re-mapped when a server S is added or removed.
If S is added, only the keys between the previous server in the circle and S are mapped to S.
Only the keys assigned to S are mapped to the next server in the circle if S is removed.
{5/9}

But the approach doesn't yet satisfy the 2nd requirement because:
- the keys usually are not uniformly distributed over the circle
- the keys distribution becomes more skewed when servers are added or removed
{6/9}

The solution is to introduce virtual nodes for each server.
Virtual nodes are put as placeholders over the circle, dividing it into small ranges.
The more virtual nodes are, the smaller the ranges are and the more uniform the keys' distribution.
{7/9}

Other benefits of introducing virtual nodes are:
- the re-mapping of the keys is faster because multiple physical servers are involved instead of one
- it's possible to assign more virtual nodes to powerful servers and less to less performing servers
{8/9}

Let's have N servers and V virtual nodes for each server.
Using binary search, assigning a key to the closest virtual node takes O(log(NV)) time.
The memory requirement is O(N+V) for storing the servers' hash values and the mapping between servers and virtual nodes.
{9/9}

Thanks for reading!
If you liked it, I'd be grateful if you'd:
• like or retweet the first tweet
• follow @Franc0Fernand0 for more distributed systems content
• subscribe my newsletter polymathicengineer .com (link in bio)
Your support encourages me to keep writing!

Loading suggestions...

Categories

More from this author

Related Threads

Popular Threads

Categories

More from this author

Related Threads

Popular Threads

Unroll Thread