Rattibha

(1/18) The Problem with Merkle Trees
At the heart of @Bitcoin, @ethereum and many blockchain computers is the Merkle Tree. While this data structure has served us well, it is not perfect... and if you look ahead, you can see impending problems.
Let's talk Merkle proof scaling.

(2/18) A Merkle Tree is a data structure that allows the efficient verification of the contents of a large dataset.
Check out this thread for a deeper dive, but we'll quickly review the basics.
(also note the typo below... Markle tree HAHA)

(3/18) Merkle Trees are made by successive rounds of hashing (applying a hash function to a set of data).
A hash function transforms data of arbitrary contents (and length) into a unique, compact string. Think of a hash like a unique serial number for a set of data.

(4/18) We begin by placing our data into the bottom row and picking a tree width (number of children per node). To move up a level, we simply select [width] number of nodes and hash them together.
This continues until a single hash is reached. We call this hash the Merkle root.

(5/18) Remember, hashing creates an identifier for whatever was put into the hash function. This method will generate a unique string for each node, irrevocably tied to the underlying dataset.
Any changes to the data will require updating the tree, all the way up to the root.

(6/18) Once we have our Merkle root, we are in business. The root is a lightweight string that can be used to verify any single piece of data in the associated Merkle Tree.
Your tree can have billions of items, yet our Merkle root can verify even a single datum spec.

(7/18) We call the trick a Merkle proof; rather than transfer the entire dataset, a user can send a single data point AND the hashes that make up the intermediate nodes of the Merkle Tree.
Using just the branches and the data point, a verifier can rebuild the Merkle root.

(8/18) If we didn't use computer science, (basically) the only way to verify a piece of data is in the dataset is to pass the entire dataset.
The Merkle Tree is a huge improvement, drastically reducing the amount of data transmission needed to verify the point.

(9/18) Now, you might look at the graphic above and notice something: most of the components of the Merkle proofs are inner nodes (squares).
What if we could reduce the number of inner nodes by increasing the width (and therefore decreasing the height) of the tree?

(10/18) Let's take a look at a wider tree, one with width 4.
Turns out, the Merkle proof is actually LARGER. Yes, there are less levels we need to traverse before we reach the root, but each level requires so much more data.
Maybe a width of 4 is especially bad?

(11/18) Unfortunately, this is not the case. The wider a Merkle Tree gets, the less efficient the Merkle proofs are. In fact, a Merkle Tree is optimal when width = 2.
Take a look at the last one, where width = 16. That's what @ethereum uses.

(12/18) Here's the formula for calculating how many components a Merkle proof will need.
Once again, the end result is that Merkle Trees are optimal when width = 2.

(13/18) Merkle trees are incredibly powerful and versatile data structures that will continue to see use long after you and I are gone, but they are not perfect for all applications.
This is especially true as the tree grows; proof size increases with tree/dataset size.

(14/18) And this is ultimately the problem we find ourselves up against. @ethereum uses Merkle trees to store the state of the EVM. Every account, balance, asset, protocol, etc is stored inside of Merkle tree, updated with every block, every 12 seconds.

(15/18) Today, the tree (technically multiple) is MASSIVE, but we are still in @ethereum's infancy. Yes, the Merkle tree still is much more efficient that transferring the entire data set, but the Merkle proofs are getting so large we are running up against the same barriers.

(16/18) As previously mentioned, today @ethereum uses Merkle trees with a width of 16. We can make some (significant) improvements just by transitioning to the optimal Merkle structure.
But that's just a band-aid for a systemic issue, for the tree will still inexhaustibly grow.

(17/18) So, dear reader, here's where I am going to leave you off: with a huge freaking problem. The data structure at the core of @ethereum is literally not scalable, it's only going to work as long as Ethereum remains (relatively) unused.
But don't worry, we'll be back.

(18/18) We haven't spent all this time just figuring out what was wrong with Merkle Trees.
We've also been working on a replacement!
youtube.com

Like what you read? Help me spread the word by retweeting the thread (linked below).
Follow me for more explainers and as much alpha as I can possibly serve.

Categories

More from this author

Related Threads

Popular Threads

Categories

More from this author

Related Threads

Popular Threads

Unroll Thread