Architecture Notes
Architecture Notes

@arcnotes

13 Tweets 2 reads Apr 20, 2024
It is often surprising how little is known about how databases operate at a surface level, considering they store almost all of the states in our applications. Things You Should Know About Databases.
Indexes are a data structure that helps decrease the look-up time of requested data. Indexes achieve this with the additional costs of storage, memory, and keeping it up to date (slower writes), which allows us to skip the tedious task of checking every table row.
So here is where most developers go – I have seen this problem before; we need some dictionary (hash map) and a way to get to the specific row we are looking for. These are called index leaf nodes.
B as in balance. So you might wonder where you made a massive error to find yourself reading about B-Trees you hated from school. This tree structure gets its name balanced because the depth is uniform across the entire tree.
The main difference B+ Trees show off is that intermediate nodes don't store any data on them. Instead, all the data references are linked to the leaf nodes, which allows for better caching of the tree structure.
Secondly, the leaf nodes are linked, so if you need to do an index scan, you can do a single linear pass rather than traversing the entire tree up and down and loading more index data from the disk.
Non-repeatable reads occur if you cannot get a consistent view of the data between two subsequent reads during your transaction. In specific modes, concurrent database modification is possible, and there can be scenarios where the value you just read can be modified.
Similarly, a dirty read occurs when you perform a read, and another transaction updates the same row but doesn't commit the work, you perform another read, and you can access the uncommitted (dirty) value, which isn't a durable state change and is inconsistent with reality.
Phantom reads are another committed read phenomena, which occurs when you are most commonly dealing with aggregates, which results in you getting two different values if your database doesn't support range locks for these transactions.
The SQL standard defines 4 standard isolation levels these can and should be configured globally (insidious things can happen if we can't reliably reason about isolation levels).
I hope you've found this thread helpful.
Follow me @arcnotes for more.
Like/Retweet the first tweet below if you can:

Loading suggestions...