11 Tweets Dec 24, 2022
/1 Is it possible to achieve at least a 10x performance boost compared to the original Kafka and Cassandra? How can it be achieved? What are the trade-offs?
/2 There is an exciting class of storage software like 𝐑𝐞𝐝𝐩𝐚𝐧𝐝𝐚 and 𝐒𝐜𝐲𝐥𝐥𝐚𝐃𝐁 that boasts at least an order of magnitude improvement in performance.
/3 Redpanda and ScyllaDB are used as examples in the diagram below. Redpanda is a Kafka-compatible streaming platform, while ScyllaDB is a Cassandra-compatible NoSQL database.
/4 🔹𝐍𝐨 𝐉𝐕𝐌, 𝐍𝐨 𝐆𝐂
Kafka and Cassandra are written in JVM languages (Java and Scala) and usually suffer from high tail latency: the average latency is good, but the 99th-percentile latency is not, because of GC (Garbage Collection) pauses.
/5 Redpanda and ScyllaDB are written from scratch in C++ and leverage newer frameworks such as Seastar. They are harder to write, but can achieve much higher performance.
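To make the gap between average and tail latency concrete, here is a minimal sketch. The numbers are hypothetical, not measurements of Kafka or Cassandra: 98% of requests take 1 ms and 2% land on a simulated 100 ms pause.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct% of the sample count, converted to a 0-based index.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical workload: 980 fast requests, 20 requests stalled by a pause.
latencies_ms = [1.0] * 980 + [100.0] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.2f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

The mean stays under 3 ms while the 99th percentile sits at the full pause length, which is why a small fraction of stalled requests barely moves the average but dominates the tail.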
/6 🔹𝐒𝐡𝐚𝐫𝐞𝐝-𝐧𝐨𝐭𝐡𝐢𝐧𝐠 𝐚𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭𝐮𝐫𝐞
Every request is pinned to a CPU core, and each core works on its own data, so there is no memory contention between cores. This is also friendly to NUMA (Non-Uniform Memory Access) architectures, because a thread can access the memory closest to its CPU core.
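The routing idea can be sketched in a few lines. This is a single-threaded model under stated assumptions, not how Redpanda or ScyllaDB implement it: `Shard` and `ShardedStore` are hypothetical names, and each `Shard` stands in for a thread pinned to one core. The point is that a key always maps to the same shard, so no two shards ever touch the same data and no locks are needed.

```python
import zlib

class Shard:
    """Stands in for one core: owns a private store nobody else touches."""
    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.store = {}  # private to this shard, so no lock is required

class ShardedStore:
    """Routes every request to the single shard that owns its key."""
    def __init__(self, num_shards):
        self.shards = [Shard(i) for i in range(num_shards)]

    def shard_for(self, key):
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def put(self, key, value):
        self.shard_for(key).store[key] = value

    def get(self, key):
        return self.shard_for(key).store.get(key)

store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user-{i}", i)

print([len(s.store) for s in store.shards])  # keys per shard
```

In a real shared-nothing system the shards would run in parallel and exchange messages instead of sharing memory; the sketch only models key ownership.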
/7 🔹𝐙𝐞𝐫𝐨-𝐜𝐨𝐩𝐲 𝐍𝐞𝐭𝐰𝐨𝐫𝐤𝐢𝐧𝐠
With the Seastar framework, both products can access network devices directly from user space, bypassing the kernel network stack. The result is zero-copy, zero-lock, and zero-context-switch networking.
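The "zero-copy" part can be illustrated by analogy. The sketch below is not Seastar's networking path and involves no kernel bypass; it only shows the difference between copying a payload out of a receive buffer and handing out a view that shares the buffer's memory. The buffer contents are made up for illustration.

```python
# Hypothetical receive buffer: a 4-byte header followed by a payload.
buffer = bytearray(b"HDR:payload")

copied = bytes(buffer[4:])     # slicing allocates new memory and copies the bytes
view = memoryview(buffer)[4:]  # a view references the same memory, no copy

# Mutate the underlying buffer in place (same length, so no reallocation).
buffer[4] = ord("P")

print(copied)       # the copy still holds the old bytes
print(bytes(view))  # the view sees the change, proving it shares memory
```

Avoiding the copy saves both memory bandwidth and allocation work, which is the effect a zero-copy network stack aims for at a much larger scale.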
/8 💡 Final words
What is the drawback of this new class of software? Performance does not come for free. This class of software is more complex than the previous generation. C++ is already difficult to program in.
/9 The asynchronous programming model enforced by Seastar makes it even harder to reason about.
/10 Having their own cooperative scheduler means taking full responsibility for managing long-running tasks. It is challenging to ensure that every task completes, or yields, as quickly as possible. Any latency impact from errant tasks could be felt throughout the entire stack.
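A toy round-robin scheduler shows why one errant task hurts everyone. This is a sketch built on Python generators, not Seastar's scheduler; `run`, `polite`, `errant`, and `delay_before_first_run` are hypothetical names, and each yielded number is the simulated time a task held the core before giving it back.

```python
from collections import deque

def run(tasks):
    """Cooperative round-robin: a task keeps the core until it yields."""
    ready = deque(tasks)
    trace = []
    while ready:
        name, task = ready.popleft()
        try:
            units = next(task)          # runs until the task's next yield
        except StopIteration:
            continue                    # task finished, drop it
        trace.append((name, units))
        ready.append((name, task))      # back of the queue
    return trace

def polite(total_units):
    for _ in range(total_units):
        yield 1                         # yield after every unit of work

def errant(total_units):
    yield total_units                   # hogs the core for all of its work

def delay_before_first_run(trace, name):
    """Simulated time that passes before `name` first gets the core."""
    waited = 0
    for task_name, units in trace:
        if task_name == name:
            return waited
        waited += units
    return None

bad = run([("a", polite(3)), ("hog", errant(5)), ("b", polite(3))])
good = run([("a", polite(3)), ("hog", polite(5)), ("b", polite(3))])

print(delay_before_first_run(bad, "b"), delay_before_first_run(good, "b"))
```

Nothing preempts the hog, so task `b` waits for the hog's entire run in the first case. With a preemptive scheduler the operating system would bound that wait; with a cooperative one, every task author has to.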
/11 References:
[1] Seastar
[2] Redpanda blog
[3] ScyllaDB university
