Log-Structured Merge Trees: memtable writes, SSTable flushes, compaction, bloom filters, and read/write amplification
Compaction rewrites data to lower levels, multiplying total disk writes beyond what the client sent.
Write path: All writes go to an in-memory sorted structure (memtable). When the memtable reaches capacity (4 entries), it is flushed as an immutable SSTable to Level 0.
Compaction: When L0 accumulates 3 SSTables, they are merge-sorted into L1, deduplicating keys. L1 compacts similarly to L2 at 4 tables.
Read path: Check memtable first (newest writes), then L0 (newest first), then L1, then L2. Bloom filters let us skip SSTables that definitely do not contain the target key.
Deletes: Write a tombstone marker instead of removing. Actual key removal happens during compaction when tombstones are garbage-collected.
Bloom filters: Probabilistic data structure. Returns "definitely not here" or "probably here". False positives possible but false negatives impossible. Saves disk I/O by avoiding unnecessary SSTable reads.
Write amplification: Total bytes written to disk divided by bytes received from clients. Compaction rewrites data multiple times as it moves down levels. Typical LSM trees have 10-30x write amplification.