Understanding compaction

Cassandra deals with this build-up of SSTables over time through a process called compaction. Compaction merges partitions from multiple SSTable files into a single file, and in the process it discards obsolete data and purges expired tombstones. Housekeeping is only one reason to do this; the other objective is to improve read performance by consolidating the data for a given key into a single SSTable, which reduces the disk I/O required to read that key.
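
The merge described above can be illustrated with a toy model. The following Python sketch is not Cassandra's implementation: the `Cell` type, the `compact` function, and the `now` and `gc_grace` parameters are hypothetical names chosen for illustration, and each SSTable is reduced to a dictionary with one timestamped cell per key. It assumes that every SSTable holding a given key takes part in the compaction, which a real cluster cannot always guarantee.

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    timestamp: int         # write time of this version of the key
    value: Optional[str]   # None marks a tombstone (a recorded delete)


# Toy stand-in for an SSTable: one cell per key (an assumption of this sketch).
SSTable = Dict[str, Cell]


def compact(sstables: List[SSTable], now: int, gc_grace: int) -> SSTable:
    """Merge several toy SSTables into one.

    The newest cell wins for each key, and tombstones older than
    gc_grace are dropped from the output.
    """
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            current = merged.get(key)
            # Keep only the most recently written version of each key.
            if current is None or cell.timestamp > current.timestamp:
                merged[key] = cell

    # Purge tombstones that have outlived the grace period; keep the rest
    # in key order, as a real SSTable is sorted.
    return {
        key: cell
        for key, cell in sorted(merged.items())
        if not (cell.value is None and now - cell.timestamp > gc_grace)
    }


older = {"a": Cell(1, "v1"), "b": Cell(2, "x")}
newer = {"a": Cell(5, "v2"), "b": Cell(6, None), "c": Cell(15, None)}

result = compact([older, newer], now=20, gc_grace=10)
# "a" keeps its newest value, the old tombstone for "b" is purged along
# with the data it shadowed, and the recent tombstone for "c" survives.
```

After the merge, a read for any key touches one table instead of two, which is the read-performance benefit the text describes.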

The exact mechanism that governs the compaction process depends on which compaction strategy you choose. As of version 3.8 (or 3.0.8 on the 3.0 line), both of which added time-window compaction and deprecated date-tiered compaction, there are four strategies that ship with Cassandra (although you can implement ...

