Data corruption is rare, but it does happen; the phenomenon is commonly known as bit rot. Sometimes we write to a drive, and a surface or cell failure results in reads failing or returning something other than what we wrote. HBA misconfiguration, SAS expander flakes, firmware design flaws, drive electronics errors, and medium failures can also corrupt data. Surface errors affect between 1 in 10^16 and as many as 1 in 10^14 bits stored on HDDs. Drives can also become unseated due to human error or even a truck rumbling by. This author has also seen literal cosmic rays flip bits.
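To put those surface error rates in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the 12 TB drive size and the function name expected_bit_errors are assumptions made for the example, and the calculation treats errors as independent and evenly spread, which real drives do not guarantee.

```python
def expected_bit_errors(capacity_bytes: float, bits_per_error: float) -> float:
    """Expected number of bad bits if every bit of the capacity is read once.

    Assumes errors are independent and uniformly distributed (a simplification).
    """
    return capacity_bytes * 8 / bits_per_error


# Hypothetical drive size chosen for illustration: 12 TB (decimal terabytes).
DRIVE_BYTES = 12e12

# The two ends of the range quoted in the text: 1 in 10^14 and 1 in 10^16 bits.
for bits_per_error in (1e14, 1e16):
    errors = expected_bit_errors(DRIVE_BYTES, bits_per_error)
    print(f"1 error per {bits_per_error:.0e} bits: "
          f"{errors:.4f} expected errors per full read of the drive")
```

Under these assumptions, the worse end of the range works out to roughly one bad bit per full read of a single such drive, so across a cluster of hundreds of drives silent corruption stops being a theoretical concern.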

Ceph lives for strong data integrity and has a mechanism to alert us to this situation: scrubs. Scrubs are somewhat analogous to fsck on a filesystem and the ...
