Chapter 26. Data Integrity: What You Read Is What You Wrote

What is “data integrity”? When users come first, data integrity is whatever users think it is.

We might say data integrity is a measure of the accessibility and accuracy of the datastores needed to provide users with an adequate level of service. But this definition is insufficient.

For instance, if a user interface bug in Gmail displays an empty mailbox for too long, users might believe data has been lost. Thus, even if no data was actually lost, the world would question Google’s ability to act as a responsible steward of data, and the viability of cloud computing would be threatened. Were Gmail to display an error or maintenance message for too long while “only a bit of metadata” is repaired, the trust of Google’s users would similarly erode.

How long is “too long” for data to be unavailable? As demonstrated by an actual Gmail incident in 2011 [Hic11], four days is a long time, perhaps “too long.” Consequently, we believe 24 hours is a good starting point for establishing the threshold of “too long” for Google Apps.

Similar reasoning applies to applications like Google Photos, Drive, Cloud Storage, and Cloud Datastore, because users don’t necessarily draw a distinction between these discrete products (reasoning, “this product is still Google” or “Google, Amazon, whatever; this product is still part of the cloud”). Data loss, data corruption, and extended ...
