Chapter 5. Using Disk and Deduplication for Data Protection

In this section of the book, I’ve covered some important conceptual terms and practices, starting with the difference between backup and archive and what makes something a backup (versus a copy). Then I dug a bit deeper into backup, looking at metrics (especially RTO, RPO, RTA, and RPA) along with backup levels and how things are included in (or excluded from) backups. Now I want to look at how the typical data path of backup has changed over the past 20 years, and at how those changes have affected the choices we have when it comes to recovery.

Disk has evolved from hardly being used in backups at all to being the primary target for most backups today. (It is also used for archives, but less so, because the economics are different.) There are two primary reasons for the increased use of disk. The first is that vendors started building disk arrays from AT Attachment (ATA) and Serial ATA (SATA) disk drives. Before that, you really saw those drives only in consumer computers, not in the datacenter. Using SATA drives made disk significantly less expensive than it used to be.

However, the second reason, and the technology that really made disk feasible, was deduplication. It reduces the cost of disk by at least an order of magnitude, and it makes a number of other technologies possible as well. Let’s take a look at this extremely important technology.

Deduplication

Deduplication ...
