Chapter 13. Using Data De-Duplication to Lighten the Load
Sometimes people don't exercise or eat right and end up getting fat. A similar situation can happen with data storage. Lazy users who never delete anything make the poor backup administrators' jobs even harder, forcing them to back up useless or duplicate data. Many companies have no formal policies on storing information, so data that the business doesn't really need gets stored, backed up, and even replicated for disaster recovery anyway.
Consider these situations:
Users who never delete e-mails, or who send e-mails with large attachments to large distribution lists
Users who store multiple copies of the same file because they're not sure which one holds the right changes
Users who download and store MP3 files at work
Duplicate copies of executables such as winword.exe (the executable file that makes Microsoft Word work) backed up from every laptop and desktop in the company
Together, these situations waste storage space. As a result, many SANs store far more data than necessary, which raises the cost of storing, backing up, and replicating it. This chapter deals with the general concept of data de-duplication: what it is, how it works, where it should be applied, and the results you should expect.
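To make the waste concrete, here's a minimal sketch of file-level de-duplication, assuming that two files with the same SHA-256 digest of their contents are copies of each other. The function names `find_duplicates` and `storage_savings`, the SHA-256 choice, and the sample file paths are all hypothetical illustrations, not how any particular product works. It takes a backup set (paths mapped to file contents) and reports how many bytes are stored as-is versus how many would be needed if each unique file were kept only once, as with the winword.exe copies pulled from every machine.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group paths by the SHA-256 digest of their contents (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(path)
    # Keep only digests shared by two or more paths, i.e. actual duplicates.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def storage_savings(files: dict[str, bytes]) -> tuple[int, int]:
    """Return (bytes stored as-is, bytes needed if each unique file is kept once)."""
    total = sum(len(data) for data in files.values())
    # Assumption: equal digests mean equal contents, so each digest is counted once.
    unique = {hashlib.sha256(data).digest(): len(data) for data in files.values()}
    return total, sum(unique.values())


if __name__ == "__main__":
    # Hypothetical backup set: one 1,000-byte executable copied from three machines.
    exe = b"\x4d\x5a" + b"\x00" * 998  # stand-in for winword.exe
    backup = {
        "laptop-01/winword.exe": exe,
        "laptop-02/winword.exe": exe,
        "desktop-07/winword.exe": exe,
        "alice/notes.txt": b"meeting at 10",
    }
    total, needed = storage_savings(backup)
    print(f"stored: {total} bytes, after de-duplication: {needed} bytes")
    for paths in find_duplicates(backup).values():
        print("duplicates:", sorted(paths))
```

Running it prints `stored: 3013 bytes, after de-duplication: 1013 bytes`, because two of the three executable copies add nothing new. Comparing digests instead of raw bytes keeps the comparison cheap; a real system might also confirm a byte-for-byte match before discarding a copy.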
Understanding Data De-Duplication
In simplified terms, data de-duplication means comparing objects (usually, files or blocks) and removing all non-unique or duplicate objects (copies). If you look at the left side of Figure 13-1, you see several blocks ...