148 Tivoli Storage Manager V6.1 Technical Guide
deduplication ratio might be as high as 202:1. This ignores many factors, but it gives an idea of
a best-case scenario. By contrast, an already compressed file (such as MPEG video) usually
neither compresses further nor deduplicates within a single copy, but the savings in the
example above could still be around 100:1, because the same data is repeated 101 times.
Similar savings are also quite common for certain types of databases.
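This best-case arithmetic can be sketched in a few lines. One plausible reading of the figures above (an assumption, not stated explicitly) is that the 202:1 figure combines 101 identical copies with data that also compresses 2:1, while already-compressed data gets the dedup savings only:

```python
# Back-of-envelope deduplication arithmetic for the example above.
# Assumptions: 101 identical copies, and a hypothetical 2:1 compression
# factor for compressible data (1:1 for already-compressed data).

def combined_ratio(copies: int, compression: float) -> float:
    """Logical data stored vs. physical data kept after dedup + compression."""
    logical = copies            # every copy counts toward logical data
    physical = 1 / compression  # one deduplicated copy, then compressed
    return logical / physical

print(combined_ratio(101, 2.0))  # compressible data: 202:1
print(combined_ratio(101, 1.0))  # already-compressed data: 101:1, roughly 100:1
```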
A very simple, traditional approach to backup (taking full backups daily) results in a
high proportion of redundancy in the data stored, which traditionally meant lots of tapes. On
tape we might be looking at 20 or more copies of the same files, each containing the same
redundancies. In the example above, we could multiply the 200:1 ratio by nearly 20 in such a
case. Deduplicating that sort of system is a very good idea indeed, from a space-savings
point of view: the ratios achieved would be high, the space savings large.
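The multiplication in this paragraph can be sketched numerically (the copy counts and the 2:1 compression factor are illustrative assumptions consistent with the figures above):

```python
# Keeping 20 daily full backups multiplies the logical data, while
# deduplication still keeps only one physical copy of the repeated content.
# All figures below are the illustrative assumptions from the text.

copies_per_backup = 101   # identical copies within one full backup
retained_fulls = 20       # daily full backups retained on tape
compression = 2.0         # assumed 2:1 compression for compressible data

logical = copies_per_backup * retained_fulls
physical = 1 / compression
print(f"{logical / physical:.0f}:1")  # prints "4040:1" -- roughly 200:1 x 20
```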
With Tivoli Storage Manager, IBM has always endeavoured to use storage more intelligently.
The progressive incremental backup method reduces the duplication inherent in backups, so
when we look at equivalent ratios for Tivoli Storage Manager backup data, they do not flatter
the deduplication equipment so much. What we are really seeing here is how much more
efficient Tivoli Storage Manager is at avoiding duplicates in the first place, than the traditional
approach. We have tried to avoid unnecessary duplication, and in some ways Tivoli Storage
Manager is still more efficient: for example, doing full backups every day still requires the
processor, disk and network resources to move all the data to the deduplication system. With
Tivoli Storage Manager and progressive incremental backups, we avoid reading a lot of that
data, so we avoid using the resources. Add subfile backups to the solution, and we only move
the parts of the files that have changed, further reducing the redundancy, before the data ever
gets to the Tivoli Storage Manager Server.
7.1.3 Tivoli Storage Manager V6.1 deduplication overview
In this section we introduce the Tivoli Storage Manager V6.1 specifics of deduplication in the
context of the features already present, which reduce duplication at the source.
Tivoli Storage Manager has contained a duplication avoidance strategy since its inception as
WDSF in 1990—the progressive incremental backup methodology. This reduces the number
of duplicates in backup data coming into the server, although in a fairly simple fashion: it only
backs up files that have changed. For example, simply changing the modification date of
a file is enough to make Tivoli Storage Manager back it up again. In terms of effect on stored
data, this is similar to data deduplication at the file level—we are reducing the redundant data
at source by not backing up the same file content twice.
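The change-detection idea can be sketched as follows. This is a simplified illustration only, not Tivoli Storage Manager's actual implementation (real clients compare more attributes than the two shown here):

```python
# Simplified sketch of progressive-incremental change detection:
# back up a file only if its recorded metadata differs from the last run.
import os

def needs_backup(path: str, last_backup: dict) -> bool:
    """True if the file's metadata differs from what was recorded last time."""
    st = os.stat(path)
    current = (st.st_mtime, st.st_size)
    # Note: merely touching the modification time makes the file look
    # changed, so it would be backed up again -- as the text describes.
    return last_backup.get(path) != current

def record_backup(path: str, last_backup: dict) -> None:
    """Record the file's metadata after a successful backup."""
    st = os.stat(path)
    last_backup[path] = (st.st_mtime, st.st_size)
```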
Since Tivoli Storage Manager 4.1, there has been a feature called adaptive subfile backup.
This allows only the blocks of data that have changed within a file to be sent over the network
to the Tivoli Storage Manager Server, rather than the whole file: essentially a block-level
incremental backup. As such, it forms another type of duplication avoidance. It has some
limitations—it currently only works with files up to 2 GB, and the reconstruction of the data
during restore causes additional workload on the Tivoli Storage Manager Server over regular
incremental workloads. It is most useful for backups where the client has very limited network
access to the Tivoli Storage Manager Server, such as a branch office or a mobile device.
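The block-level idea can be sketched roughly as follows. This is an illustration of the general technique, not the actual adaptive subfile backup algorithm; the block size and hashing scheme are invented for the example:

```python
# Rough sketch of block-level incremental backup: compare fixed-size
# blocks against the previous version and send only those that differ.
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a Tivoli Storage Manager value

def block_digests(data: bytes) -> list:
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [hashlib.sha256(b).hexdigest() for b in blocks]

def changed_blocks(old: bytes, new: bytes) -> list:
    """Return (block_index, block_bytes) pairs that need to be sent."""
    old_digests = block_digests(old)
    new_digests = block_digests(new)
    out = []
    for i, digest in enumerate(new_digests):
        if i >= len(old_digests) or digest != old_digests[i]:
            out.append((i, new[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]))
    return out
```

With this scheme, a one-byte change in a large file costs one block over the network instead of the whole file, which is why the approach suits clients with very limited bandwidth.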
Tivoli Storage Manager V6.1 is capable of deduplicating data at the server. It performs
deduplication out of band, in Tivoli Storage Manager server storage pools. Deduplication is
only performed on data in FILE (sequential disk) devtype storage pools—it does not
deduplicate DISK (random disk) storage pools, or tape storage pools. In addition, data
deduplication must be enabled by the Tivoli Storage Manager administrator on each pool
individually, so it is possible to deduplicate only the types of data that benefit most, rather
than everything. There is no requirement for it to be enabled for all pools.
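As an illustration, enabling deduplication on a FILE-type pool looks roughly like the following administrative commands. The device class, pool, and directory names here are invented for the example; consult the Administrator's Reference for the full syntax and parameters:

```
define devclass dedupclass devtype=file directory=/tsm/filepool mountlimit=20
define stgpool deduppool dedupclass maxscratch=100 deduplicate=yes
update stgpool deduppool deduplicate=yes
```

The first two commands create a new sequential-disk pool with deduplication on; the last shows turning it on for an existing FILE pool.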