How Much Backup Capacity Does Deduplication Really Save?
There is a lot of discussion around data deduplication for backup these days. (I wish I could deduplicate all the turkey I ate last week.) In fact, Gartner claims that “…by 2012, deduplication will be applied to 75% of backups.” And when asked “Why?” the response was “…deduplication is too compelling to ignore.” But I say “prove it.” So I put together some backup capacity numbers comparing data stored on tape (non-compressed and compressed) with data deduplicated (fixed block and variable block) and stored on disk. The numbers show a dramatic savings in backup space, which translates into cost savings.
The Parameters
As with any ‘analysis,’ numbers can be ‘spun’ to make them say what you want. That said, I tried to be as straightforward as possible, so let me show my methodology so you can see how my numbers were derived.
- I charted the amount of capacity created using a retention policy of:
  - 14 Dailies
  - 4 Weeklies
  - 12 Monthlies
- I selected 10TB of primary storage capacity
- I did this for file system backups only
- I charted the data for 30%, 40%, 50% and 60% primary storage growth rates
- I charted traditional tape-based backup (non-compressed)
- I charted traditional tape-based backup (compressed, 2:1)
- I charted fixed-block, disk-based deduplicated backup
- I charted variable-block, disk-based deduplicated backup (3 to 5 times more efficient than fixed-block deduplication)
The Effect
The first thing to think about is the sheer number of full backup copies that must be maintained under the above retention schedule. That policy leads to 17.2 copies of the primary storage (12 monthlies + 4 weeklies + the equivalent of 1.2 from the dailies = 17.2 copies). Translation: one terabyte of primary storage becomes 17.2 terabytes of tape storage. This means backup administrators need to pay for the physical tapes as well as the offsite transport and storage costs. Now, 17.2 terabytes of tape doesn’t sound like much, but keep in mind that is for 1TB of primary capacity. Ten TB of primary capacity yields 172 TB of tape capacity. Now add in year-over-year storage growth. At 30% primary storage growth, backup storage grows 23%; at 40%, it grows 29%; at 50%, it grows 33%; and at 60%, it grows 38%.
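For anyone who wants to check the copy-count arithmetic, here is a minimal Python sketch. It assumes the retention policy listed under The Parameters (12 monthlies, 4 weeklies, and the dailies counted as the equivalent of 1.2 full copies) and the 2:1 tape compression ratio. The function and parameter names are my own, made up for illustration, and the sketch covers only the tape figures, not the year-over-year growth percentages.

```python
def full_copy_equivalents(monthlies=12, weeklies=4, dailies_equivalent=1.2):
    """Number of full copies of primary storage the retention policy keeps.

    dailies_equivalent is taken as given from the text: the 14 dailies
    together are counted as roughly 1.2 full copies.
    """
    return monthlies + weeklies + dailies_equivalent


def tape_capacity_tb(primary_tb, copies, compression_ratio=1.0):
    """Tape capacity (TB) needed to hold `copies` full copies of primary storage.

    compression_ratio=2.0 models the 2:1 compressed tape case.
    """
    return primary_tb * copies / compression_ratio


copies = full_copy_equivalents()

for primary_tb in (1, 10):
    raw = tape_capacity_tb(primary_tb, copies)
    compressed = tape_capacity_tb(primary_tb, copies, compression_ratio=2.0)
    print(f"{primary_tb} TB primary -> {raw:.1f} TB tape, "
          f"{compressed:.1f} TB compressed")

# Prints:
# 1 TB primary -> 17.2 TB tape, 8.6 TB compressed
# 10 TB primary -> 172.0 TB tape, 86.0 TB compressed
```

Change the defaults to try a different retention schedule; the tape requirement scales linearly with both the copy count and the primary capacity.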