Compression & Deduplication – Oil & Water or Milk & Cookies
UPDATE
Oil & Water?
Last week Mike Davis from Ocarina Networks published a blog post "?" It was a good piece. I don't know Mike, but from what I understand he will be taking over blogging, and I wish him the best. I am writing this piece because Mike made some interesting statements in his post and I had some questions. I know the guys at Wikibon have, and I tried asking my questions via Twitter and then on his blog but haven't received any feedback (trust me, I am not naive, I know we are all very busy), so I thought it would be interesting to share my thoughts and try to start some dialog.
Mike stated:
"If you apply a compression-only workflow to a dataset let’s say you get 50%. Now run the same data set through a dedupe-only workflow and you’ll get maybe 20% (remember this is primary storage not backup data). Now take those little chunks and pointers from the dedupe workflow and compress them; you might get an additional 35% for a total of 55%. So compression of deduped data is less effective than on the raw data-set, but the combination (for this example) has eeked out a 5% advantage over the compression-only workflow."
I understand Mike to be saying that if you use deduplication and compression together, you could potentially get an additional 5% optimization of your storage over compression alone. My question is: at what cost? I don't necessarily mean dollar cost, although that is a factor, but the cost to the end user and the IT administrator. When I think of capacity optimization for primary storage, here is what I believe the requirements are for IT:
- Optimization cannot cause any impact to the performance of the storage array
- Optimization cannot cause any change in downstream processes for the systems administrator
- Optimization cannot cause any increase in storage management functions
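
Before weighing those requirements against the gain, it helps to pin down the arithmetic in Mike's example. The short Python sketch below is only my reading of his numbers: it assumes every percentage is a fraction of the original data set (the only reading under which 20% and 35% add up to 55%), and the function and parameter names are made up for illustration.

```python
def savings_summary(compression_only, dedupe_only, extra_after_dedupe):
    """Summarize the example; all inputs are fractions of the ORIGINAL data set.

    Assumption: the extra savings from compressing the deduped data is
    measured against the original capacity, not against what dedupe left.
    """
    combined = dedupe_only + extra_after_dedupe
    # Share of the original data still on disk after dedupe alone.
    left_after_dedupe = 1.0 - dedupe_only
    return {
        "combined": combined,
        "advantage": combined - compression_only,
        # Compression rate actually achieved on the deduped data.
        "compression_on_deduped": extra_after_dedupe / left_after_dedupe,
    }


# Figures taken from Mike's quote: 50% compression-only, 20% dedupe-only,
# and an additional 35% from compressing the deduped chunks and pointers.
result = savings_summary(compression_only=0.50, dedupe_only=0.20, extra_after_dedupe=0.35)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

Under that reading, compression saves a little under 44% of the deduped data versus 50% of the raw data, which matches Mike's point that it is less effective after dedupe, and the combined workflow comes out 5 percentage points ahead of compression alone.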




