<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Compression &amp; Deduplication &#8211; Oil &amp; Water or Milk &amp; Cookies</title>
	<atom:link href="http://www.thestoragealchemist.com/comression-deduplication-oil-water-or-milk-cookies/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thestoragealchemist.com/comression-deduplication-oil-water-or-milk-cookies/</link>
	<description>Turning Storage Technology into IT Gold</description>
	<lastBuildDate>Wed, 18 Jan 2012 06:44:37 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: josephmartins</title>
		<link>http://www.thestoragealchemist.com/comression-deduplication-oil-water-or-milk-cookies/comment-page-1/#comment-233</link>
		<dc:creator>josephmartins</dc:creator>
		<pubDate>Tue, 08 Jun 2010 16:42:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.thestoragealchemist.com/?p=769#comment-233</guid>
		<description>I left the following comment on Mike Davis&#039;s blog on June 2nd:

&quot;I don’t know why, but there is a misconception among practitioners about the technologies we all refer to as compression and deduplication. They aren’t oil and water. They’re more like two different flavors of water.

A few of my past comments about the topic can be found here:
http://wikibon.org/wiki/v/Pitfalls_of_compressing_online_storage

I wrote, “De-dupe (of the type the storage industry now markets) IS a form of compression. Traditional “dedupe” (aka compression) occurs within a file (e.g. gif, jpg, zip, tar, etc) where the data resides along with its dictionary. In contrast, storage-level de-dupe is executed across files and repositories using an external “dictionary” managed separately from the files. That is to say, compression is little more than ultra-granular deduplication occurring within a file.”

Obviously, it is always possible to further compress data that has been “deduplicated” by looking for redundancy at a more granular level than that which was used to deduplicate the original data. It is no different, in practice, than turning up the level of compression or deduplication from low to high such that comparisons are made between progressively smaller strings.

So, Mike, figures such as 50% compression, 20% deduplication and a combined 55% (dedupe and compression) are absolutely meaningless without additional context. The outcome of each test would depend on the base (and relative) levels of compression/deduplication. Crank up dedupe (using smaller or variable-length strings) and the benefit from further compression will drop. Dial back the dedupe and the benefit from further compression will jump. A useful (if imperfect) analogy: It’s a bit like compressing a JPG in Photoshop using the “low” setting, then compressing the output file a second time using the “high” setting…albeit lossy.&quot;

To date, Mike has not approved the comment to appear on his blog, nor has he replied to it. Frankly, there&#039;s not much he can say.

It&#039;s definitely Milk &amp; Cookies.</description>
		<content:encoded><![CDATA[<p>I left the following comment on Mike Davis&#8217;s blog on June 2nd:</p>
<p>&#8220;I don’t know why, but there is a misconception among practitioners about the technologies we all refer to as compression and deduplication. They aren’t oil and water. They’re more like two different flavors of water.</p>
<p>A few of my past comments about the topic can be found here:<br />
<a href="http://wikibon.org/wiki/v/Pitfalls_of_compressing_online_storage" rel="nofollow">http://wikibon.org/wiki/v/Pitfalls_of_compressing_online_storage</a></p>
<p>I wrote, “De-dupe (of the type the storage industry now markets) IS a form of compression. Traditional “dedupe” (aka compression) occurs within a file (e.g. gif, jpg, zip, tar, etc) where the data resides along with its dictionary. In contrast, storage-level de-dupe is executed across files and repositories using an external “dictionary” managed separately from the files. That is to say, compression is little more than ultra-granular deduplication occurring within a file.”</p>
<p>Obviously, it is always possible to further compress data that has been “deduplicated” by looking for redundancy at a more granular level than that which was used to deduplicate the original data. It is no different, in practice, than turning up the level of compression or deduplication from low to high such that comparisons are made between progressively smaller strings.</p>
<p>So, Mike, figures such as 50% compression, 20% deduplication and a combined 55% (dedupe and compression) are absolutely meaningless without additional context. The outcome of each test would depend on the base (and relative) levels of compression/deduplication. Crank up dedupe (using smaller or variable-length strings) and the benefit from further compression will drop. Dial back the dedupe and the benefit from further compression will jump. A useful (if imperfect) analogy: It’s a bit like compressing a JPG in Photoshop using the “low” setting, then compressing the output file a second time using the “high” setting…albeit lossy.&#8221;</p>
<p>To date, Mike has not approved the comment to appear on his blog, nor has he replied to it. Frankly, there&#8217;s not much he can say.</p>
<p>It&#8217;s definitely Milk &amp; Cookies.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amit</title>
		<link>http://www.thestoragealchemist.com/comression-deduplication-oil-water-or-milk-cookies/comment-page-1/#comment-207</link>
		<dc:creator>Amit</dc:creator>
		<pubDate>Thu, 15 Apr 2010 16:44:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.thestoragealchemist.com/?p=769#comment-207</guid>
		<description>An excellent post!

BTW - It&#039;s a shame there are no industry-standard benchmarks for primary storage compression/dedup. They would allow potential IT buyers to compare performance and data reduction at the same time.</description>
		<content:encoded><![CDATA[<p>An excellent post!</p>
<p>BTW &#8211; It&#8217;s a shame there are no industry-standard benchmarks for primary storage compression/dedup. They would allow potential IT buyers to compare performance and data reduction at the same time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Kenniston</title>
		<link>http://www.thestoragealchemist.com/comression-deduplication-oil-water-or-milk-cookies/comment-page-1/#comment-179</link>
		<dc:creator>Steve Kenniston</dc:creator>
		<pubDate>Mon, 05 Apr 2010 21:34:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.thestoragealchemist.com/?p=769#comment-179</guid>
		<description>Matt, 
ZFS is a &#039;real-time&#039; compression file system. The only issue is that while they do &lt;em&gt;talk &lt;/em&gt;about having no performance issues, they have a ton - it isn&#039;t fast enough to run in real time, and the only way to make it so is with a ton of horsepower.  So my first question, &#039;at what cost&#039;, still stands.  Now, if ZFS starts to have good performance and, as they claim, there is no change to the application (which I am not positive about, but for this purpose I&#039;ll assume there isn&#039;t), then this is a good start.  

The next issue is whether it works with your backups.  If you&#039;re backing up to tape, the question is what ends up on tape and whether it can be found easily enough should you need to do a recovery.  Given the fact that IT has invested heavily over the last couple of years, and they all tell you to decompress your data before you back up, the question is: does it work with deduplication?  There is only one technology that does this today, because it does random-access compression, and that is part of the IP of Storwize.  So the new question is: does it make sense to stop doing deduplication for the segment of your data that is 4x or more the size of your primary (with all the fulls and incrementals), just to use a compression technology on primary that may only save you 50% of the primary footprint and may be slow?  Again, there may be cases where that is an okay solution; as with every answer in IT, it depends.</description>
		<content:encoded><![CDATA[<p>Matt,<br />
ZFS is a &#8216;real-time&#8217; compression file system. The only issue is that while they do <em>talk </em>about having no performance issues, they have a ton &#8211; it isn&#8217;t fast enough to run in real time, and the only way to make it so is with a ton of horsepower.  So my first question, &#8216;at what cost&#8217;, still stands.  Now, if ZFS starts to have good performance and, as they claim, there is no change to the application (which I am not positive about, but for this purpose I&#8217;ll assume there isn&#8217;t), then this is a good start.  </p>
<p>The next issue is whether it works with your backups.  If you&#8217;re backing up to tape, the question is what ends up on tape and whether it can be found easily enough should you need to do a recovery.  Given the fact that IT has invested heavily over the last couple of years, and they all tell you to decompress your data before you back up, the question is: does it work with deduplication?  There is only one technology that does this today, because it does random-access compression, and that is part of the IP of Storwize.  So the new question is: does it make sense to stop doing deduplication for the segment of your data that is 4x or more the size of your primary (with all the fulls and incrementals), just to use a compression technology on primary that may only save you 50% of the primary footprint and may be slow?  Again, there may be cases where that is an okay solution; as with every answer in IT, it depends.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt</title>
		<link>http://www.thestoragealchemist.com/comression-deduplication-oil-water-or-milk-cookies/comment-page-1/#comment-178</link>
		<dc:creator>Matt</dc:creator>
		<pubDate>Mon, 05 Apr 2010 01:19:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.thestoragealchemist.com/?p=769#comment-178</guid>
		<description>Hi. Just to ask: you mention a couple of commercially available solutions for compression and/or deduplication on primary storage, but, given significant CPU horsepower and RAM, isn&#039;t ZFS a known real-time compression solution and possibly a reliable deduplication solution for primary storage?</description>
		<content:encoded><![CDATA[<p>Hi. Just to ask: you mention a couple of commercially available solutions for compression and/or deduplication on primary storage, but, given significant CPU horsepower and RAM, isn&#8217;t ZFS a known real-time compression solution and possibly a reliable deduplication solution for primary storage?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

